hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
Morgan Funtowicz	84eead219a	feat(backend): correctly setup llama_context providing n_threads and n_ubatch	2024-11-21 21:43:50 +01:00
Morgan Funtowicz	5335bf973b	feat(backend): multistream inference on CPU	2024-11-21 00:03:05 +01:00
Morgan Funtowicz	488ba93898	feat(backend): fix invalid reference to context in release mode	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7e2890fe2c	feat(backend): remove unused function	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	86d30aea43	feat(backend): simplify overall cpp structure	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	cf17928f83	misc(cmake): remove dependency on fmt	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7eec0f704f	chore(backend): minor fixes mostly format	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	1473259f84	feat(backend): add early stopping criteria from TGI stream callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	5b7a951389	feat(backend): refactor the callback to handle intermediate and end inference message	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	11c593dc69	feat(backend): make eog clearer on c++ side	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	06424aa9ff	feat(backend): correctly handle the max_new_tokens case for is_eos	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	05ff551950	feat(backend): add number of generated tokens in the callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7b0a56f40f	feat(backend): fix memory leaking on llama_sampler when the decode ends	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	bd8f0f15e1	feat(backend): fix invalid reference to ctx instead of context in release build	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d4aee42fd8	feat(backend): add logit parameter in the callback fn	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	f39edc72ff	feat(backend): add mapping for ignore_eos_token stopping criteria	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d52b4c4978	feat(backend): full rework of the backend internal to safer c++	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	b98c635781	feat(backend): entirely rewrite backend	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	0c1dd0ed2b	feat(llamacpp): wip explosion	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a316c53255	feat(llamacpp): expose number of threads for the backend when constructing the model	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	e4d803c94e	feat(backend): build and link through build.rs	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	f9c248657d	chore(backend): minor formatting	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	37faeb34b2	feat(backend): expose frequency and repetition penalties	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d4b5be10f9	feat(backend): minor refactor	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	45d5a6a8c5	feat(backend): add some initial decoding steps	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	0911076320	feat(backend): correctly load llama.cpp model from llama api and not gpt2	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	05ad684676	feat(llamacpp): enable cuda	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	52d57dca79	feat(llamacpp): initial end2end build	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	aa1fcba59f	feat(llamacpp): initial commit	2024-11-14 08:42:01 +01:00

29 Commits