hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
Morgan Funtowicz	6c5a75b593	misc(offline): update model creation as std::shared_ptr	2024-11-28 17:45:22 +01:00
Morgan Funtowicz	9d659f1e23	feat(backend): add missing temperature parameter	2024-11-28 16:55:17 +01:00
Morgan Funtowicz	df72c56b5b	feat(backend): add guard in case top_k = 0	2024-11-28 16:30:20 +01:00
Morgan Funtowicz	929a2fc718	feat(backend): add some test to the backend for core allocation	2024-11-28 14:53:46 +01:00
Morgan Funtowicz	298367cdfd	feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation	2024-11-28 14:53:35 +01:00
Morgan Funtowicz	8e89793514	feat(backend): use the new batch api from llama	2024-11-28 14:52:48 +01:00
Morgan Funtowicz	274cfce435	feat(backend): remove core overriding in the Rust backend	2024-11-28 11:40:52 +01:00
Funtowicz Morgan	d918e6a159	Update Dockerfile.llamacpp as per review Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>	2024-11-28 09:53:59 +01:00
Funtowicz Morgan	bbe95ca9e9	Update Dockerfile.llamacpp as per review Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>	2024-11-28 09:53:15 +01:00
Morgan Funtowicz	9025a26cea	chore: remove unrelated change to trtllm	2024-11-22 15:42:09 +01:00
Morgan Funtowicz	862a519fdd	misc(doc): rust documentation	2024-11-22 15:35:55 +01:00
Morgan Funtowicz	b9c04b9c07	misc(doc): c++ documentation	2024-11-22 15:13:54 +01:00
Morgan Funtowicz	4ee2ee58c9	misc(license): update LICENSE	2024-11-22 14:48:39 +01:00
Morgan Funtowicz	2d9465d181	misc(backend): allow rebinding numa core affinity	2024-11-22 14:02:58 +01:00
Morgan Funtowicz	30ae99631c	misc(docker): add numa lib as dependency	2024-11-22 13:34:52 +01:00
Morgan Funtowicz	5a85661661	feat(backend): rely on multi consumer queue to scheduler workers	2024-11-22 13:32:56 +01:00
Morgan Funtowicz	84eead219a	feat(backend): correctly setup llama_context providing n_threads and n_ubatch	2024-11-21 21:43:50 +01:00
Morgan Funtowicz	50c376612c	feat(backend): bind thread and memory affinity for thread	2024-11-21 13:52:38 +01:00
Morgan Funtowicz	5335bf973b	feat(backend): multistream inference on CPU	2024-11-21 00:03:05 +01:00
Morgan Funtowicz	23d2bcf28d	misc(build): improve build process	2024-11-14 09:38:13 +01:00
Morgan Funtowicz	70c90ad933	feat(backend): update llamacpp to 4077	2024-11-14 09:04:06 +01:00
Morgan Funtowicz	6f059c4b5d	feat(backend): wrap Arc tokenizer to avoid duplicating	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	57b215467b	feat(backend): simplify Rust callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	daf1631e09	dockerfile(backend): initial working version of llama.cpp container	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	02cd6fe427	chore(backend): minor improvements	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	363d5e45de	feat(backend): use std::ranges to map uint32_t to llama_token	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	488ba93898	feat(backend): fix invalid reference to context in release mode	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7e2890fe2c	feat(backend): remove unused function	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	6915fa3441	feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t)	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	86d30aea43	feat(backend): simplify overall cpp structure	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	4f5397c414	misc(cmake): use URL base llama.cpp repo	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	cf17928f83	misc(cmake): remove dependency on fmt	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	26d0266cec	feat(backend): handle all the tokenization failure and send back to the client	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	20652824d9	feat(dockerfile): build process	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a7afde41a9	feat(backend): dockerfile	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7eec0f704f	chore(backend): minor fixes mostly format	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a1154b17ec	feat(backend): avoid copy constructor	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	588421833c	misc(backend): missing header <variant>	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	62dba1a878	misc(cmake): use url deps and not git repo	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	52208f5b78	misc(backend): decrease log verbosity in callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	1149186794	feat(backend): expose tokenizer to the GenerationContext to decode token	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	1473259f84	feat(backend): add early stopping criteria from TGI stream callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	958c72a44a	misc(ffi): remove unused ffi mapping	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	5b7a951389	feat(backend): refactor the callback to handle intermediate and end inference message	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	11c593dc69	feat(backend): make eog clearer on c++ side	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	06424aa9ff	feat(backend): correctly handle the max_new_tokens case for is_eos	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	05ff551950	feat(backend): add number of generated tokens in the callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	188442f67d	misc(lint): make clippy happier	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	31d9254776	feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7b0a56f40f	feat(backend): fix memory leaking on llama_sampler when the decode ends	2024-11-14 08:42:01 +01:00

1203 Commits, All Branches
