Commit Graph

1203 Commits

Author SHA1 Message Date
Morgan Funtowicz 6c5a75b593 misc(offline): update model creation as std::shared_ptr 2024-11-28 17:45:22 +01:00
Morgan Funtowicz 9d659f1e23 feat(backend): add missing temperature parameter 2024-11-28 16:55:17 +01:00
Morgan Funtowicz df72c56b5b feat(backend): add guard in case top_k = 0 2024-11-28 16:30:20 +01:00
Morgan Funtowicz 929a2fc718 feat(backend): add some test to the backend for core allocation 2024-11-28 14:53:46 +01:00
Morgan Funtowicz 298367cdfd feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation 2024-11-28 14:53:35 +01:00
Morgan Funtowicz 8e89793514 feat(backend): use the new batch api from llama 2024-11-28 14:52:48 +01:00
Morgan Funtowicz 274cfce435 feat(backend): remove core overriding in the Rust backend 2024-11-28 11:40:52 +01:00
Funtowicz Morgan d918e6a159 Update Dockerfile.llamacpp as per review (Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>) 2024-11-28 09:53:59 +01:00
Funtowicz Morgan bbe95ca9e9 Update Dockerfile.llamacpp as per review (Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>) 2024-11-28 09:53:15 +01:00
Morgan Funtowicz 9025a26cea chore: remove unrelated change to trtllm 2024-11-22 15:42:09 +01:00
Morgan Funtowicz 862a519fdd misc(doc): rust documentation 2024-11-22 15:35:55 +01:00
Morgan Funtowicz b9c04b9c07 misc(doc): c++ documentation 2024-11-22 15:13:54 +01:00
Morgan Funtowicz 4ee2ee58c9 misc(license): update LICENSE 2024-11-22 14:48:39 +01:00
Morgan Funtowicz 2d9465d181 misc(backend): allow rebinding numa core affinity 2024-11-22 14:02:58 +01:00
Morgan Funtowicz 30ae99631c misc(docker): add numa lib as dependency 2024-11-22 13:34:52 +01:00
Morgan Funtowicz 5a85661661 feat(backend): rely on multi consumer queue to scheduler workers 2024-11-22 13:32:56 +01:00
Morgan Funtowicz 84eead219a feat(backend): correctly setup llama_context providing n_threads and n_ubatch 2024-11-21 21:43:50 +01:00
Morgan Funtowicz 50c376612c feat(backend): bind thread and memory affinity for thread 2024-11-21 13:52:38 +01:00
Morgan Funtowicz 5335bf973b feat(backend): multistream inference on CPU 2024-11-21 00:03:05 +01:00
Morgan Funtowicz 23d2bcf28d misc(build): improve build process 2024-11-14 09:38:13 +01:00
Morgan Funtowicz 70c90ad933 feat(backend): update llamacpp to 4077 2024-11-14 09:04:06 +01:00
Morgan Funtowicz 6f059c4b5d feat(backend): wrap Arc tokenizer to avoid duplicating 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 57b215467b feat(backend): simplify Rust callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz daf1631e09 dockerfile(backend): initial working version of llama.cpp container 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 02cd6fe427 chore(backend): minor improvements 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 363d5e45de feat(backend): use std::ranges to map uint32_t to llama_token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 488ba93898 feat(backend): fix invalid reference to context in release mode 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7e2890fe2c feat(backend): remove unused function 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 6915fa3441 feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 86d30aea43 feat(backend): simplify overall cpp structure 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 4f5397c414 misc(cmake): use URL base llama.cpp repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz cf17928f83 misc(cmake): remove dependency on fmt 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 26d0266cec feat(backend): handle all the tokenization failure and send back to the client 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 20652824d9 feat(dockerfile): build process 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a7afde41a9 feat(backend): dockerfile 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7eec0f704f chore(backend): minor fixes mostly format 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a1154b17ec feat(backend): avoid copy constructor 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 588421833c misc(backend): missing header <variant> 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 62dba1a878 misc(cmake): use url deps and not git repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 52208f5b78 misc(backend): decrease log verbosity in callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1473259f84 feat(backend): add early stopping criteria from TGI stream callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 958c72a44a misc(ffi): remove unused ffi mapping 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 5b7a951389 feat(backend): refactor the callback to handle intermediate and end inference message 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 11c593dc69 feat(backend): make eog clearer on c++ side 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 06424aa9ff feat(backend): correctly handle the max_new_tokens case for is_eos 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 05ff551950 feat(backend): add number of generated tokens in the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 188442f67d misc(lint): make clippy happier 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 31d9254776 feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7b0a56f40f feat(backend): fix memory leaking on llama_sampler when the decode ends 2024-11-14 08:42:01 +01:00