Morgan Funtowicz | 6c5a75b593 | misc(offline): update model creation as std::shared_ptr | 2024-11-28 17:45:22 +01:00
Morgan Funtowicz | 9d659f1e23 | feat(backend): add missing temperature parameter | 2024-11-28 16:55:17 +01:00
Morgan Funtowicz | df72c56b5b | feat(backend): add guard in case top_k = 0 | 2024-11-28 16:30:20 +01:00
Morgan Funtowicz | 929a2fc718 | feat(backend): add some test to the backend for core allocation | 2024-11-28 14:53:46 +01:00
Morgan Funtowicz | 298367cdfd | feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation | 2024-11-28 14:53:35 +01:00
Morgan Funtowicz | 8e89793514 | feat(backend): use the new batch api from llama | 2024-11-28 14:52:48 +01:00
Morgan Funtowicz | 274cfce435 | feat(backend): remove core overriding in the Rust backend | 2024-11-28 11:40:52 +01:00
Funtowicz Morgan | d918e6a159 | Update Dockerfile.llamacpp as per review (Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>) | 2024-11-28 09:53:59 +01:00
Funtowicz Morgan | bbe95ca9e9 | Update Dockerfile.llamacpp as per review (Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>) | 2024-11-28 09:53:15 +01:00
Morgan Funtowicz | 9025a26cea | chore: remove unrelated change to trtllm | 2024-11-22 15:42:09 +01:00
Morgan Funtowicz | 862a519fdd | misc(doc): rust documentation | 2024-11-22 15:35:55 +01:00
Morgan Funtowicz | b9c04b9c07 | misc(doc): c++ documentation | 2024-11-22 15:13:54 +01:00
Morgan Funtowicz | 4ee2ee58c9 | misc(license): update LICENSE | 2024-11-22 14:48:39 +01:00
Morgan Funtowicz | 2d9465d181 | misc(backend): allow rebinding numa core affinity | 2024-11-22 14:02:58 +01:00
Morgan Funtowicz | 30ae99631c | misc(docker): add numa lib as dependency | 2024-11-22 13:34:52 +01:00
Morgan Funtowicz | 5a85661661 | feat(backend): rely on multi consumer queue to scheduler workers | 2024-11-22 13:32:56 +01:00
Morgan Funtowicz | 84eead219a | feat(backend): correctly setup llama_context providing n_threads and n_ubatch | 2024-11-21 21:43:50 +01:00
Morgan Funtowicz | 50c376612c | feat(backend): bind thread and memory affinity for thread | 2024-11-21 13:52:38 +01:00
Morgan Funtowicz | 5335bf973b | feat(backend): multistream inference on CPU | 2024-11-21 00:03:05 +01:00
Morgan Funtowicz | 23d2bcf28d | misc(build): improve build process | 2024-11-14 09:38:13 +01:00
Morgan Funtowicz | 70c90ad933 | feat(backend): update llamacpp to 4077 | 2024-11-14 09:04:06 +01:00
Morgan Funtowicz | 6f059c4b5d | feat(backend): wrap Arc tokenizer to avoid duplicating | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 57b215467b | feat(backend): simplify Rust callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | daf1631e09 | dockerfile(backend): initial working version of llama.cpp container | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 02cd6fe427 | chore(backend): minor improvements | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 363d5e45de | feat(backend): use std::ranges to map uint32_t to llama_token | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 488ba93898 | feat(backend): fix invalid reference to context in release mode | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 7e2890fe2c | feat(backend): remove unused function | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 6915fa3441 | feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 86d30aea43 | feat(backend): simplify overall cpp structure | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 4f5397c414 | misc(cmake): use URL base llama.cpp repo | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | cf17928f83 | misc(cmake): remove dependency on fmt | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 26d0266cec | feat(backend): handle all the tokenization failure and send back to the client | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 20652824d9 | feat(dockerfile): build process | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | a7afde41a9 | feat(backend): dockerfile | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 7eec0f704f | chore(backend): minor fixes mostly format | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | a1154b17ec | feat(backend): avoid copy constructor | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 588421833c | misc(backend): missing header <variant> | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 62dba1a878 | misc(cmake): use url deps and not git repo | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 52208f5b78 | misc(backend): decrease log verbosity in callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 1149186794 | feat(backend): expose tokenizer to the GenerationContext to decode token | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 1473259f84 | feat(backend): add early stopping criteria from TGI stream callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 958c72a44a | misc(ffi): remove unused ffi mapping | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 5b7a951389 | feat(backend): refactor the callback to handle intermediate and end inference message | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 11c593dc69 | feat(backend): make eog clearer on c++ side | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 06424aa9ff | feat(backend): correctly handle the max_new_tokens case for is_eos | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 05ff551950 | feat(backend): add number of generated tokens in the callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 188442f67d | misc(lint): make clippy happier | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 31d9254776 | feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 7b0a56f40f | feat(backend): fix memory leaking on llama_sampler when the decode ends | 2024-11-14 08:42:01 +01:00
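
Commit 7b0a56f40f fixes a leak of the `llama_sampler` when decoding ends. One common way to make that cleanup automatic is to tie the sampler's lifetime to a `std::unique_ptr` with a custom deleter. This is a sketch, not the backend's code: `fake_sampler`, `fake_sampler_free`, `make_sampler` and `decode` are hypothetical stand-ins so it compiles without llama.cpp. With the real library the deleter would call `llama_sampler_free` on a `llama_sampler*`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for llama.cpp's opaque `llama_sampler`.
struct fake_sampler {
    int top_k;
};

// Counts releases so the behaviour can be checked without a leak detector.
inline int g_sampler_frees = 0;

// Hypothetical stand-in for `llama_sampler_free`.
void fake_sampler_free(fake_sampler* s) {
    ++g_sampler_frees;
    delete s;
}

// Deleter type so std::unique_ptr calls the C-style free function.
// unique_ptr only invokes it for non-null pointers.
struct sampler_deleter {
    void operator()(fake_sampler* s) const noexcept { fake_sampler_free(s); }
};

using sampler_ptr = std::unique_ptr<fake_sampler, sampler_deleter>;

// The returned sampler is released when its owner goes out of scope,
// including on early returns and exceptions.
sampler_ptr make_sampler(int top_k) {
    return sampler_ptr{new fake_sampler{top_k}};
}

// Simulates a decode loop that may stop early; no explicit free is needed.
int decode(int max_new_tokens, int stop_at) {
    auto sampler = make_sampler(40);
    int generated = 0;
    for (; generated < max_new_tokens; ++generated) {
        if (generated == stop_at) {
            return generated;  // early stop: sampler is still freed
        }
    }
    return generated;
}
```

Both exit paths of `decode` release the sampler exactly once, which is the property a manual `free` at the end of the loop tends to miss.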
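
Commit 5a85661661 schedules workers through a multi-consumer queue. The sketch below shows the general pattern with a mutex and a condition variable: several worker threads block on `pop()` and exit once the queue is closed and drained. Whether the backend uses an equivalent queue is an assumption; `mc_queue`, `request` and `run_workers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that many worker threads can consume from.
template <typename T>
class mc_queue {
public:
    void push(T item) {
        {
            std::lock_guard lock(mutex_);
            items_.push(std::move(item));
        }
        cv_.notify_one();  // wake one idle worker
    }

    // Signals that no more items will arrive and wakes every waiting worker.
    void close() {
        {
            std::lock_guard lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns nullopt once closed and empty.
    std::optional<T> pop() {
        std::unique_lock lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical request type; a real one would carry tokens and sampling params.
struct request {
    int id;
    int num_tokens;
};

// Starts `num_workers` threads that pull requests until the queue is closed,
// then returns the total number of tokens they processed.
std::size_t run_workers(mc_queue<request>& queue, std::size_t num_workers) {
    std::atomic<std::size_t> processed{0};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_workers; ++i) {
        workers.emplace_back([&] {
            while (auto req = queue.pop()) {
                processed += static_cast<std::size_t>(req->num_tokens);
            }
        });
    }
    for (auto& w : workers) w.join();
    return processed.load();
}
```

Closing the queue makes every pending `pop()` return `std::nullopt` once the remaining items are drained, so `join()` cannot hang after the producer is done.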