Morgan Funtowicz
|
9025a26cea
|
chore: remove unrelated change to trtllm
|
2024-11-22 15:42:09 +01:00 |
Morgan Funtowicz
|
862a519fdd
|
misc(doc): rust documentation
|
2024-11-22 15:35:55 +01:00 |
Morgan Funtowicz
|
b9c04b9c07
|
misc(doc): c++ documentation
|
2024-11-22 15:13:54 +01:00 |
Morgan Funtowicz
|
2d9465d181
|
misc(backend): allow rebinding numa core affinity
|
2024-11-22 14:02:58 +01:00 |
Morgan Funtowicz
|
5a85661661
|
feat(backend): rely on multi-consumer queue to schedule workers
|
2024-11-22 13:32:56 +01:00 |
Morgan Funtowicz
|
84eead219a
|
feat(backend): correctly setup llama_context providing n_threads and n_ubatch
|
2024-11-21 21:43:50 +01:00 |
Morgan Funtowicz
|
50c376612c
|
feat(backend): bind thread and memory affinity for thread
|
2024-11-21 13:52:38 +01:00 |
Morgan Funtowicz
|
5335bf973b
|
feat(backend): multistream inference on CPU
|
2024-11-21 00:03:05 +01:00 |
Morgan Funtowicz
|
23d2bcf28d
|
misc(build): improve build process
|
2024-11-14 09:38:13 +01:00 |
Morgan Funtowicz
|
70c90ad933
|
feat(backend): update llamacpp to 4077
|
2024-11-14 09:04:06 +01:00 |
Morgan Funtowicz
|
6f059c4b5d
|
feat(backend): wrap Arc tokenizer to avoid duplicating
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
57b215467b
|
feat(backend): simplify Rust callback
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
02cd6fe427
|
chore(backend): minor improvements
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
363d5e45de
|
feat(backend): use std::ranges to map uint32_t to llama_token
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
488ba93898
|
feat(backend): fix invalid reference to context in release mode
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
7e2890fe2c
|
feat(backend): remove unused function
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
6915fa3441
|
feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t)
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
86d30aea43
|
feat(backend): simplify overall cpp structure
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
4f5397c414
|
misc(cmake): use URL base llama.cpp repo
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
cf17928f83
|
misc(cmake): remove dependency on fmt
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
26d0266cec
|
feat(backend): handle all tokenization failures and send them back to the client
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
7eec0f704f
|
chore(backend): minor fixes mostly format
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
a1154b17ec
|
feat(backend): avoid copy constructor
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
588421833c
|
misc(backend): missing header <variant>
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
62dba1a878
|
misc(cmake): use url deps and not git repo
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
52208f5b78
|
misc(backend): decrease log verbosity in callback
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
1149186794
|
feat(backend): expose tokenizer to the GenerationContext to decode token
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
1473259f84
|
feat(backend): add early stopping criteria from TGI stream callback
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
958c72a44a
|
misc(ffi): remove unused ffi mapping
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
5b7a951389
|
feat(backend): refactor the callback to handle intermediate and end inference message
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
11c593dc69
|
feat(backend): make eog clearer on c++ side
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
06424aa9ff
|
feat(backend): correctly handle the max_new_tokens case for is_eos
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
05ff551950
|
feat(backend): add number of generated tokens in the callback
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
188442f67d
|
misc(lint): make clippy happier
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
31d9254776
|
feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
7b0a56f40f
|
feat(backend): fix memory leaking on llama_sampler when the decode ends
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
86a2ae6ba2
|
chore: unused variables
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
2cdfed94d9
|
feat(backend): correctly link to shared fmt and spdlog instead of static
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
bd8f0f15e1
|
feat(backend): fix invalid reference to ctx instead of context in release build
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
3e82f14f57
|
feat(backend): somewhat generates the final infer response
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
b50dcddbb8
|
feat(backend): avoid dropping the boxed stream at the end of the callback
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
612f2f939f
|
feat(backend): bind incoming request to the server
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
d4aee42fd8
|
feat(backend): add logit parameter in the callback fn
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
f39edc72ff
|
feat(backend): add mapping for ignore_eos_token stopping criteria
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
3af2c6837c
|
misc(offline): match rework
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
d52b4c4978
|
feat(backend): full rework of the backend internal to safer c++
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
6a5f6b0755
|
misc(offline): update offline tester
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
b98c635781
|
feat(backend): entirely rewrite backend
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
611590440d
|
misc(offline): expose more parameters for generate
|
2024-11-14 08:42:01 +01:00 |
Morgan Funtowicz
|
dbc5b7a0f7
|
misc(offline): link correctly
|
2024-11-14 08:42:01 +01:00 |