Commit Graph

1187 Commits

Author SHA1 Message Date
Morgan Funtowicz 84eead219a feat(backend): correctly setup llama_context providing n_threads and n_ubatch 2024-11-21 21:43:50 +01:00
Morgan Funtowicz 50c376612c feat(backend): bind thread and memory affinity for thread 2024-11-21 13:52:38 +01:00
Morgan Funtowicz 5335bf973b feat(backend): multistream inference on CPU 2024-11-21 00:03:05 +01:00
Morgan Funtowicz 23d2bcf28d misc(build): improve build process 2024-11-14 09:38:13 +01:00
Morgan Funtowicz 70c90ad933 feat(backend): update llamacpp to 4077 2024-11-14 09:04:06 +01:00
Morgan Funtowicz 6f059c4b5d feat(backend): wrap Arc tokenizer to avoid duplicating 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 57b215467b feat(backend): simplify Rust callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz daf1631e09 dockerfile(backend): initial working version of llama.cpp container 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 02cd6fe427 chore(backend): minor improvements 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 363d5e45de feat(backend): use std::ranges to map uint32_t to llama_token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 488ba93898 feat(backend): fix invalid reference to context in release mode 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7e2890fe2c feat(backend): remove unused function 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 6915fa3441 feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 86d30aea43 feat(backend): simplify overall cpp structure 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 4f5397c414 misc(cmake): use URL base llama.cpp repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz cf17928f83 misc(cmake): remove dependency on fmt 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 26d0266cec feat(backend): handle all the tokenization failure and send back to the client 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 20652824d9 feat(dockerfile): build process 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a7afde41a9 feat(backend): dockerfile 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7eec0f704f chore(backend): minor fixes mostly format 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a1154b17ec feat(backend): avoid copy constructor 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 588421833c misc(backend): missing header <variant> 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 62dba1a878 misc(cmake): use url deps and not git repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 52208f5b78 misc(backend): decrease log verbosity in callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1473259f84 feat(backend): add early stopping criteria from TGI stream callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 958c72a44a misc(ffi): remove unused ffi mapping 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 5b7a951389 feat(backend): refactor the callback to handle intermediate and end inference message 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 11c593dc69 feat(backend): make eog clearer on c++ side 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 06424aa9ff feat(backend): correctly handle the max_new_tokens case for is_eos 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 05ff551950 feat(backend): add number of generated tokens in the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 188442f67d misc(lint): make clippy happier 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 31d9254776 feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7b0a56f40f feat(backend): fix memory leaking on llama_sampler when the decode ends 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 86a2ae6ba2 chore: unsued variables 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 2cdfed94d9 feat(backend): correctly link to shared fmt and spdlog instead of static 2024-11-14 08:42:01 +01:00
Morgan Funtowicz bd8f0f15e1 feat(backend): fix invalid reference to ctx instead of context in release build 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 3e82f14f57 feat(backend): somewhat generates the final infer response 2024-11-14 08:42:01 +01:00
Morgan Funtowicz b50dcddbb8 feat(backend): avoid dropping the boxed stream at the end of the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 612f2f939f feat(backend): bind incoming request to the server 2024-11-14 08:42:01 +01:00
Morgan Funtowicz d4aee42fd8 feat(backend): add logit parameter in the callback fn 2024-11-14 08:42:01 +01:00
Morgan Funtowicz f39edc72ff feat(backend): add mapping for ignore_eos_token stopping criteria 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 3af2c6837c misc(offline): match rework 2024-11-14 08:42:01 +01:00
Morgan Funtowicz d52b4c4978 feat(backend): full rework of the backend internal to safer c++ 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 6a5f6b0755 misc(offline): update offline tester 2024-11-14 08:42:01 +01:00
Morgan Funtowicz b98c635781 feat(backend): entirely rewrite backend 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 611590440d misc(offline): expose more parameters for generate 2024-11-14 08:42:01 +01:00
Morgan Funtowicz dbc5b7a0f7 misc(offline): link correctly 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 0c1dd0ed2b feat(llamacpp): wip explosion 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a316c53255 feat(llamacpp): expose number of threads for the backend when constructing the model 2024-11-14 08:42:01 +01:00