Morgan Funtowicz
62dba1a878
misc(cmake): use url deps and not git repo
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52208f5b78
misc(backend): decrease log verbosity in callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1149186794
feat(backend): expose tokenizer to the GenerationContext to decode token
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1473259f84
feat(backend): add early stopping criteria from TGI stream callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
958c72a44a
misc(ffi): remove unused ffi mapping
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
5b7a951389
feat(backend): refactor the callback to handle intermediate and end inference message
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
11c593dc69
feat(backend): make eog clearer on c++ side
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
06424aa9ff
feat(backend): correctly handle the max_new_tokens case for is_eos
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
05ff551950
feat(backend): add number of generated tokens in the callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
188442f67d
misc(lint): make clippy happier
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
31d9254776
feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7b0a56f40f
feat(backend): fix memory leaking on llama_sampler when the decode ends
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
86a2ae6ba2
chore: unused variables
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
2cdfed94d9
feat(backend): correctly link to shared fmt and spdlog instead of static
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
bd8f0f15e1
feat(backend): fix invalid reference to ctx instead of context in release build
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
3e82f14f57
feat(backend): somewhat generates the final infer response
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
b50dcddbb8
feat(backend): avoid dropping the boxed stream at the end of the callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
612f2f939f
feat(backend): bind incoming request to the server
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d4aee42fd8
feat(backend): add logit parameter in the callback fn
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f39edc72ff
feat(backend): add mapping for ignore_eos_token stopping criteria
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
3af2c6837c
misc(offline): match rework
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d52b4c4978
feat(backend): full rework of the backend internal to safer c++
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
6a5f6b0755
misc(offline): update offline tester
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
b98c635781
feat(backend): entirely rewrite backend
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
611590440d
misc(offline): expose more parameters for generate
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
dbc5b7a0f7
misc(offline): link correctly
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
0c1dd0ed2b
feat(llamacpp): wip explosion
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a316c53255
feat(llamacpp): expose number of threads for the backend when constructing the model
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
179309b364
misc(build): refactor build type detection in cmake
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f0859c247f
misc(build): handle different lib destination folder lib/lib64
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
e4d803c94e
feat(backend): build and link through build.rs
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
355d8a55b4
feat(backend): wip Rust binding
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f9c248657d
chore(backend): minor formatting
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
37faeb34b2
feat(backend): expose frequency and repetition penalties
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d4b5be10f9
feat(backend): minor refactor
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
92bb113653
feat(backend): use llama_token as TokenId type
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
45d5a6a8c5
feat(backend): add some initial decoding steps
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
098c66920d
feat(backend): tell cmake to build llama-common and link to it
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
0911076320
feat(backend): correctly load llama.cpp model from llama api and not gpt2
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
05ad684676
feat(llamacpp): enable cuda
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
fa89d1e613
misc(cmake): wut
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
e4432d36b1
misc(cmake): add parameter to build specific cuda arch
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52d57dca79
feat(llamacpp): initial end2end build
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7d1f8a2bd6
feat(llamacpp): correctly handle CMAKE_BUILD_TYPE for spdlog macros
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
aa1fcba59f
feat(llamacpp): initial commit
# Conflicts:
# Cargo.lock
2024-11-14 08:42:01 +01:00
Daniël de Kok
a785000842
Add initial support for compressed-tensors checkpoints ( #2732 )
compressed-tensors is a safetensors extension for sparse, quantized
tensors. The format is more powerful than earlier AWQ/GPTQ/FP8
quantization, because
- Different quantizer configurations can be used for different targets.
- The format can specify input/output quantizers in addition to weight
quantizers.
- Configurable exclusions for quantization.
This change adds a dependency on the `compressed-tensors` package for
its configuration parsing and layer matching functionality.
The following types of quantization are supported in this PR:
- W8A16 and W4A16 INT using GPTQ-Marlin kernels.
- W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels.
Support for other quantization types will be added in subsequent PRs.
2024-11-10 13:54:07 +01:00
Wang, Yi
97f7a22f0b
add trust_remote_code in tokenizer to fix baichuan issue ( #2725 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-07 14:43:38 +01:00
Wang, Yi
b1f9044d6c
fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… ( #2717 )
fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Instruct-AWQ
The IPEX kernel provides functions such as add_bias, so there is no need to add the bias outside the kernel.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-04 16:07:51 +01:00
Daniël de Kok
5eedb2ec7a
nix: move to tgi-nix `main` ( #2718 )
2024-11-04 15:40:13 +01:00
Nicolas Patry
9fde566602
Fixing linting on main. ( #2719 )
2024-11-04 15:21:41 +01:00