Commit Graph

1165 Commits

Author SHA1 Message Date
Morgan Funtowicz 62dba1a878 misc(cmake): use url deps and not git repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 52208f5b78 misc(backend): decrease log verbosity in callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 1473259f84 feat(backend): add early stopping criteria from TGI stream callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 958c72a44a misc(ffi): remove unused ffi mapping 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 5b7a951389 feat(backend): refactor the callback to handle intermediate and end inference message 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 11c593dc69 feat(backend): make eog clearer on c++ side 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 06424aa9ff feat(backend): correctly handle the max_new_tokens case for is_eos 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 05ff551950 feat(backend): add number of generated tokens in the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 188442f67d misc(lint): make clippy happier 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 31d9254776 feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7b0a56f40f feat(backend): fix memory leaking on llama_sampler when the decode ends 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 86a2ae6ba2 chore: unused variables 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 2cdfed94d9 feat(backend): correctly link to shared fmt and spdlog instead of static 2024-11-14 08:42:01 +01:00
Morgan Funtowicz bd8f0f15e1 feat(backend): fix invalid reference to ctx instead of context in release build 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 3e82f14f57 feat(backend): somewhat generates the final infer response 2024-11-14 08:42:01 +01:00
Morgan Funtowicz b50dcddbb8 feat(backend): avoid dropping the boxed stream at the end of the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 612f2f939f feat(backend): bind incoming request to the server 2024-11-14 08:42:01 +01:00
Morgan Funtowicz d4aee42fd8 feat(backend): add logit parameter in the callback fn 2024-11-14 08:42:01 +01:00
Morgan Funtowicz f39edc72ff feat(backend): add mapping for ignore_eos_token stopping criteria 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 3af2c6837c misc(offline): match rework 2024-11-14 08:42:01 +01:00
Morgan Funtowicz d52b4c4978 feat(backend): full rework of the backend internal to safer c++ 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 6a5f6b0755 misc(offline): update offline tester 2024-11-14 08:42:01 +01:00
Morgan Funtowicz b98c635781 feat(backend): entirely rewrite backend 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 611590440d misc(offline): expose more parameters for generate 2024-11-14 08:42:01 +01:00
Morgan Funtowicz dbc5b7a0f7 misc(offline): link correctly 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 0c1dd0ed2b feat(llamacpp): wip explosion 2024-11-14 08:42:01 +01:00
Morgan Funtowicz a316c53255 feat(llamacpp): expose number of threads for the backend when constructing the model 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 179309b364 misc(build): refactor build type detection in cmake 2024-11-14 08:42:01 +01:00
Morgan Funtowicz f0859c247f misc(build): handle different lib destination folder lib/lib64 2024-11-14 08:42:01 +01:00
Morgan Funtowicz e4d803c94e feat(backend): build and link through build.rs 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 355d8a55b4 feat(backend): wip Rust binding 2024-11-14 08:42:01 +01:00
Morgan Funtowicz f9c248657d chore(backend): minor formatting 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 37faeb34b2 feat(backend): expose frequency and repetition penalties 2024-11-14 08:42:01 +01:00
Morgan Funtowicz d4b5be10f9 feat(backend): minor refactor 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 92bb113653 feat(backend): use llama_token as TokenId type 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 45d5a6a8c5 feat(backend): add some initial decoding steps 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 098c66920d feat(backend): tell cmake to build llama-common and link to it 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 0911076320 feat(backend): correctly load llama.cpp model from llama api and not gpt2 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 05ad684676 feat(llamacpp): enable cuda 2024-11-14 08:42:01 +01:00
Morgan Funtowicz fa89d1e613 misc(cmake): wut 2024-11-14 08:42:01 +01:00
Morgan Funtowicz e4432d36b1 misc(cmake): add parameter to build specific cuda arch 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 52d57dca79 feat(llamacpp): initial end2end build 2024-11-14 08:42:01 +01:00
Morgan Funtowicz 7d1f8a2bd6 feat(llamacpp): correctly handle CMAKE_BUILD_TYPE for spdlog macros 2024-11-14 08:42:01 +01:00
Morgan Funtowicz aa1fcba59f feat(llamacpp): initial commit
# Conflicts:
#	Cargo.lock
2024-11-14 08:42:01 +01:00
Daniël de Kok a785000842
Add initial support for compressed-tensors checkpoints (#2732)
compressed-tensors is a safetensors extension for sparse, quantized
tensors. The format is more powerful than earlier AWQ/GPTQ/FP8
quantization, because

- Different quantizer configurations can be used for different targets.
- The format can specify input/output quantizers in addition to weight
  quantizers.
- Configurable exclusions for quantization.

This change adds a dependency on the `compressed-tensors` package for
its configuration parsing and layer matching functionality.

The following types of quantization are supported in this PR:

- W8A16 and W4A16 INT using GPTQ-Marlin kernels.
- W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels.

Support for other quantization types will be added in subsequent PRs.
2024-11-10 13:54:07 +01:00
Wang, Yi 97f7a22f0b
add trust_remote_code in tokenizer to fix baichuan issue (#2725)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-07 14:43:38 +01:00
Wang, Yi b1f9044d6c
fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… (#2717)
fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Instruct-AWQ
The IPEX kernel provides functions such as add_bias, so there is no need to add the bias outside the kernel.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-04 16:07:51 +01:00
Daniël de Kok 5eedb2ec7a
nix: move to tgi-nix `main` (#2718) 2024-11-04 15:40:13 +01:00
Nicolas Patry 9fde566602
Fixing linting on main. (#2719) 2024-11-04 15:21:41 +01:00