hf_text-generation-inference/server/text_generation_server/utils
Daniël de Kok 093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
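
The supported configurations above can be sketched as a simple compatibility check. This is an illustrative sketch only; the function name, constants, and config shape are assumptions, not TGI's actual API:

```python
# Hypothetical sketch of a GPTQ Marlin compatibility check, mirroring the
# constraints stated in the commit message (names are illustrative).

GPTQ_MARLIN_BITS = (4, 8)
GPTQ_MARLIN_GROUP_SIZES = (-1, 32, 64, 128)


def can_use_gptq_marlin(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True if a GPTQ config falls in the Marlin-supported range.

    desc_act may be True or False; per the commit message both are
    supported, so it does not constrain compatibility here.
    """
    return bits in GPTQ_MARLIN_BITS and groupsize in GPTQ_MARLIN_GROUP_SIZES


print(can_use_gptq_marlin(4, 128, False))  # → True
print(can_use_gptq_marlin(3, 128, False))  # → False (3-bit not supported)
```

A config that passes this check would still need its parameters repacked into the Marlin quantizer format before the kernels can be used, as noted above.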
2024-06-14 09:45:42 +02:00
__init__.py feat(server): Add native support for PEFT Lora models (#762) 2023-08-03 17:22:45 +02:00
chunks.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
convert.py Force weights_only (before fully breaking pickle files anyway). (#1710) 2024-04-05 19:23:57 +02:00
dist.py add intel xpu support for TGI (#1475) 2024-04-26 15:48:58 +02:00
hub.py Fixing the download strategy for ibm-fms (#1917) 2024-05-18 13:31:24 +02:00
import_utils.py Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986) 2024-05-31 17:57:01 +02:00
log.py v1.3.4 2023-12-22 15:46:04 +01:00
logits_process.py Fixing frequency penalty (#1811) 2024-04-30 12:13:23 +02:00
peft.py fix: fix local loading for .bin models (#1419) 2024-01-09 15:21:00 +01:00
speculate.py chore: formatting 2023-12-11 14:49:52 +01:00
tokens.py Use the generation config. (#1808) 2024-04-25 19:41:50 +02:00
watermark.py Fixing watermark. (#851) 2023-08-16 07:17:26 +02:00
weights.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00