text-generation-inference/server/text_generation_server/utils
Daniël de Kok 093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.
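The supported-configuration matrix above can be sketched as a small compatibility check. This is an illustrative sketch only: the names `GPTQConfig` and `can_use_gptq_marlin` are hypothetical and do not reflect TGI's actual API.

```python
# Hypothetical sketch of a GPTQ Marlin compatibility check.
# The class and function names are illustrative, not TGI's real API;
# the supported values come from the configuration list above.
from dataclasses import dataclass

GPTQ_MARLIN_BITS = {4, 8}
GPTQ_MARLIN_GROUP_SIZES = {-1, 32, 64, 128}


@dataclass
class GPTQConfig:
    bits: int
    groupsize: int
    desc_act: bool  # act-order; both true and false are supported


def can_use_gptq_marlin(cfg: GPTQConfig) -> bool:
    """Return True if the config matches a supported GPTQ Marlin shape."""
    return cfg.bits in GPTQ_MARLIN_BITS and cfg.groupsize in GPTQ_MARLIN_GROUP_SIZES


print(can_use_gptq_marlin(GPTQConfig(bits=4, groupsize=128, desc_act=True)))   # True
print(can_use_gptq_marlin(GPTQConfig(bits=3, groupsize=128, desc_act=False)))  # False
```

A check like this would gate the repacking step: only configurations that pass it can be converted to the Marlin parameter format.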

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
__init__.py
chunks.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
convert.py Force weights_only (before fully breaking pickle files anyway). (#1710) 2024-04-05 19:23:57 +02:00
dist.py add intel xpu support for TGI (#1475) 2024-04-26 15:48:58 +02:00
hub.py Fixing the download strategy for ibm-fms (#1917) 2024-05-18 13:31:24 +02:00
import_utils.py Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986) 2024-05-31 17:57:01 +02:00
log.py
logits_process.py Fixing frequency penalty (#1811) 2024-04-30 12:13:23 +02:00
peft.py
speculate.py
tokens.py Use the generation config. (#1808) 2024-04-25 19:41:50 +02:00
watermark.py
weights.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00