hf_text-generation-inference/server/text_generation_server/layers
Daniël de Kok 5e0fb46821
Make handling of FP8 scales more consistent (#2666)
Change `fp8_quantize` so that we can pass reciprocals around everywhere; this way, scales are
always passed around in the checkpoint format.

I also noticed that we ignore any input scales that we might have when
fbgemm is available. Skip this path if we already have a scale.
2024-10-19 09:05:01 +02:00
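A minimal sketch of the scale convention described in the commit message, assuming per-tensor `float8_e4m3fn` quantization; `fp8_quantize_sketch` is a hypothetical stand-in, not the actual `fp8_quantize` in `fp8.py`. It always returns the scale in checkpoint (dequantization) format, and it reuses an input scale instead of deriving a new one, mirroring the "skip this path if we already have a scale" point:

```python
from typing import Optional

import torch

FP8_DTYPE = torch.float8_e4m3fn  # assumed FP8 format for this sketch


def fp8_quantize_sketch(
    weight: torch.Tensor, scale: Optional[torch.Tensor] = None
):
    """Quantize `weight` to FP8; the returned scale is always in checkpoint
    format, i.e. the dequantization scale (reciprocal of the multiplier)."""
    finfo = torch.finfo(FP8_DTYPE)
    if scale is not None:
        # A scale is already available (checkpoint format): reuse it instead
        # of re-deriving one; its reciprocal is the quantization multiplier.
        multiplier = scale.to(torch.float32).reciprocal()
    else:
        # Derive a per-tensor multiplier from the weight's dynamic range.
        amax = weight.abs().amax().to(torch.float32).clamp(min=1e-12)
        multiplier = finfo.max / amax
        scale = multiplier.reciprocal()
    qweight = (
        (weight.to(torch.float32) * multiplier)
        .clamp(finfo.min, finfo.max)
        .to(FP8_DTYPE)
    )
    # Callers always receive the dequantization (checkpoint-format) scale.
    return qweight, scale


# Usage: quantize, then dequantize with the returned checkpoint-format scale.
w = torch.randn(128, 256, dtype=torch.float16)
qw, s = fp8_quantize_sketch(w)
w_approx = qw.to(torch.float32) * s
```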
| Name | Last commit | Date |
| --- | --- | --- |
| `attention` | Break cycle between the attention implementations and KV cache (#2627) | 2024-10-17 14:54:22 +02:00 |
| `awq` | CI job. Gpt awq 4 (#2665) | 2024-10-18 17:55:53 +02:00 |
| `gptq` | CI job. Gpt awq 4 (#2665) | 2024-10-18 17:55:53 +02:00 |
| `marlin` | | |
| `moe` | | |
| `__init__.py` | | |
| `bnb.py` | | |
| `conv.py` | | |
| `eetq.py` | | |
| `exl2.py` | | |
| `fp8.py` | Make handling of FP8 scales more consistent (#2666) | 2024-10-19 09:05:01 +02:00 |
| `layernorm.py` | | |
| `linear.py` | | |
| `lora.py` | | |
| `medusa.py` | | |
| `mlp.py` | | |
| `rotary.py` | | |
| `speculative.py` | | |
| `tensor_parallel.py` | | |