hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
Daniël de Kok	90a1d04a2f	Add support for GPTQ-quantized MoE models using MoE Marlin (#2557) This change adds support for MoE models that use GPTQ quantization. Currently only models with the following properties are supported: - No `desc_act` with tensor parallelism, unless `group_size=-1`. - No asymmetric quantization. - No AWQ.	2024-09-30 11:14:32 +02:00
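
The three restrictions listed in the commit message can be read as a pre-flight check on a checkpoint's quantization settings. The sketch below is illustrative only and is not code from this commit: `GPTQMoEConfig`, `unsupported_reasons`, and the `tensor_parallel_size` field are hypothetical names, and the remaining fields are assumed to mirror the `desc_act`, `group_size`, and symmetric/asymmetric settings named in the message. It encodes only those three rules and nothing else about what the server accepts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPTQMoEConfig:
    """Hypothetical summary of a quantized MoE checkpoint (not a TGI type)."""

    quant_method: str              # assumed values: "gptq", "awq", ...
    desc_act: bool                 # activation-order flag named in the commit message
    group_size: int                # -1 is the value the commit message exempts
    sym: bool                      # True for symmetric quantization
    tensor_parallel_size: int = 1  # assumed: number of shards; 1 means no tensor parallelism


def unsupported_reasons(cfg: GPTQMoEConfig) -> List[str]:
    """Return one message per restriction from the commit message that cfg violates."""
    reasons = []

    # "No AWQ."
    if cfg.quant_method.lower() == "awq":
        reasons.append("AWQ checkpoints are not supported")

    # "No asymmetric quantization."
    if not cfg.sym:
        reasons.append("asymmetric quantization is not supported")

    # "No `desc_act` with tensor parallelism, unless `group_size=-1`."
    if cfg.desc_act and cfg.tensor_parallel_size > 1 and cfg.group_size != -1:
        reasons.append("desc_act with tensor parallelism requires group_size=-1")

    return reasons


if __name__ == "__main__":
    cfg = GPTQMoEConfig(
        quant_method="gptq",
        desc_act=True,
        group_size=128,
        sym=True,
        tensor_parallel_size=2,
    )
    for reason in unsupported_reasons(cfg):
        print(reason)
```

An empty list means none of the three listed restrictions is violated; it does not by itself guarantee that a given model loads.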


1 Commit