093a27c528
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them here for convenience.

File | ||
---|---|---|
__init__.pyi | ||
ext.cpp | ||
ext.hh | ||
gptq_marlin.cu | ||
gptq_marlin.cuh | ||
gptq_marlin_dtypes.cuh | ||
gptq_marlin_repack.cu | ||
marlin_cuda_kernel.cu | ||
py.typed |
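
The configurations listed in the commit message can be expressed as a small compatibility check. This is a minimal sketch, not the vendored code: `GPTQConfig` and `is_gptq_marlin_compatible` are hypothetical names introduced for illustration, and the check covers only the three settings the commit message names. The actual kernels may impose further constraints (for example on hardware or tensor shapes) that are not stated here.

```python
from dataclasses import dataclass

# Supported values, copied from the commit message.
SUPPORTED_BITS = (4, 8)
SUPPORTED_GROUPSIZES = (-1, 32, 64, 128)


@dataclass(frozen=True)
class GPTQConfig:
    """Hypothetical container for the three settings named in the commit message."""

    bits: int
    groupsize: int
    desc_act: bool


def is_gptq_marlin_compatible(config: GPTQConfig) -> bool:
    """Return True if the config matches the listed GPTQ Marlin configurations.

    desc_act may be either True or False, so it does not restrict the result.
    """
    return (
        config.bits in SUPPORTED_BITS
        and config.groupsize in SUPPORTED_GROUPSIZES
    )
```

A caller could use such a check to decide whether to repack the parameters into the Marlin format or fall back to another GPTQ code path, for example `is_gptq_marlin_compatible(GPTQConfig(bits=4, groupsize=128, desc_act=True))`.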