Mohit Sharma
704a58c807
Fp8 e4m3_fnuz support for rocm ( #2588 )
* (feat) fp8 fnuz support for rocm
* (review comments) Fix compression_config load, type hints
* (bug) update all has_tensor
* (review_comments) fix typo and added comments
* (nit) improved comment
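ROCm GPUs use the e4m3fnuz FP8 format (exponent bias 8, no infinities, a single NaN encoding) instead of e4m3fn (bias 7), so the same bit pattern decodes to half the value, and converting an e4m3fn checkpoint also means doubling its scales. The bias difference can be sketched with a simplified decoder (illustrative function name; NaN encodings are deliberately not handled):

```python
def decode_e4m3(bits: int, fnuz: bool = False) -> float:
    """Decode an 8-bit e4m3 pattern to a float. e4m3fnuz (ROCm) uses
    exponent bias 8 instead of e4m3fn's 7, so identical bits decode
    to half the e4m3fn value. Simplified: NaN patterns not handled."""
    sign = -1.0 if bits & 0x80 else 1.0
    exp = (bits >> 3) & 0xF
    mantissa = bits & 0x7
    bias = 8 if fnuz else 7
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exp - bias)
```

This also shows the differing dynamic ranges: the largest finite e4m3fn value is 448, while e4m3fnuz tops out at 240.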
2024-10-16 09:54:50 +02:00
Daniël de Kok
64142489b6
Add support for fused MoE Marlin for AWQ ( #2616 )
* Add support for fused MoE Marlin for AWQ
This uses the updated MoE Marlin kernels from vLLM.
* Add integration test for AWQ MoE
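A fused MoE layer routes each token to its top-k experts before running the grouped expert matmuls (here, AWQ-quantized ones via the Marlin kernels). The routing step can be sketched in plain Python (an illustrative sketch, not vLLM's actual API):

```python
def topk_route(probs, k):
    """Select the k highest-probability experts per token and
    renormalize their weights; a fused MoE kernel performs this
    routing before the grouped expert GEMMs."""
    routed = []
    for row in probs:  # one row of router probabilities per token
        top = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        total = sum(row[e] for e in top)
        routed.append([(e, row[e] / total) for e in top])
    return routed
```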
2024-10-08 11:56:41 +02:00
Daniël de Kok
1c84a30fe6
MoE Marlin: support `desc_act` for `groupsize != -1` ( #2590 )
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
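With activation ordering (`desc_act`) enabled, GPTQ stores a per-input-channel group index, and the kernel needs the channels permuted so each quantization group (sharing one scale) is contiguous. The permutation itself is simple (an illustrative sketch, not the kernel's actual preprocessing):

```python
def act_order_perm(g_idx):
    """Stable-sort input channels by their quantization group index
    so channels that share a group (and scale) become contiguous."""
    return sorted(range(len(g_idx)), key=lambda i: g_idx[i])

def permute_channels(row, perm):
    """Reorder one weight row's input channels by the permutation."""
    return [row[i] for i in perm]
```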
2024-09-30 19:40:25 +02:00
Daniël de Kok
34f7dcfd80
Handle GPTQ-Marlin loading in `GPTQMarlinWeightsLoader` ( #2300 )
The `GPTQWeightsLoader` was structured like this in pseudocode:

    if marlin:
        Set up tensors in a way that GPTQ-Marlin expects
    else:
        Set up tensors in a way that ExLlama/GPTQ/AWQ expect

However, the GPTQ-Marlin implementation details should really live in the
`marlin` module, so move the former branch out to a separate
`GPTQMarlinWeightsLoader`.
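The refactor replaces a per-tensor branch inside one loader with a loader selected once up front. A minimal sketch of the resulting shape (method names and return values are illustrative, not TGI's actual interfaces):

```python
class GPTQWeightsLoader:
    """Sets up tensors the way ExLlama/GPTQ/AWQ kernels expect."""
    def load(self, name):
        return f"{name}: gptq layout"

class GPTQMarlinWeightsLoader:
    """Sets up tensors the way the GPTQ-Marlin kernel expects; kept
    with the Marlin code instead of inside the GPTQ loader."""
    def load(self, name):
        return f"{name}: marlin layout"

def get_loader(use_marlin: bool):
    # Choose the loader once, instead of branching on `marlin`
    # inside every tensor-loading call.
    return GPTQMarlinWeightsLoader() if use_marlin else GPTQWeightsLoader()
```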
2024-07-31 13:08:41 +02:00
Daniël de Kok
922732b255
Install Marlin from standalone package ( #2320 )
2024-07-29 15:37:10 +02:00
drbh
bab02ff2bc
feat: add ruff and resolve issue ( #2262 )
* feat: add ruff and resolve issue
* fix: update client exports and adjust after rebase
* fix: adjust syntax to avoid circular import
* fix: adjust client ruff settings
* fix: lint and refactor import check and avoid model enum as global names
* fix: improve fbgemm_gpu check and lints
* fix: update lints
* fix: prefer comparing model enum over str
* fix: adjust lints and ignore specific rules
* fix: avoid unneeded quantize check
2024-07-26 10:29:09 -04:00
Daniël de Kok
93d2b9fe9c
Split up `layers.marlin` into several files ( #2292 )
The marlin.py file was getting large, so split it up.
2024-07-24 16:33:26 +02:00