Commit Graph

6 Commits

Author SHA1 Message Date
Mohit Sharma 704a58c807
Fp8 e4m3_fnuz support for rocm (#2588)
* (feat) fp8 fnuz support for rocm

* (review comments) Fix compression_config load, type hints

* (bug) update all has_tensor

* (review_comments) fix typo and added comments

* (nit) improved comment
2024-10-16 09:54:50 +02:00
Daniël de Kok 64142489b6
Add support for fused MoE Marlin for AWQ (#2616)
* Add support for fused MoE Marlin for AWQ

This uses the updated MoE Marlin kernels from vLLM.

* Add integration test for AWQ MoE
2024-10-08 11:56:41 +02:00
Daniël de Kok 1c84a30fe6
MoE Marlin: support `desc_act` for `groupsize != -1` (#2590)
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
2024-09-30 19:40:25 +02:00
Daniël de Kok 34f7dcfd80
Handle GPTQ-Marlin loading in `GPTQMarlinWeightLoader` (#2300)
The `GPTWeightLoader` was structured like this in pseudocode:

if marlin:
  Set up tensors in a way that GPTQ-Marlin expects
else:
  Set up tensors in a way that ExLlama/GPTQ/AWQ expect

However, the GPT-Marlin implementation details should really be in the
`marlin` module. So move the former part out to a separate
`GPTQMarlinWeightsLoader`.
2024-07-31 13:08:41 +02:00
Daniël de Kok 922732b255
Install Marlin from standalone package (#2320) 2024-07-29 15:37:10 +02:00
Daniël de Kok 93d2b9fe9c
Split up `layers.marlin` into several files (#2292)
The marlin.py file was getting large, split it up.
2024-07-24 16:33:26 +02:00