hf_text-generation-inference/server/text_generation_server/layers
Daniël de Kok 1c84a30fe6
MoE Marlin: support `desc_act` for `groupsize != -1` (#2590)
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
2024-09-30 19:40:25 +02:00
attention Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
awq
gptq
marlin MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) 2024-09-30 19:40:25 +02:00
moe MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) 2024-09-30 19:40:25 +02:00
__init__.py
bnb.py
conv.py
eetq.py
exl2.py
fp8.py Add support for scalar FP8 weight scales (#2550) 2024-09-24 13:57:40 +02:00
layernorm.py
linear.py Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
lora.py
medusa.py
mlp.py
rotary.py feat: support phi3.5 moe (#2479) 2024-09-30 11:15:09 +02:00
speculative.py
tensor_parallel.py