attention | ROCm and sliding windows fixes (#2033) | 2024-06-10 15:09:50 +08:00 |
awq | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
gptq | Fix GPTQWeight import (#2020) | 2024-06-05 14:49:15 +02:00 |
__init__.py | MLPSpeculator. (#1865) | 2024-05-14 12:33:18 +02:00 |
conv.py | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
eetq.py | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
exl2.py | Add support for exl2 quantization | 2024-05-30 11:28:05 +02:00 |
fp8.py | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
layernorm.py | MI300 compatibility (#1764) | 2024-05-17 15:30:47 +02:00 |
linear.py | Add support for Marlin-quantized models | 2024-06-06 13:16:52 +02:00 |
marlin.py | disable marlin tests on rocm/xpu | 2024-06-10 13:06:11 +00:00 |
mlp.py | MLPSpeculator. (#1865) | 2024-05-14 12:33:18 +02:00 |
rotary.py | reenable xpu for tgi (#1939) | 2024-05-23 14:11:08 +02:00 |
speculative.py | MLPSpeculator. (#1865) | 2024-05-14 12:33:18 +02:00 |
tensor_parallel.py | Add Phi-3 medium support (#2039) | 2024-06-10 09:22:29 +02:00 |