Commit Graph

7 Commits

Author SHA1 Message Date
Felix Marty 00cc73b7b7 fix post merge 2024-07-01 12:20:29 +00:00
Felix Marty 9fd395fae4 fix tests 2024-07-01 12:12:26 +00:00
fxmarty 227f78f3fe Merge branch 'main' into ci_amd3 2024-06-26 12:08:42 +02:00
Daniël de Kok fc9c3153e5 Add pytest release marker (#2114)
* Add pytest release marker

Annotate a test with `@pytest.mark.release` and it only gets run
with `pytest integration-tests --release`.

* Mark many models as `release` to speed up CI
2024-06-25 16:53:20 +02:00
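
The marker behaviour described in this commit can be sketched with pytest's standard hooks. This is a minimal illustration, assuming a `conftest.py` that registers a `--release` flag and skips `release`-marked tests unless the flag is given. The help text and skip reason are invented here, and the repository's actual implementation may differ.

```python
# conftest.py (illustrative sketch, not necessarily the repository's code)
import pytest


def pytest_addoption(parser):
    # Register the --release command-line flag (off by default).
    parser.addoption(
        "--release",
        action="store_true",
        default=False,
        help="also run tests marked with @pytest.mark.release",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "release: run only when --release is given")


def pytest_collection_modifyitems(config, items):
    # With --release, leave every collected test enabled.
    if config.getoption("--release"):
        return
    # Otherwise attach a skip marker to each release-marked test.
    skip_release = pytest.mark.skip(reason="needs --release to run")
    for item in items:
        if "release" in item.keywords:
            item.add_marker(skip_release)
```

With such a file in place, a test decorated with `@pytest.mark.release` is reported as skipped under a plain `pytest integration-tests` and runs under `pytest integration-tests --release`.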
fxmarty 5a4b798f98 fix gptq tests, LLMM1 matrix bound 2024-06-24 18:49:45 +02:00
fxmarty 49db30a137 disable marlin tests on rocm/xpu 2024-06-24 18:49:37 +02:00
Daniël de Kok 4594e6faba Add support for Marlin-quantized models
This change adds support for Marlin-quantized models. Marlin is an
FP16xINT4 matmul kernel that provides good speedups when decoding
batches of 16-32 tokens. It supports 4-bit quantized models that use
symmetric quantization with groupsize -1 or 128.

Tested with:

- Llama 2
- Llama 3
- Phi 3
2024-06-06 13:16:52 +02:00
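
To make the quantization constraints in this commit concrete, here is a small pure-Python sketch of symmetric 4-bit quantization with per-group scales. It only illustrates the arithmetic and is not the Marlin kernel or its packed weight layout. The function names are hypothetical, signed codes are used for simplicity, and `groupsize=-1` is assumed to mean a single group spanning the whole vector.

```python
def quantize_symmetric_int4(weights, groupsize=128):
    """Quantize a list of floats to signed 4-bit codes, one scale per group."""
    if not weights:
        return [], []
    if groupsize == -1:
        groupsize = len(weights)  # assumption: one group covers everything
    codes, scales = [], []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        max_abs = max(abs(w) for w in group)
        # Symmetric: no zero point, so [-max_abs, max_abs] maps onto [-7, 7].
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Clamp to the signed 4-bit range after rounding.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, groupsize=128):
    """Recover approximate floats from codes and their per-group scales."""
    if not codes:
        return []
    if groupsize == -1:
        groupsize = len(codes)
    return [c * scales[i // groupsize] for i, c in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.7, 0.1, 0.0]
    q, s = quantize_symmetric_int4(w, groupsize=2)
    print(q, s)
    print(dequantize_int4(q, s, groupsize=2))
```

A smaller group size stores more scales but tracks local weight magnitudes more closely, while `-1` keeps a single scale at the cost of coarser resolution for small weights.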