fxmarty 2967b8168c fix post refactor		2024-07-16 15:16:27 +02:00
custom_kernels	…
exllama_kernels	…
exllamav2_kernels	…
marlin	Add support for FP8 on compute capability >=8.0, <8.9 (#2213)	2024-07-11 16:03:26 +02:00
tests	Move quantized weight handling out of the `Weights` class (#2194)	2024-07-09 20:04:03 +02:00
text_generation_server	fix post refactor	2024-07-16 15:16:27 +02:00
.gitignore	…
Makefile	fix: Remove bitsandbytes installation when running cpu-only install (#2216)	2024-07-15 15:34:20 +02:00
Makefile-awq	…
Makefile-eetq	…
Makefile-flash-att	Hotfixing `make install`. (#2008)	2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2	Hotfixing `make install`. (#2008)	2024-06-04 23:34:03 +02:00
Makefile-lorax-punica	Enable multiple LoRa adapters (#2010)	2024-06-25 14:46:27 -04:00
Makefile-selective-scan	…
Makefile-vllm	fix gptq tests, LLMM1 matrix bound	2024-06-24 18:49:45 +02:00
README.md	…
poetry.lock	Making `make install` work better by default. (#2004)	2024-06-04 19:38:46 +02:00
pyproject.toml	Making `make install` work better by default. (#2004)	2024-06-04 19:38:46 +02:00
requirements_cuda.txt	…
requirements_intel.txt	…
requirements_rocm.txt	…

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference.

Install

make install

Run

make run-dev
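
The two Makefile targets can also be chained from a script. The following is a minimal sketch, assuming both targets are invoked from the directory that contains the Makefile; the `run_steps` helper, the `STEPS` list, and the `dry_run` flag are hypothetical names introduced here for illustration and are not part of the repository.

```python
import subprocess

# The two targets named in this README, in the order they appear.
STEPS = [["make", "install"], ["make", "run-dev"]]


def run_steps(server_dir=".", dry_run=True):
    """Run each make target in order and return the commands as strings.

    With dry_run=True (the default) nothing is executed; the commands are
    only collected, which makes it easy to inspect what would run.
    """
    executed = []
    for cmd in STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a target fails, so
            # `make run-dev` is never started after a failed install.
            subprocess.run(cmd, cwd=server_dir, check=True)
    return executed


if __name__ == "__main__":
    # Print the planned commands without executing them.
    for line in run_steps(dry_run=True):
        print(line)
```

Calling `run_steps(dry_run=False)` would actually invoke `make`; stopping on the first failure is a design choice of this sketch, not documented behavior of the project.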