hf_text-generation-inference/server
fxmarty 2967b8168c fix post refactor 2024-07-16 15:16:27 +02:00
custom_kernels
exllama_kernels
exllamav2_kernels
marlin Add support for FP8 on compute capability >=8.0, <8.9 (#2213) 2024-07-11 16:03:26 +02:00
tests Move quantized weight handling out of the `Weights` class (#2194) 2024-07-09 20:04:03 +02:00
text_generation_server fix post refactor 2024-07-16 15:16:27 +02:00
.gitignore
Makefile fix: Remove bitsandbytes installation when running cpu-only install (#2216) 2024-07-15 15:34:20 +02:00
Makefile-awq
Makefile-eetq
Makefile-flash-att Hotfixing `make install`. (#2008) 2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2 Hotfixing `make install`. (#2008) 2024-06-04 23:34:03 +02:00
Makefile-lorax-punica Enable multiple LoRa adapters (#2010) 2024-06-25 14:46:27 -04:00
Makefile-selective-scan
Makefile-vllm fix gptq tests, LLMM1 matrix bound 2024-06-24 18:49:45 +02:00
README.md
poetry.lock Making `make install` work better by default. (#2004) 2024-06-04 19:38:46 +02:00
pyproject.toml Making `make install` work better by default. (#2004) 2024-06-04 19:38:46 +02:00
requirements_cuda.txt
requirements_intel.txt
requirements_rocm.txt

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference.

Install

make install

Run

make run-dev
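
The install and run steps above can also be driven from a script. The following is a minimal sketch, not part of the project: it assumes `make` is on the PATH and that the script is started from the repository root, so that `server` is the directory holding this Makefile. The helper names `make_command` and `run_target` are hypothetical.

```python
import subprocess
from typing import List


def make_command(target: str, directory: str = "server") -> List[str]:
    """Build the argument list for running a Makefile target in `directory`.

    `make -C DIR` changes into DIR before reading the Makefile there.
    """
    return ["make", "-C", directory, target]


def run_target(target: str, directory: str = "server") -> None:
    """Run a Makefile target; raises CalledProcessError if make fails."""
    subprocess.run(make_command(target, directory), check=True)
```

Calling `run_target("install")` and then `run_target("run-dev")` mirrors the two commands above. The second call is expected to block for as long as the development server keeps running.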