hf_text-generation-inference/server
Nicolas Patry b378fb4702
Fixing exl2 (by disabling cuda graphs)
2024-08-14 19:44:54 +02:00
custom_kernels
exllama_kernels
exllamav2_kernels
tests feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
text_generation_server Fixing import exl2 2024-08-12 12:23:46 +02:00
.gitignore
Makefile hotfix: update nccl 2024-07-23 23:31:28 +02:00
Makefile-awq
Makefile-eetq
Makefile-fbgemm chore: update to torch 2.4 (#2259) 2024-07-23 20:39:43 +00:00
Makefile-flash-att
Makefile-flash-att-v2
Makefile-lorax-punica
Makefile-selective-scan
Makefile-vllm
README.md
poetry.lock Fixing exl2 (by disabling cuda graphs) 2024-08-14 19:44:54 +02:00
pyproject.toml Fixing exl2 (by disabling cuda graphs) 2024-08-14 19:44:54 +02:00
requirements_cuda.txt Fixing exl2 (by disabling cuda graphs) 2024-08-14 19:44:54 +02:00
requirements_intel.txt Fixing exl2 (by disabling cuda graphs) 2024-08-14 19:44:54 +02:00
requirements_rocm.txt Fixing exl2 (by disabling cuda graphs) 2024-08-14 19:44:54 +02:00

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install

Run

make run-dev