650fea1834
Tested with ``` CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq ``` all with good and identical results on MI210. --------- Co-authored-by: Felix Marty <felix@hf.co> Co-authored-by: OlivierDehaene <olivier@huggingface.co> Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com> |
||
---|---|---|
.. | ||
custom_kernels | ||
exllama_kernels | ||
exllamav2_kernels | ||
tests | ||
text_generation_server | ||
.gitignore | ||
Makefile | ||
Makefile-awq | ||
Makefile-eetq | ||
Makefile-flash-att | ||
Makefile-flash-att-v2 | ||
Makefile-vllm | ||
README.md | ||
poetry.lock | ||
pyproject.toml | ||
requirements_common.txt | ||
requirements_cuda.txt | ||
requirements_rocm.txt |
README.md
Text Generation Inference Python gRPC Server
A Python gRPC server for Text Generation Inference
Install
make install
Run
make run-dev