hf_text-generation-inference/server
Daniël de Kok 74a8a820ad Use FP8 KV cache when specified by compressed-tensors
The compressed-tensors configuration can specify the configuration of
the KV cache as well. Use an FP8 KV cache when the configuration tells
us to do so (all other options and types are ignored for now).
2024-11-20 14:25:50 +00:00
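
A minimal sketch of the check this commit message describes, not the code from the commit itself. It assumes the compressed-tensors settings arrive as a plain dictionary whose `kv_cache_scheme` entry carries `type` and `num_bits` fields; the function name `kv_cache_dtype_from_config` and the `"fp8"` return value are hypothetical names chosen for illustration.

```python
from collections.abc import Mapping
from typing import Any, Optional


def kv_cache_dtype_from_config(
    quantization_config: Optional[Mapping[str, Any]],
) -> Optional[str]:
    """Return "fp8" if the config asks for an 8-bit float KV cache, else None.

    Assumption: the KV cache settings live under a `kv_cache_scheme` key
    with `type` and `num_bits` fields. Every other option or type is
    ignored, mirroring the behaviour described in the commit message.
    """
    if not quantization_config:
        return None

    scheme = quantization_config.get("kv_cache_scheme")
    if not isinstance(scheme, Mapping):
        # No KV cache section: keep the default cache dtype.
        return None

    if scheme.get("type") == "float" and scheme.get("num_bits") == 8:
        return "fp8"

    # Any other KV cache scheme is ignored for now.
    return None


# Hypothetical config fragment, for illustration only.
config = {
    "quant_method": "compressed-tensors",
    "kv_cache_scheme": {"type": "float", "num_bits": 8},
}
print(kv_cache_dtype_from_config(config))
```

Returning `None` for anything other than an 8-bit float scheme lets the caller fall back to its default cache type, which matches the "ignored for now" wording above.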
custom_kernels All integration tests back everywhere (too many failed CI). (#2428) 2024-08-16 21:19:46 +02:00
exllama_kernels Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
exllamav2_kernels Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
tests feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
text_generation_server Use FP8 KV cache when specified by compressed-tensors 2024-11-20 14:25:50 +00:00
.gitignore Impl simple mamba model (#1480) 2024-02-08 10:19:45 +01:00
Makefile Remove vLLM dependency for CUDA (#2751) 2024-11-17 17:34:50 +01:00
Makefile-awq chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
Makefile-eetq Upgrade EETQ (Fixes the cuda graphs). (#1729) 2024-04-12 08:15:28 +02:00
Makefile-exllamav2 Upgrading exl2. (#2415) 2024-08-14 11:58:08 +02:00
Makefile-flash-att Hotfixing `make install`. (#2008) 2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2 Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
Makefile-flashinfer Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) 2024-09-11 18:10:40 +02:00
Makefile-lorax-punica Enable multiple LoRa adapters (#2010) 2024-06-25 14:46:27 -04:00
Makefile-selective-scan chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
Makefile-vllm Remove vLLM dependency for CUDA (#2751) 2024-11-17 17:34:50 +01:00
README.md chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
poetry.lock Update to moe-kernels 0.7.0 (#2720) 2024-11-19 14:55:29 +01:00
pyproject.toml Update to moe-kernels 0.7.0 (#2720) 2024-11-19 14:55:29 +01:00
requirements_cuda.txt Upgrading our deps. (#2750) 2024-11-15 14:03:27 +01:00
requirements_intel.txt Upgrading our deps. (#2750) 2024-11-15 14:03:27 +01:00
requirements_rocm.txt Upgrading our deps. (#2750) 2024-11-15 14:03:27 +01:00

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference.

Install

make install

Run

make run-dev