Daniël de Kok 72ab60fdd5 Use FP8 KV cache when specified by compressed-tensors (#2761) The compressed-tensors configuration can specify the configuration of the KV cache as well. Use an FP8 KV cache when the configuration tells us to do so (all other options and types are ignored for now).		2024-11-26 08:27:41 +01:00
custom_kernels	All integration tests back everywhere (too many failed CI). (#2428)	2024-08-16 21:19:46 +02:00
exllama_kernels	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
exllamav2_kernels	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
tests	feat: prefill chunking (#2600)	2024-10-16 12:49:33 +02:00
text_generation_server	Use FP8 KV cache when specified by compressed-tensors (#2761)	2024-11-26 08:27:41 +01:00
.gitignore	Impl simple mamba model (#1480)	2024-02-08 10:19:45 +01:00
Makefile	Remove vLLM dependency for CUDA (#2751)	2024-11-17 17:34:50 +01:00
Makefile-awq	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-eetq	Upgrade EETQ (Fixes the cuda graphs). (#1729)	2024-04-12 08:15:28 +02:00
Makefile-exllamav2	Upgrading exl2. (#2415)	2024-08-14 11:58:08 +02:00
Makefile-flash-att	Hotfixing `make install`. (#2008)	2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
Makefile-flashinfer	Prefix test - Different kind of load test to trigger prefix test bugs. (#2490)	2024-09-11 18:10:40 +02:00
Makefile-lorax-punica	Enable multiple LoRa adapters (#2010)	2024-06-25 14:46:27 -04:00
Makefile-selective-scan	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-vllm	Remove vLLM dependency for CUDA (#2751)	2024-11-17 17:34:50 +01:00
README.md	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
poetry.lock	chore: Update to marlin-kernels 0.3.6 (#2771)	2024-11-22 14:44:47 +00:00
pyproject.toml	chore: Update to marlin-kernels 0.3.6 (#2771)	2024-11-22 14:44:47 +00:00
requirements_cuda.txt	Upgrading our deps. (#2750)	2024-11-15 14:03:27 +01:00
requirements_intel.txt	Upgrading our deps. (#2750)	2024-11-15 14:03:27 +01:00
requirements_rocm.txt	Upgrading our deps. (#2750)	2024-11-15 14:03:27 +01:00

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install

Run

make run-dev