hf_text-generation-inference

Author	SHA1	Message	Date
Wang, Yi	b6bb1d5160	Cpu dockerimage (#2367) add intel-cpu docker image Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-08-12 14:10:30 +02:00
Daniël de Kok	22fb1be588	Fix cache block size for flash decoding (#2351) * Fix cache block size for flash decoding This seems to have been accidentally dropped during the TRT-LLM PR rebase. * Also run CI on changes to `backends`	2024-08-01 15:38:57 +02:00
Daniël de Kok	67ef0649cf	GPTQ CI improvements (#2151) * Add more representative Llama GPTQ test The Llama GPTQ test is updated to use a model with the commonly-used quantizer config format and activation sorting. The old test is kept around (but renamed) since it tests the format produced by `text-generation-server quantize`. * Add support for manually triggering a release build	2024-07-05 14:12:16 +02:00
Nicolas Patry	480d3b3304	New runner. Manual squash. (#2110) * New runner. Manual squash. * Network host. * Put back trufflehog with proper extension. * No network host ? * Moving buildx install after tailscale ? * 1.79	2024-06-24 18:08:34 +02:00