hf_text-generation-inference/integration-tests/models/__snapshots__
Daniël de Kok e903770897
Support different image sizes in prefill in VLMs (#2065)
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
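
A minimal sketch of the failure mode and the fix, using the generic Hugging Face image-processor API rather than the actual TGI server code; the checkpoint name and image sizes are illustrative, and the batched-padding behavior assumes a recent `transformers` release. Processing each image separately can yield tensors with different shapes, so concatenation fails; handing the whole list to the processor in one call lets it pad or resize everything to one compatible shape.

```python
# Sketch only, not the TGI implementation: contrast per-image preprocessing
# (which can produce incompatible tensor shapes) with batched preprocessing.
import torch
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf"  # illustrative VLM checkpoint
)

# Two images with different sizes/aspect ratios in the same batch.
images = [Image.new("RGB", (500, 500)), Image.new("RGB", (1600, 900))]

# Old approach: process each image separately, then concatenate. The
# per-image tensors can come out with different shapes (e.g. a different
# number of patches per image), so torch.cat raises.
try:
    parts = [processor(img, return_tensors="pt")["pixel_values"] for img in images]
    batched = torch.cat(parts, dim=0)
except RuntimeError as err:
    print(f"separate preprocessing failed: {err}")

# Fixed approach: preprocess all images in the batch together, so the image
# processor can pad/resize every tensor to one compatible shape.
batched = processor(images, return_tensors="pt")["pixel_values"]
print(batched.shape)  # one batched tensor with uniform dimensions
```
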
Name | Last commit | Last updated
test_bloom_560m | feat(server): support vectorized warpers in flash causal lm (#317) | 2023-05-26 12:30:27 +02:00
test_bloom_560m_sharded | feat(integration-tests): improve comparison and health checks (#336) | 2023-05-16 20:22:11 +02:00
test_chat_llama | Fix seeded output. (#1949) | 2024-05-24 15:36:13 +02:00
test_completion_prompts | v2.0.1 | 2024-04-18 17:20:36 +02:00
test_flash_awq | Add AWQ quantization inference support (#1019) (#1054) | 2023-09-25 15:31:27 +02:00
test_flash_awq_sharded | Add AWQ quantization inference support (#1019) (#1054) | 2023-09-25 15:31:27 +02:00
test_flash_falcon | feat(server): add retry on download (#384) | 2023-05-31 10:57:53 +02:00
test_flash_gemma | feat: add support for Gemma (#1583) | 2024-02-21 14:15:22 +01:00
test_flash_gemma_gptq | Fix GPTQ for models which do not have float16 at the default dtype (simpler) (#1953) | 2024-05-27 14:41:28 +02:00
test_flash_gpt2 | Add GPT-2 with flash attention (#1889) | 2024-05-15 13:31:22 +02:00
test_flash_grammar_llama | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00
test_flash_llama | Remove the stripping of the prefix space (and any other mangling that tokenizers might do). (#1065) | 2023-09-27 12:13:45 +02:00
test_flash_llama_exl2 | Add support for exl2 quantization | 2024-05-30 11:28:05 +02:00
test_flash_llama_gptq | ROCm AWQ support (#1514) | 2024-02-09 10:45:16 +01:00
test_flash_llama_gptq_marlin | Add support for GPTQ Marlin (#2052) | 2024-06-14 09:45:42 +02:00
test_flash_llama_marlin | Add support for Marlin-quantized models | 2024-06-06 13:16:52 +02:00
test_flash_medusa | Speculative (#1308) | 2023-12-11 12:46:30 +01:00
test_flash_mistral | feat: add mistral model (#1071) | 2023-09-28 09:55:47 +02:00
test_flash_neox | fix(server): fix init for flash causal lm (#352) | 2023-05-22 15:05:32 +02:00
test_flash_neox_sharded | fix(server): fix init for flash causal lm (#352) | 2023-05-22 15:05:32 +02:00
test_flash_pali_gemma | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_flash_phi | feat: adds phi model (#1442) | 2024-01-25 15:37:53 +01:00
test_flash_qwen2 | feat: Qwen2 (#1608) | 2024-02-28 15:50:31 +01:00
test_flash_santacoder | feat(integration-tests): improve comparison and health checks (#336) | 2023-05-16 20:22:11 +02:00
test_flash_starcoder | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_flash_starcoder2 | feat: starcoder2 (#1605) | 2024-02-28 12:07:08 +01:00
test_flash_starcoder_gptq | ROCm AWQ support (#1514) | 2024-02-09 10:45:16 +01:00
test_grammar_llama | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00
test_grammar_response_format_llama | Support chat response format (#2046) | 2024-06-11 10:44:56 -04:00
test_idefics | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_idefics2 | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_llava_next | Idefics2. (#1756) | 2024-04-23 23:04:44 +02:00
test_mamba | Improving mamba runtime by using updates (#1552) | 2024-02-14 09:54:10 +01:00
test_mpt | feat(server): Add Non flash MPT. (#514) | 2023-07-03 13:01:46 +02:00
test_mt0_base | Adding Llava-Next (Llava 1.6) with full support. (#1709) | 2024-04-09 21:32:00 +02:00
test_neox | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_neox_sharded | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_t5_sharded | feat(server): support fp16 for t5 (#360) | 2023-05-23 18:16:48 +02:00
test_tools_llama | v2.0.1 | 2024-04-18 17:20:36 +02:00