hf_text-generation-inference/integration-tests/models/__snapshots__
Daniël de Kok e903770897
Support different image sizes in prefill in VLMs (#2065)
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
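
A minimal sketch of the failure mode and the fix, using the generic Hugging Face image-processor API rather than the actual TGI server code; the checkpoint name and image sizes are illustrative, and the batched-padding behavior assumes a recent `transformers` release. Processing each image separately can yield tensors with different shapes, so concatenation fails; handing the whole list to the processor in one call lets it pad or resize everything to one compatible shape.

```python
# Sketch only, not the TGI implementation: contrast per-image preprocessing
# (which can produce incompatible tensor shapes) with batched preprocessing.
import torch
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf"  # illustrative VLM checkpoint
)

# Two images with different sizes/aspect ratios in the same batch.
images = [Image.new("RGB", (500, 500)), Image.new("RGB", (1600, 900))]

# Old approach: process each image separately, then concatenate. The
# per-image tensors can come out with different shapes (e.g. a different
# number of patches per image), so torch.cat raises.
try:
    parts = [processor(img, return_tensors="pt")["pixel_values"] for img in images]
    batched = torch.cat(parts, dim=0)
except RuntimeError as err:
    print(f"separate preprocessing failed: {err}")

# Fixed approach: preprocess all images in the batch together, so the image
# processor can pad/resize every tensor to one compatible shape.
batched = processor(images, return_tensors="pt")["pixel_values"]
print(batched.shape)  # one batched tensor with uniform dimensions
```
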
Name | Last commit | Last updated
test_bloom_560m | feat(server): support vectorized warpers in flash causal lm (#317) | 2023-05-26 12:30:27 +02:00
test_bloom_560m_sharded | feat(integration-tests): improve comparison and health checks (#336) | 2023-05-16 20:22:11 +02:00
test_chat_llama | Fix seeded output. (#1949) | 2024-05-24 15:36:13 +02:00
test_completion_prompts | v2.0.1 | 2024-04-18 17:20:36 +02:00
test_flash_awq | Add AWQ quantization inference support (#1019) (#1054) | 2023-09-25 15:31:27 +02:00
test_flash_awq_sharded | Add AWQ quantization inference support (#1019) (#1054) | 2023-09-25 15:31:27 +02:00
test_flash_falcon | feat(server): add retry on download (#384) | 2023-05-31 10:57:53 +02:00
test_flash_gemma | feat: add support for Gemma (#1583) | 2024-02-21 14:15:22 +01:00
test_flash_gemma_gptq | Fix GPTQ for models which do not have float16 at the default dtype (simpler) (#1953) | 2024-05-27 14:41:28 +02:00
test_flash_gpt2 | Add GPT-2 with flash attention (#1889) | 2024-05-15 13:31:22 +02:00
test_flash_grammar_llama | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00
test_flash_llama | Remove the stripping of the prefix space (and any other mangling that tokenizers might do). (#1065) | 2023-09-27 12:13:45 +02:00
test_flash_llama_exl2 | Add support for exl2 quantization | 2024-05-30 11:28:05 +02:00
test_flash_llama_gptq | ROCm AWQ support (#1514) | 2024-02-09 10:45:16 +01:00
test_flash_llama_gptq_marlin | Add support for GPTQ Marlin (#2052) | 2024-06-14 09:45:42 +02:00
test_flash_llama_marlin | Add support for Marlin-quantized models | 2024-06-06 13:16:52 +02:00
test_flash_medusa | Speculative (#1308) | 2023-12-11 12:46:30 +01:00
test_flash_mistral | feat: add mistral model (#1071) | 2023-09-28 09:55:47 +02:00
test_flash_neox | fix(server): fix init for flash causal lm (#352) | 2023-05-22 15:05:32 +02:00
test_flash_neox_sharded | fix(server): fix init for flash causal lm (#352) | 2023-05-22 15:05:32 +02:00
test_flash_pali_gemma | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_flash_phi | feat: adds phi model (#1442) | 2024-01-25 15:37:53 +01:00
test_flash_qwen2 | feat: Qwen2 (#1608) | 2024-02-28 15:50:31 +01:00
test_flash_santacoder | feat(integration-tests): improve comparison and health checks (#336) | 2023-05-16 20:22:11 +02:00
test_flash_starcoder | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_flash_starcoder2 | feat: starcoder2 (#1605) | 2024-02-28 12:07:08 +01:00
test_flash_starcoder_gptq | ROCm AWQ support (#1514) | 2024-02-09 10:45:16 +01:00
test_grammar_llama | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00
test_grammar_response_format_llama | Support chat response format (#2046) | 2024-06-11 10:44:56 -04:00
test_idefics | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_idefics2 | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00
test_llava_next | Idefics2. (#1756) | 2024-04-23 23:04:44 +02:00
test_mamba | Improving mamba runtime by using updates (#1552) | 2024-02-14 09:54:10 +01:00
test_mpt | feat(server): Add Non flash MPT. (#514) | 2023-07-03 13:01:46 +02:00
test_mt0_base | Adding Llava-Next (Llava 1.6) with full support. (#1709) | 2024-04-09 21:32:00 +02:00
test_neox | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_neox_sharded | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00
test_t5_sharded | feat(server): support fp16 for t5 (#360) | 2023-05-23 18:16:48 +02:00
test_tools_llama | v2.0.1 | 2024-04-18 17:20:36 +02:00