hf_text-generation-inference/integration-tests/models
Daniël de Kok · e903770897
Support different image sizes in prefill in VLMs (#2065)
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated, which can fail when the images have different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
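
As a rough illustration of the idea behind the fix (not the actual TGI server code), the sketch below contrasts the old per-image preprocessing with whole-batch preprocessing through the Hugging Face AutoImageProcessor API; the Idefics2 checkpoint is only an example, chosen because its tests are touched by this commit.

```python
# Minimal sketch of the idea behind the fix, assuming the transformers
# image-processor API; this is not the actual TGI change.
import torch
from PIL import Image
from transformers import AutoImageProcessor

# Illustrative checkpoint: Idefics2's processor pads images to a common size.
processor = AutoImageProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

small = Image.new("RGB", (320, 240))
large = Image.new("RGB", (1024, 768))

# Old behaviour: each image is processed on its own and the results are
# concatenated. The two tensors can come out with different spatial sizes,
# in which case torch.cat raises a RuntimeError.
t_small = processor(images=small, return_tensors="pt").pixel_values
t_large = processor(images=large, return_tensors="pt").pixel_values
try:
    torch.cat([t_small, t_large])
except RuntimeError as err:
    print(f"concatenation failed: {err}")

# Fixed behaviour: the whole batch goes through the processor in one call,
# so it can resize/pad all images to compatible shapes and return a single
# batched tensor.
batch = processor(images=[small, large], return_tensors="pt").pixel_values
print(batch.shape)  # one tensor with a single, consistent height and width
```

For processors like Idefics2's, the batched call also returns a pixel_attention_mask marking which regions of each padded image are real pixels.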
Name                                    Last commit message                                                    Last commit date
__snapshots__                           Support different image sizes in prefill in VLMs (#2065)              2024-06-17 10:49:41 +02:00
test_bloom_560m.py                      feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_bloom_560m_sharded.py              feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_chat_llama.py                      Fix seeded output. (#1949)                                             2024-05-24 15:36:13 +02:00
test_completion_prompts.py              feat: accept list as prompt and use first string (#1702)              2024-04-17 10:41:12 +02:00
test_flash_awq.py                       fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_flash_awq_sharded.py               fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_flash_falcon.py                    feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_flash_gemma.py                     Fix (flash) Gemma prefix and enable tests                             2024-05-27 09:58:06 +02:00
test_flash_gemma_gptq.py                Gemma GPTQ checks: skip logprob checks                                 2024-05-30 11:28:05 +02:00
test_flash_gpt2.py                      Add GPT-2 with flash attention (#1889)                                 2024-05-15 13:31:22 +02:00
test_flash_grammar_llama.py             fix: correctly index into mask when applying grammar (#1618)          2024-03-01 18:22:01 +01:00
test_flash_llama.py                     feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_flash_llama_exl2.py                Add support for exl2 quantization                                      2024-05-30 11:28:05 +02:00
test_flash_llama_gptq.py                feat: add cuda memory fraction (#659)                                  2023-07-24 11:43:58 +02:00
test_flash_llama_gptq_marlin.py         Add support for GPTQ Marlin (#2052)                                    2024-06-14 09:45:42 +02:00
test_flash_llama_marlin.py              Add support for Marlin-quantized models                                2024-06-06 13:16:52 +02:00
test_flash_medusa.py                    Revamp medusa implementation so that every model can benefit. (#1588)  2024-02-26 19:49:28 +01:00
test_flash_mistral.py                   fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_flash_neox.py                      feat(server): add paged attention to flash models (#516)              2023-06-30 19:09:59 +02:00
test_flash_neox_sharded.py              feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_flash_pali_gemma.py                Support different image sizes in prefill in VLMs (#2065)              2024-06-17 10:49:41 +02:00
test_flash_phi.py                       fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_flash_qwen2.py                     feat: Qwen2 (#1608)                                                    2024-02-28 15:50:31 +01:00
test_flash_santacoder.py                feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_flash_starcoder.py                 feat(server): only compute prefill logprobs when asked (#406)         2023-06-02 17:12:30 +02:00
test_flash_starcoder2.py                feat: starcoder2 (#1605)                                               2024-02-28 12:07:08 +01:00
test_flash_starcoder_gptq.py            fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_grammar_llama.py                   fix: correctly index into mask when applying grammar (#1618)          2024-03-01 18:22:01 +01:00
test_grammar_response_format_llama.py   Support chat response format (#2046)                                   2024-06-11 10:44:56 -04:00
test_idefics.py                         Support different image sizes in prefill in VLMs (#2065)              2024-06-17 10:49:41 +02:00
test_idefics2.py                        Support different image sizes in prefill in VLMs (#2065)              2024-06-17 10:49:41 +02:00
test_llava_next.py                      Adding Llava-Next (Llava 1.6) with full support. (#1709)              2024-04-09 21:32:00 +02:00
test_mamba.py                           fix(router): fix openapi and add jsonschema validation (#1578)        2024-02-21 11:05:32 +01:00
test_mpt.py                             feat(server): Add Non flash MPT. (#514)                                2023-07-03 13:01:46 +02:00
test_mt0_base.py                        Adding Llava-Next (Llava 1.6) with full support. (#1709)              2024-04-09 21:32:00 +02:00
test_neox.py                            feat(server): Rework model loading (#344)                              2023-06-08 14:51:52 +02:00
test_neox_sharded.py                    feat(server): Rework model loading (#344)                              2023-06-08 14:51:52 +02:00
test_t5_sharded.py                      Improve the defaults for the launcher (#1727)                          2024-04-12 14:20:31 +02:00
test_tools_llama.py                     feat: improve tools to include name and add tests (#1693)             2024-04-16 09:02:46 -04:00