| Name | Last commit | Last commit date |
| ---- | ----------- | ---------------- |
| `__snapshots__/` | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| `test_bloom_560m.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_bloom_560m_sharded.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_chat_llama.py` | Fix seeded output. (#1949) | 2024-05-24 15:36:13 +02:00 |
| `test_completion_prompts.py` | feat: accept list as prompt and use first string (#1702) | 2024-04-17 10:41:12 +02:00 |
| `test_flash_awq.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_flash_awq_sharded.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_flash_falcon.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_flash_gemma.py` | Fix (flash) Gemma prefix and enable tests | 2024-05-27 09:58:06 +02:00 |
| `test_flash_gemma_gptq.py` | Gemma GPTQ checks: skip logprob checks | 2024-05-30 11:28:05 +02:00 |
| `test_flash_gpt2.py` | Add GPT-2 with flash attention (#1889) | 2024-05-15 13:31:22 +02:00 |
| `test_flash_grammar_llama.py` | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00 |
| `test_flash_llama.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_flash_llama_exl2.py` | Add support for exl2 quantization | 2024-05-30 11:28:05 +02:00 |
| `test_flash_llama_gptq.py` | feat: add cuda memory fraction (#659) | 2023-07-24 11:43:58 +02:00 |
| `test_flash_llama_gptq_marlin.py` | Add support for GPTQ Marlin (#2052) | 2024-06-14 09:45:42 +02:00 |
| `test_flash_llama_marlin.py` | Add support for Marlin-quantized models | 2024-06-06 13:16:52 +02:00 |
| `test_flash_medusa.py` | Revamp medusa implementation so that every model can benefit. (#1588) | 2024-02-26 19:49:28 +01:00 |
| `test_flash_mistral.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_flash_neox.py` | feat(server): add paged attention to flash models (#516) | 2023-06-30 19:09:59 +02:00 |
| `test_flash_neox_sharded.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_flash_pali_gemma.py` | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| `test_flash_phi.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_flash_qwen2.py` | feat: Qwen2 (#1608) | 2024-02-28 15:50:31 +01:00 |
| `test_flash_santacoder.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_flash_starcoder.py` | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| `test_flash_starcoder2.py` | feat: starcoder2 (#1605) | 2024-02-28 12:07:08 +01:00 |
| `test_flash_starcoder_gptq.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_grammar_llama.py` | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00 |
| `test_grammar_response_format_llama.py` | Support chat response format (#2046) | 2024-06-11 10:44:56 -04:00 |
| `test_idefics.py` | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| `test_idefics2.py` | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| `test_llava_next.py` | Adding Llava-Next (Llava 1.6) with full support. (#1709) | 2024-04-09 21:32:00 +02:00 |
| `test_mamba.py` | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| `test_mpt.py` | feat(server): Add Non flash MPT. (#514) | 2023-07-03 13:01:46 +02:00 |
| `test_mt0_base.py` | Adding Llava-Next (Llava 1.6) with full support. (#1709) | 2024-04-09 21:32:00 +02:00 |
| `test_neox.py` | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00 |
| `test_neox_sharded.py` | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00 |
| `test_t5_sharded.py` | Improve the defaults for the launcher (#1727) | 2024-04-12 14:20:31 +02:00 |
| `test_tools_llama.py` | feat: improve tools to include name and add tests (#1693) | 2024-04-16 09:02:46 -04:00 |