| File | Last commit | Date |
| --- | --- | --- |
| __snapshots__ | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| test_bloom_560m.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_bloom_560m_sharded.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_chat_llama.py | Fix seeded output. (#1949) | 2024-05-24 15:36:13 +02:00 |
| test_completion_prompts.py | feat: accept list as prompt and use first string (#1702) | 2024-04-17 10:41:12 +02:00 |
| test_flash_awq.py | skip exl2 tests on rocm | 2024-06-24 18:49:45 +02:00 |
| test_flash_awq_sharded.py | update | 2024-06-24 18:50:17 +02:00 |
| test_flash_falcon.py | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| test_flash_gemma.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_flash_gemma_gptq.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_flash_gpt2.py | Add GPT-2 with flash attention (#1889) | 2024-05-15 13:31:22 +02:00 |
| test_flash_grammar_llama.py | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00 |
| test_flash_llama.py | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| test_flash_llama_exl2.py | skip exl2 tests on rocm | 2024-06-24 18:49:45 +02:00 |
| test_flash_llama_gptq.py | update | 2024-06-24 18:50:17 +02:00 |
| test_flash_llama_gptq_marlin.py | Add support for GPTQ Marlin (#2052) | 2024-06-14 09:45:42 +02:00 |
| test_flash_llama_marlin.py | fix gptq tests, LLMM1 matrix bound | 2024-06-24 18:49:45 +02:00 |
| test_flash_medusa.py | Revamp medusa implementation so that every model can benefit. (#1588) | 2024-02-26 19:49:28 +01:00 |
| test_flash_mistral.py | fix(router): fix openapi and add jsonschema validation (#1578) | 2024-02-21 11:05:32 +01:00 |
| test_flash_neox.py | feat(server): add paged attention to flash models (#516) | 2023-06-30 19:09:59 +02:00 |
| test_flash_neox_sharded.py | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| test_flash_pali_gemma.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_flash_phi.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_flash_qwen2.py | feat: Qwen2 (#1608) | 2024-02-28 15:50:31 +01:00 |
| test_flash_santacoder.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_flash_starcoder.py | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00 |
| test_flash_starcoder2.py | feat: starcoder2 (#1605) | 2024-02-28 12:07:08 +01:00 |
| test_flash_starcoder_gptq.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_grammar_llama.py | fix: correctly index into mask when applying grammar (#1618) | 2024-03-01 18:22:01 +01:00 |
| test_grammar_response_format_llama.py | Support chat response format (#2046) | 2024-06-11 10:44:56 -04:00 |
| test_idefics.py | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| test_idefics2.py | Support different image sizes in prefill in VLMs (#2065) | 2024-06-17 10:49:41 +02:00 |
| test_llava_next.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_mamba.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_mpt.py | feat(server): Add Non flash MPT. (#514) | 2023-07-03 13:01:46 +02:00 |
| test_mt0_base.py | fix tests | 2024-06-24 18:50:18 +02:00 |
| test_neox.py | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00 |
| test_neox_sharded.py | feat(server): Rework model loading (#344) | 2023-06-08 14:51:52 +02:00 |
| test_t5_sharded.py | Improve the defaults for the launcher (#1727) | 2024-04-12 14:20:31 +02:00 |
| test_tools_llama.py | feat: improve tools to include name and add tests (#1693) | 2024-04-16 09:02:46 -04:00 |
| testing_utils.py | update | 2024-06-24 18:50:17 +02:00 |