hf_text-generation-inference/server/text_generation_server/models
Daniël de Kok e903770897
Support different image sizes in prefill in VLMs (#2065)
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated, which can fail when the images have different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
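
A minimal sketch (not the actual TGI code in vlm_causal_lm.py) of why batching the preprocessing matters, assuming a recent transformers release; the llava-hf/llava-v1.6-mistral-7b-hf checkpoint is used purely as an example:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")

# Two images with different sizes/aspect ratios in the same batch.
images = [
    Image.new("RGB", (500, 500)),
    Image.new("RGB", (1500, 500)),
]

# Old approach: process each image separately, then concatenate.
# For LLaVA-NeXT the per-image tensors can contain a different number
# of patches, so torch.cat can raise a size-mismatch RuntimeError.
per_image = [processor(img, return_tensors="pt")["pixel_values"] for img in images]
try:
    torch.cat(per_image, dim=0)
except RuntimeError as err:
    print("concatenating per-image tensors failed:", err)

# New approach: hand the processor the whole batch, so it can pad all
# images to a compatible shape and return a single well-formed tensor.
batch = processor(images, return_tensors="pt")
print(batch["pixel_values"].shape)  # (2, max_num_patches, 3, H, W)
```

Processing the batch in one call also keeps any per-image metadata the processor emits (e.g. image_sizes for LLaVA-NeXT) aligned with the padded tensor.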
custom_modeling Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
__init__.py Update the link for qwen2 (#2068) 2024-06-14 11:59:33 +02:00
bloom.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
causal_lm.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
flash_causal_lm.py ROCm and sliding windows fixes (#2033) 2024-06-10 15:09:50 +08:00
flash_cohere.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_dbrx.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_gemma.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_gpt2.py Add support for Marlin-quantized models 2024-06-06 13:16:52 +02:00
flash_llama.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_mistral.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_mixtral.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
flash_neox.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_phi.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_qwen2.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_rw.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_santacoder.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
flash_starcoder2.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
galactica.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
globals.py Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986) 2024-05-31 17:57:01 +02:00
gpt_neox.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
idefics.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
idefics2.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
idefics_causal_lm.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
llava_next.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
mamba.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
model.py Use the generation config. (#1808) 2024-04-25 19:41:50 +02:00
mpt.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
opt.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
pali_gemma.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
phi.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
rw.py fix(server): fix OPT implementation (#2061) 2024-06-12 18:22:20 +02:00
santacoder.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
seq2seq_lm.py server: use chunked inputs 2024-06-07 08:09:04 +02:00
t5.py MLPSpeculator. (#1865) 2024-05-14 12:33:18 +02:00
types.py chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
vlm_causal_lm.py Support different image sizes in prefill in VLMs (#2065) 2024-06-17 10:49:41 +02:00