e903770897
When a batch contained images of different sizes during prefill, the server would fail (see e.g. #2056). Images were processed separately and then concatenated, which fails when the images differ in size. Fix this by preprocessing all images in the batch together, so that the image processor can ensure that all image tensors have compatible sizes.
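The failure mode and the fix can be illustrated with a minimal, dependency-free sketch. It is not the server's actual code: `stack` and `preprocess_batch` are hypothetical helpers, images are plain nested lists rather than tensors, and padding to the largest height and width is only an assumed stand-in for whatever the real image processor does to make sizes compatible.

```python
from typing import List, Tuple

Image = List[List[float]]  # a single-channel image as rows of pixel values


def shape(image: Image) -> Tuple[int, int]:
    """Return (height, width) of an image."""
    return (len(image), len(image[0]) if image else 0)


def stack(images: List[Image]) -> List[Image]:
    """Concatenate images into one batch; like a tensor stack, shapes must match."""
    shapes = {shape(img) for img in images}
    if len(shapes) > 1:
        raise ValueError(f"cannot concatenate images with shapes {sorted(shapes)}")
    return list(images)


def preprocess_batch(images: List[Image], pad_value: float = 0.0) -> List[Image]:
    """Hypothetical batch preprocessing: pad every image to the largest height and width."""
    max_h = max(len(img) for img in images)
    max_w = max(len(row) for img in images for row in img)
    padded = []
    for img in images:
        # Pad each row on the right, then add full rows at the bottom.
        rows = [row + [pad_value] * (max_w - len(row)) for row in img]
        rows += [[pad_value] * max_w for _ in range(max_h - len(rows))]
        padded.append(rows)
    return padded


small = [[1.0, 2.0]]                         # 1 x 2
large = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # 2 x 3

# Old behaviour: each image handled on its own, then concatenated.
try:
    stack([small, large])
    separate_ok = True
except ValueError:
    separate_ok = False

# New behaviour: the whole batch is preprocessed together before concatenation.
batch = stack(preprocess_batch([small, large]))
```

Because the batch-level step sees every image at once, it can pick a common target size, which a per-image step cannot do.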
custom_kernels
exllama_kernels
exllamav2_kernels
marlin
tests
text_generation_server
.gitignore
Makefile
Makefile-awq
Makefile-eetq
Makefile-flash-att
Makefile-flash-att-v2
Makefile-selective-scan
Makefile-vllm
README.md
poetry.lock
pyproject.toml
requirements_cuda.txt
requirements_intel.txt
requirements_rocm.txt
README.md
Text Generation Inference Python gRPC Server
A Python gRPC server for Text Generation Inference
Install
make install
Run
make run-dev