preemo_text-generation-infe.../server
Latest commit: 31d76e238d by Nick Hill, 2022-12-05 10:10:59 +01:00
fix(batching): Avoid theoretical hang in batcher loop (#5)

- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
Name            | Last commit                                                            | Date
text_generation | fix(batching): Avoid theoretical hang in batcher loop (#5)             | 2022-12-05 10:10:59 +01:00
.gitignore      | feat(server): Support all AutoModelForCausalLM on a best effort basis  | 2022-10-28 19:24:00 +02:00
Makefile        | feat(server): Support Galactica (#4)                                   | 2022-12-01 19:31:54 +01:00
README.md       | feat(server): Use safetensors                                          | 2022-10-22 20:00:15 +02:00
poetry.lock     | feat(server): Improved doc                                             | 2022-11-07 12:53:56 +01:00
pyproject.toml  | feat(server): Improved doc                                             | 2022-11-07 12:53:56 +01:00

README.md

BLOOM Inference Python gRPC Server

A Python gRPC server for BLOOM Inference

Install

make install
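Installing a Python gRPC server typically also involves generating message and stub modules from the service's .proto definition with grpcio-tools. The sketch below is an illustration of that step only; the proto path ../proto/generate.proto and the output package text_generation/pb are assumptions, not taken from this repository's Makefile.

# Generic sketch: compile a .proto into Python messages and gRPC stubs.
# Paths are hypothetical; the output directory must already exist.
from grpc_tools import protoc  # provided by the grpcio-tools package

ret = protoc.main([
    "grpc_tools.protoc",
    "-I../proto",                            # assumed location of the .proto files
    "--python_out=text_generation/pb",       # generated message classes
    "--grpc_python_out=text_generation/pb",  # generated client/server stubs
    "../proto/generate.proto",
])
if ret != 0:
    raise RuntimeError("protoc failed with exit code %d" % ret)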

Run

make run-dev
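For orientation, the sketch below shows the general shape of the serving loop that a Python gRPC inference server starts. It is a generic illustration, not this repository's actual service: the servicer class, the Generate method name, and the generate_pb2_grpc module are placeholders.

# Minimal sketch of a Python gRPC serving loop (illustration only).
from concurrent import futures
import grpc

# Placeholder for stubs generated from a hypothetical generate.proto;
# not this repository's actual module layout.
# from text_generation.pb import generate_pb2, generate_pb2_grpc

class TextGenerationServicer:
    # Would subclass the generated *Servicer base class and implement the
    # RPCs defined in the .proto; the method name here is a placeholder.
    def Generate(self, request, context):
        raise NotImplementedError

def serve(address: str = "[::]:50051") -> None:
    # Thread-pool-backed gRPC server; model inference runs inside the RPC handlers.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    # generate_pb2_grpc.add_TextGenerationServiceServicer_to_server(
    #     TextGenerationServicer(), server)
    server.add_insecure_port(address)
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

The make run-dev target presumably wraps an entry point that builds a server of this shape around the loaded BLOOM model; see the Makefile in this directory for what it actually runs.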