OlivierDehaene c6e8b9442b fix(server): fix quantization for sharded models (#45)		2023-01-31 17:40:38 +01:00
tests	feat: Add token streaming using ServerSideEvents support (#41)	2023-01-31 17:04:00 +01:00
text_generation	fix(server): fix quantization for sharded models (#45)	2023-01-31 17:40:38 +01:00
.gitignore	feat(server): Support all AutoModelForCausalLM on a best effort basis	2022-10-28 19:24:00 +02:00
Makefile	fix(dockerfile): fix docker build (#32)	2023-01-24 19:52:39 +01:00
README.md	feat(server): Use safetensors	2022-10-22 20:00:15 +02:00
poetry.lock	fix(server): fix seeding with multiple shards (#44)	2023-01-31 16:01:15 +01:00
pyproject.toml	fix(server): fix seeding with multiple shards (#44)	2023-01-31 16:01:15 +01:00

BLOOM Inference Python gRPC Server

A Python gRPC server for BLOOM Inference

Install

make install

Run

make run-dev
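
Before running either target, it can help to preview what it would execute. The following is a minimal sketch, assuming GNU make and a POSIX shell; `preview_and_run` is a hypothetical helper and is not part of this repository. Only the target names `install` and `run-dev` come from this README.

```shell
# Hypothetical helper: print a make target's recipe, then run it.
preview_and_run() {
    target="$1"
    # -n (--just-print) shows the commands without executing them.
    make -n "$target" || return 1
    # Run the target for real once the preview succeeded.
    make "$target"
}
```

From the directory containing the Makefile, `preview_and_run install` followed by `preview_and_run run-dev` would mirror the two steps above. Note that make still executes recipe lines that invoke `$(MAKE)` or start with `+` even under `-n`.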