# BLOOM Inference

A Rust and gRPC server for BLOOM Inference.

## Install

```shell
cd server
pip install .
```

```shell
cd router
cargo build --release
```

## Run

```shell
python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models
```
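
If you don't have 8 GPUs available, a smaller BLOOM checkpoint is easier to try first. A minimal sketch, assuming `main.py` accepts any BLOOM-family model id and GPU count (both are assumptions, not confirmed by this README):

```shell
python server/bloom_inference/main.py bigscience/bloom-560m --num-gpus 1 --shard-directory /dev/shm/models
```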

```shell
./router/target/release/router
```
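
Once both processes are running, you can send generation requests to the router. A hedged example, assuming the router listens on `localhost:3000` and exposes a `POST /generate` endpoint taking `inputs` and `parameters` JSON fields (check the router source for the actual port, route, and schema):

```shell
curl -X POST http://localhost:3000/generate \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Hello, I am", "parameters": {"max_new_tokens": 20}}'
```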

## TODO:

- [ ] Improve model download
    - Store "shardable" layers separately and layer by layer
- [ ] Add batching args to router CLI
- [ ] Add docstrings + comments everywhere as the codebase is fairly complicated
- [ ] Add tests
- [ ] Add shutdown logic in router and server
- [ ] Improve multi-processing logic in server
- [ ] Improve error handling everywhere
- [ ] Improve past key layer indexing?