# Installing and Launching Locally

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.

## Local Installation for Text Generation Inference

Text Generation Inference is available on PyPI, Conda, and GitHub.

To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least Python 3.9, e.g. using `conda`:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```
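
As an optional sanity check (not part of the original steps), you can confirm that the Rust toolchain and the Python environment are available:

```shell
# Optional check: each command should print a version string
rustc --version
cargo --version
python --version
```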

You may also need to install Protoc.

On Linux:

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```
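
As an optional check, you can confirm that `protoc` is installed and on your `PATH`:

```shell
# Should report the installed version, e.g. libprotoc 21.12
protoc --version
```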

On macOS, using Homebrew:

```shell
brew install protobuf
```

Then run:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:

```shell
sudo apt-get install libssl-dev gcc -y
```

Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080.

You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
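
A successful request to `/generate` returns a JSON object whose `generated_text` field contains the completion; the exact text depends on the model and the sampling parameters.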

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
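
The `/generate_stream` route returns the same generation as a stream of server-sent events, emitting tokens as they are produced instead of a single response at the end.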

or through Python:

```shell
pip install text-generation
```

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or use the CLI:

```shell
text-generation-launcher --help
```
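
For example, you can point the launcher at a specific model and port. The model ID and port below are only an illustration, and the available flags can differ between versions, so check `--help` on your install:

```shell
# Illustrative only: serve Falcon 7B Instruct on port 8080
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080
```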