
# Installing and Launching Locally

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.

## Local Installation for Text Generation Inference

Text Generation Inference is available on PyPI, conda and GitHub.

To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least
Python 3.9, e.g. using `conda`:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```
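Because Text Generation Inference is tested on Python 3.9+, it can help to confirm the interpreter version right after activating the environment. The following is a minimal sketch using only the standard library; `python_is_supported` is a hypothetical helper, not part of TGI:

```python
import sys

# Minimum interpreter version these docs say TGI is tested against.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current} is supported.")
    else:
        sys.exit(f"Python {current} is too old; recreate the environment with python>=3.9.")
```

Run it with the environment's `python` to check the interpreter that `conda activate` put on your `PATH`.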

You may also need to install `protoc`, the Protocol Buffers compiler.

On Linux:

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

On macOS, using Homebrew:

```shell
brew install protobuf
```
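To check that `protoc` ended up on your `PATH` after either install route, a small sketch like the one below can be used. It assumes `protoc --version` prints a single line of the form `libprotoc <version>`; both helper functions are hypothetical and not part of TGI:

```python
import shutil
import subprocess
from typing import Optional


def parse_protoc_version(output: str) -> Optional[str]:
    """Extract the version from output assumed to look like 'libprotoc 3.21.12'."""
    parts = output.strip().split()
    if len(parts) >= 2 and parts[0] == "libprotoc":
        return parts[1]
    return None  # unexpected format, or empty output


def installed_protoc_version() -> Optional[str]:
    """Return the version of the protoc found on PATH, or None if it is missing."""
    protoc = shutil.which("protoc")
    if protoc is None:
        return None
    result = subprocess.run([protoc, "--version"], capture_output=True, text=True, check=False)
    return parse_protoc_version(result.stdout)


if __name__ == "__main__":
    version = installed_protoc_version()
    print(f"protoc {version}" if version else "protoc not found on PATH")
```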

Then run:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Debian-based Linux distributions (e.g. Ubuntu), run:

```shell
sudo apt-get install libssl-dev gcc -y
```


Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080.

You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
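If `curl` is not available, the same `/generate` request can be sent with Python's standard library. This is a sketch under two assumptions: the server is listening on `127.0.0.1:8080` as above, and the response is a JSON object with a `generated_text` field (as the Python client's `.generated_text` attribute further down suggests). `build_generate_request` and `generate` are hypothetical helper names:

```python
import json
import urllib.request

# Assumes the server started above is listening locally on port 8080.
BASE_URL = "http://127.0.0.1:8080"


def build_generate_request(prompt: str, max_new_tokens: int = 20,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /generate request as the curl example."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Send the request and return the assumed 'generated_text' field of the reply."""
    request = build_generate_request(prompt, max_new_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]
```

With the server running, `print(generate("What is Deep Learning?"))` should print the completion.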

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
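The `/generate_stream` route returns tokens incrementally rather than one JSON body. The sketch below assumes the stream uses Server-Sent Events, with each event on a line of the form `data:{"token": {"text": ..., "special": ...}, ...}`; it mirrors the special-token filtering in the Python client example below. `collect_stream_text` and `stream_generate` are hypothetical helpers:

```python
import json
import urllib.request
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate non-special token texts from /generate_stream events.

    Assumes each event is a line of the form
    'data:{"token": {"text": "...", "special": false, ...}, ...}'.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        event = json.loads(line[len("data:"):])
        token = event["token"]
        if not token["special"]:  # drop special tokens such as end-of-sequence markers
            text += token["text"]
    return text


def stream_generate(prompt: str, max_new_tokens: int = 20,
                    base_url: str = "http://127.0.0.1:8080") -> str:
    """POST to /generate_stream and accumulate the streamed text."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    request = urllib.request.Request(
        f"{base_url}/generate_stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields raw lines (bytes) as they arrive.
        return collect_stream_text(raw.decode("utf-8") for raw in response)
```

Keeping the parsing separate from the HTTP call makes the event handling easy to check against captured stream output without a running server.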

You can also query the model through Python, using the `text-generation` client:

```shell
pip install text-generation
```

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

To see all the options for serving your models, check the [launcher code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the CLI help:

```shell
text-generation-launcher --help
```
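As a sketch, assuming the launcher's `--model-id` and `--port` options (check `text-generation-launcher --help` for the authoritative list), the following builds a direct launcher invocation for the Falcon 7B Instruct model on port 8080 instead of going through the `make` target. `launcher_command` is a hypothetical helper:

```python
import shlex
from typing import List


def launcher_command(model_id: str, port: int = 8080) -> List[str]:
    """Build an argv list for text-generation-launcher (assumed flags: --model-id, --port)."""
    return ["text-generation-launcher", "--model-id", model_id, "--port", str(port)]


if __name__ == "__main__":
    cmd = launcher_command("tiiuae/falcon-7b-instruct")
    # Print a shell-safe version of the command; pass cmd to subprocess.run(cmd, check=True) to start the server.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting issues when the model id or other arguments come from user input.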