Installing and Launching Locally

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on Python 3.9+.

Local Installation for Text Generation Inference

Text Generation Inference is available on PyPI, conda, and GitHub.

To install and launch locally, first install Rust and create a Python virtual environment with at least Python 3.9, e.g. using conda:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```

You may also need to install Protoc.

On Linux:

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

On macOS, using Homebrew:

```shell
brew install protobuf
```
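
Before building, it can help to confirm that the tools from the steps above are on your `PATH`. The following is a minimal sketch, not part of Text Generation Inference: `check_tool` is a hypothetical helper, and the tool list (`cargo`, `protoc`, `python`) is an assumption based on the prerequisites named in this section.

```shell
# Hypothetical helper (not part of TGI): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed tool list, taken from the prerequisites above; adjust to your setup.
for tool in cargo protoc python; do
    check_tool "$tool" || true
done
```

Any tool reported as `missing` should be installed (or the conda environment activated) before continuing.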

Then run:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:

```shell
sudo apt-get install libssl-dev gcc -y
```

Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This serves the Falcon 7B Instruct model on port 8080, where it can be queried.
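
As a rough illustration of querying the running server, the sketch below posts a prompt to the `/generate` route with `curl`. It assumes the server is reachable at `127.0.0.1:8080`; the `TGI_URL` and `PAYLOAD` variables and the `query_tgi` function are illustrative names, not part of Text Generation Inference.

```shell
# Assumption: the server started above is listening on localhost port 8080.
TGI_URL="${TGI_URL:-http://127.0.0.1:8080}"

# JSON body for the /generate route: a prompt plus generation parameters.
PAYLOAD='{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'

# Illustrative helper: send the request; --fail makes curl exit non-zero on HTTP errors.
query_tgi() {
    curl --silent --show-error --fail "$TGI_URL/generate" \
        -X POST \
        -H 'Content-Type: application/json' \
        -d "$PAYLOAD"
}
```

Once the model has finished loading, calling `query_tgi` should return a JSON object whose `generated_text` field holds the completion.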

**Note:** To see all options for serving your models, look in the code or run the following in the CLI:

```shell
text-generation-launcher --help
```
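
To serve a different model or port, the launcher can also be invoked directly with its `--model-id` and `--port` options. The sketch below wraps that call in a hypothetical `launch_model` helper with a `DRY_RUN` switch (both are illustrative, not part of the project); the model ID `tiiuae/falcon-7b-instruct` in the usage note is an assumption about which checkpoint you want to serve.

```shell
# Hypothetical helper: launch a model by ID on a chosen port (default 8080).
launch_model() {
    model_id="$1"
    port="${2:-8080}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # DRY_RUN=1 only prints the command instead of starting the server.
        echo "text-generation-launcher --model-id $model_id --port $port"
    else
        text-generation-launcher --model-id "$model_id" --port "$port"
    fi
}
```

For example, `launch_model tiiuae/falcon-7b-instruct 8080` would start the launcher for that model, while setting `DRY_RUN=1` first prints the command for inspection.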