# Installing and Launching Locally

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.

## Local Installation for Text Generation Inference

Text Generation Inference is available on pypi, conda and GitHub.

To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least Python 3.9, e.g. using `conda`:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```

You may also need to install Protoc. On Linux:

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

On MacOS, using Homebrew:

```shell
brew install protobuf
```

Then run:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:

```shell
sudo apt-get install libssl-dev gcc -y
```

Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080. You can then query it using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

or through Python, after installing the client:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Non-streaming request: returns the full generated text at once.
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

# Streaming request: tokens arrive one at a time as they are generated.
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

To see all options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or use the CLI:

```shell
text-generation-launcher --help
```
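
The `make run-falcon-7b-instruct` target above is a convenience wrapper around this launcher. As a minimal sketch of serving a different model, you can invoke `text-generation-launcher` directly with the `--model-id` and `--port` flags (the full set of options is listed by `text-generation-launcher --help`); `tiiuae/falcon-7b-instruct` here stands in for any supported model ID on the Hugging Face Hub:

```shell
# Sketch: launch the server directly instead of using the make target.
# --model-id selects the model to download and serve from the Hub;
# --port sets the listening port (8080 matches the curl examples above).
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080
```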