parent 7dbaef3f5b
commit e58ad6dd66

@@ -15,4 +15,6 @@
     title: Preparing Model for Serving
   - local: basic_tutorials/gated_model_access
     title: Serving Private & Gated Models
+  - local: basic_tutorials/using_cli
+    title: Using TGI CLI
   title: Tutorials

@@ -0,0 +1,35 @@
+# Using TGI CLI
+
+You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters. To install the CLI, please refer to [the installation section](./installation#install-cli).
+
+`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
+
+```bash
+text-generation-server download-weights MODEL_HUB_ID
+```
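+
+For example, to fetch the weights of the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model used in the quick tour:
+
+```bash
+text-generation-server download-weights tiiuae/falcon-7b-instruct
+```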
+
+You can also use it to quantize models, like below 👇
+
+```bash
+text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
+```
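+
+As a sketch, quantizing the same Falcon model into a local directory (the output path here is only an illustration):
+
+```bash
+text-generation-server quantize tiiuae/falcon-7b-instruct /data/falcon-7b-instruct-quantized
+```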
+
+You can use `text-generation-launcher` to serve models.
+
+```bash
+text-generation-launcher --model-id MODEL_HUB_ID --port 8080
+```
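+
+Once the launcher is running, you can query the server over HTTP. A minimal sketch, assuming the server listens on port 8080 as above:
+
+```bash
+# Send a generation request to the running server
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```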
+
+There are many options and parameters you can pass to `text-generation-launcher`. The documentation for the CLI is kept minimal and relies on its self-generated documentation instead, which you can view by running
+
+```bash
+text-generation-launcher --help
+```
+
+You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
+
+The same documentation can be found for `text-generation-server`.
+
+```bash
+text-generation-server --help
+```

@@ -4,8 +4,20 @@ This section explains how to install the CLI tool as well as installing TGI from

 ## Install CLI

-TODO
+You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters.
+
+To install the CLI, first clone the TGI repository and then run `make`.
+
+```bash
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+make install
+```
+
+If you would like to serve models with custom kernels, run
+
+```bash
+BUILD_EXTENSIONS=True make install
+```
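+
+Once the install finishes, you can sanity-check that the CLI is available by printing its help:
+
+```bash
+text-generation-launcher --help
+```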

 ## Local Installation from Source

@@ -44,7 +56,8 @@ brew install protobuf

 Then run the following to install Text Generation Inference:

 ```bash
-BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+BUILD_EXTENSIONS=True make install
 ```

 <Tip warning={true}>

@@ -64,9 +77,3 @@ make run-falcon-7b-instruct
 ```

 This will serve the Falcon 7B Instruct model on port 8080, which we can query.
-
-To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-
-```bash
-text-generation-launcher --help
-```

@@ -4,7 +4,7 @@ The easiest way of getting started is using the official Docker container. Insta

 Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:

-```shell
+```bash
 model=tiiuae/falcon-7b-instruct
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
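
 # As a sketch of how these variables are used next (assumed from the TGI
 # quick tour; the exact image tag is an assumption):
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
     ghcr.io/huggingface/text-generation-inference:latest --model-id $model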