Separated querying section and emphasized self-generating docs

Merve Noyan 2023-08-01 14:10:45 +03:00 committed by GitHub
parent 21ca70e0eb
commit 470dcdfe7b
4 changed files with 45 additions and 75 deletions


@@ -11,6 +11,8 @@
    title: Installing and Launching Locally
  - local: basic_tutorials/docker_launch
    title: Launching with Docker
  - local: basic_tutorials/querying
    title: Querying the Models
  - local: basic_tutorials/consuming_TGI
    title: Consuming TGI as a backend


@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
```
**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
**Note**: To see all options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or use the CLI:
You can then query the model using either the `/generate` or `/generate_stream` routes:
```shell
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
```shell
curl 127.0.0.1:8080/generate_stream \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
or from Python:
```shell
pip install text-generation
```
```python
from text_generation import Client
client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
```
text-generation-launcher --help
```


@@ -54,44 +54,7 @@ make run-falcon-7b-instruct
This will serve the Falcon 7B Instruct model on port 8080, which we can query.
You can then query the model using either the `/generate` or `/generate_stream` routes:
**Note**: To see all options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or use the CLI:
```shell
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
```shell
curl 127.0.0.1:8080/generate_stream \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
or through Python:
```shell
pip install text-generation
```
Then run:
```python
from text_generation import Client
client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
```
text-generation-launcher --help
```


@@ -0,0 +1,41 @@
# Querying the Models
After launching the server, you can query the model using either the `/generate` or `/generate_stream` routes:
```shell
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
```shell
curl 127.0.0.1:8080/generate_stream \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```
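The `parameters` object accepts more than `max_new_tokens`. As a rough sketch (exact field availability can vary across versions; the generated API documentation below is authoritative), a sampled request might look like:
```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20,"do_sample":true,"temperature":0.7,"top_p":0.95}}' \
    -H 'Content-Type: application/json'
```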
Alternatively, you can query the model from Python. First, install the `text-generation` client library:
```shell
pip install text-generation
```
Then run:
```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Single call: returns the full completion once generation finishes.
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

# Streaming: accumulate tokens as they arrive, skipping special tokens.
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
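If you are calling the server from async code, the client library also ships an `AsyncClient` with the same interface; a minimal sketch, assuming the same local server as above:
```python
import asyncio

from text_generation import AsyncClient


async def main():
    client = AsyncClient("http://127.0.0.1:8080")

    # Awaitable single call.
    response = await client.generate("What is Deep Learning?", max_new_tokens=20)
    print(response.generated_text)

    # Async iteration over streamed tokens, skipping special tokens.
    text = ""
    async for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
        if not response.token.special:
            text += response.token.text
    print(text)


asyncio.run(main())
```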
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
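For a quick liveness check of the route, you can also fetch it from the shell (assuming the server from the examples above is still running on port 8080):
```shell
# Returns the Swagger UI page; open http://127.0.0.1:8080/docs in a browser for the interactive view.
curl 127.0.0.1:8080/docs
```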