Separated querying section and emphasized self-generating docs
parent 21ca70e0eb
commit 470dcdfe7b

@@ -11,6 +11,8 @@
     title: Installing and Launching Locally
   - local: basic_tutorials/docker_launch
     title: Launching with Docker
+  - local: basic_tutorials/querying
+    title: Querying the Models
   - local: basic_tutorials/consuming_TGI
     title: Consuming TGI as a backend
   - local: basic_tutorials/consuming_TGI

@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
 ```

 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.

-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or from Python:
-
-```shell
-pip install text-generation
-```
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note:** To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following in the CLI:
 ```
 text-generation-launcher --help
 ```

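The GPU note kept in the hunk above assumes a working NVIDIA driver and Container Toolkit. Before launching the container, a quick check along these lines can confirm that Docker actually sees the GPU (a minimal sketch; the CUDA base image tag is only illustrative):

```shell
# On the host: the driver should report a CUDA version of 11.8 or higher.
nvidia-smi

# Through Docker: the same report should appear from inside a container,
# which confirms the NVIDIA Container Toolkit is set up correctly.
# The image tag is illustrative; any CUDA-enabled image works.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
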
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct

 This will serve the Falcon 7B Instruct model on port 8080, which you can then query.

-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or through Python:
-
-```shell
-pip install text-generation
-```
-
-Then run:
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note:** To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following in the CLI:
 ```
 text-generation-launcher --help
 ```

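Both hunks now point readers at `text-generation-launcher --help`, whose output is generated from the launcher's own argument definitions. For orientation, a typical local invocation looks roughly like this (a sketch; confirm the exact flag names against `--help` for your version):

```shell
# Serve Falcon 7B Instruct locally on port 8080.
# --model-id and --port are launcher flags; see --help for the full list.
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080
```
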
@@ -0,0 +1,41 @@
+# Querying the Models
+
+After launching, you can query the model using either the `/generate` or `/generate_stream` routes:
+
+```shell
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+```shell
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+or through Python:
+
+```shell
+pip install text-generation
+```
+
+Then run:
+
+```python
+from text_generation import Client
+
+client = Client("http://127.0.0.1:8080")
+print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
+
+text = ""
+for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
+    if not response.token.special:
+        text += response.token.text
+print(text)
+```
+
+## API documentation
+
+You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
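
Because the API documentation is generated by the server itself, it can also be pulled straight from a running instance rather than the hosted Swagger UI. A minimal sketch (the `/docs` route is the one referenced above; the raw-spec path is an assumption and may differ between TGI versions):

```shell
# Swagger UI served by the running instance (the /docs route referenced above):
curl 127.0.0.1:8080/docs

# Raw OpenAPI JSON -- this path is an assumption and may vary between versions:
curl 127.0.0.1:8080/api-doc/openapi.json
```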