Separated querying section and emphasized self-generating docs
parent 21ca70e0eb
commit 470dcdfe7b
@@ -11,6 +11,8 @@
    title: Installing and Launching Locally
  - local: basic_tutorials/docker_launch
    title: Launching with Docker
+ - local: basic_tutorials/querying
+   title: Querying the Models
  - local: basic_tutorials/consuming_TGI
    title: Consuming TGI as a backend
@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
```

**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
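A quick way to confirm that the container runtime can actually see the GPU is to run `nvidia-smi` from inside a throwaway CUDA container. This is only a sanity-check sketch; the CUDA image tag below is an arbitrary example and not something TGI itself requires:

```shell
# Should print the same GPU table that nvidia-smi shows on the host.
# The image tag is just an example of a recent CUDA base image.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```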
You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

or from Python:

```shell
pip install text-generation
```

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

- To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+ **Note**: To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the CLI with:

```
text-generation-launcher --help
```
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct
This will serve the Falcon 7B Instruct model on port 8080.

You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

or through Python:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

- To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+ **Note**: To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the CLI with:

```
text-generation-launcher --help
```
@@ -0,0 +1,41 @@
# Querying the Models

After launching, you can query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
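When calling `/generate_stream` from the command line, it can help to disable curl's output buffering so the streamed events are printed as they arrive. This is only a small variation of the call above; the single addition is curl's standard `-N`/`--no-buffer` flag, and the exact framing of the streamed events is whatever the server emits rather than something prescribed here:

```shell
# -N turns off curl's output buffering so each streamed event is shown as it arrives.
curl -N 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```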
or through Python:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
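If the server is running locally as in the examples above, the same documentation is served by your own instance. A quick sanity check from the shell, assuming the default `127.0.0.1:8080` address used throughout this page:

```shell
# The /docs route serves the interactive API documentation for the running server;
# a 200 status code here means it is available at http://127.0.0.1:8080/docs in a browser.
curl -s -o /dev/null -w "%{http_code}\n" 127.0.0.1:8080/docs
```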