Separated querying section and emphasized self-generating docs
parent 21ca70e0eb
commit 470dcdfe7b

@@ -11,6 +11,8 @@
     title: Installing and Launching Locally
   - local: basic_tutorials/docker_launch
     title: Launching with Docker
+  - local: basic_tutorials/querying
+    title: Querying the Models
   - local: basic_tutorials/consuming_TGI
     title: Consuming TGI as a backend
   - local: basic_tutorials/consuming_TGI

@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
 ```

 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.

-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or from Python:
-
-```shell
-pip install text-generation
-```
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note:** To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following in the CLI:
 ```
 text-generation-launcher --help
 ```

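The GPU note kept in the hunk above assumes a working NVIDIA driver and Container Toolkit. Before launching the container, a quick check along these lines can confirm that Docker actually sees the GPU (a minimal sketch; the CUDA base image tag is only illustrative):

```shell
# On the host: the driver should report a CUDA version of 11.8 or higher.
nvidia-smi

# Through Docker: the same report should appear from inside a container,
# which confirms the NVIDIA Container Toolkit is set up correctly.
# The image tag is illustrative; any CUDA-enabled image works.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
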
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct

 This will serve the Falcon 7B Instruct model on port 8080, which you can then query.

-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or through Python:
-
-```shell
-pip install text-generation
-```
-
-Then run:
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note:** To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following in the CLI:
 ```
 text-generation-launcher --help
 ```

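Both hunks now point readers at `text-generation-launcher --help`, whose output is generated from the launcher's own argument definitions. For orientation, a typical local invocation looks roughly like this (a sketch; confirm the exact flag names against `--help` for your version):

```shell
# Serve Falcon 7B Instruct locally on port 8080.
# --model-id and --port are launcher flags; see --help for the full list.
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080
```
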
@@ -0,0 +1,41 @@
+# Querying the Models
+
+After launching, you can query the model using either the `/generate` or `/generate_stream` routes:
+
+```shell
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+```shell
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+or through Python:
+
+```shell
+pip install text-generation
+```
+
+Then run:
+
+```python
+from text_generation import Client
+
+client = Client("http://127.0.0.1:8080")
+print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
+
+text = ""
+for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
+    if not response.token.special:
+        text += response.token.text
+print(text)
+```
+
+## API documentation
+
+You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
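
Because the API documentation is generated by the server itself, it can also be pulled straight from a running instance rather than the hosted Swagger UI. A minimal sketch (the `/docs` route is the one referenced above; the raw-spec path is an assumption and may differ between TGI versions):

```shell
# Swagger UI served by the running instance (the /docs route referenced above):
curl 127.0.0.1:8080/docs

# Raw OpenAPI JSON -- this path is an assumption and may vary between versions:
curl 127.0.0.1:8080/api-doc/openapi.json
```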