Separated querying section and emphasized self-generating docs
parent 21ca70e0eb
commit 470dcdfe7b
@@ -11,6 +11,8 @@
    title: Installing and Launching Locally
  - local: basic_tutorials/docker_launch
    title: Launching with Docker
+ - local: basic_tutorials/querying
+   title: Querying the Models
  - local: basic_tutorials/consuming_TGI
    title: Consuming TGI as a backend
@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
```

**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
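A quick way to confirm that the container runtime can actually see the GPU is to run `nvidia-smi` from inside a throwaway CUDA container. This is only a sanity-check sketch; the CUDA image tag below is an arbitrary example and not something TGI itself requires:

```shell
# Should print the same GPU table that nvidia-smi shows on the host.
# The image tag is just an example of a recent CUDA base image.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```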
You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

or from Python:

```shell
pip install text-generation
```

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

- To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+ **Note**: To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the CLI with:

```
text-generation-launcher --help
```
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct
This will serve the Falcon 7B Instruct model on port 8080.

You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

or through Python:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

- To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+ **Note**: To see all the options for serving your models, check the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the CLI with:

```
text-generation-launcher --help
```
@@ -0,0 +1,41 @@
# Querying the Models

After launching, you can query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
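When calling `/generate_stream` from the command line, it can help to disable curl's output buffering so the streamed events are printed as they arrive. This is only a small variation of the call above; the single addition is curl's standard `-N`/`--no-buffer` flag, and the exact framing of the streamed events is whatever the server emits rather than something prescribed here:

```shell
# -N turns off curl's output buffering so each streamed event is shown as it arrives.
curl -N 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```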
or through Python:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
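If the server is running locally as in the examples above, the same documentation is served by your own instance. A quick sanity check from the shell, assuming the default `127.0.0.1:8080` address used throughout this page:

```shell
# The /docs route serves the interactive API documentation for the running server;
# a 200 status code here means it is available at http://127.0.0.1:8080/docs in a browser.
curl -s -o /dev/null -w "%{http_code}\n" 127.0.0.1:8080/docs
```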