Have snippets in Python/JavaScript in quicktour (#809)

![Screenshot from 2023-08-10 14-20-25](https://github.com/huggingface/text-generation-inference/assets/7246357/e16d0d41-be63-4d06-8093-30540df91419) --------- Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-08-14 13:47:32 +02:00 · 2023-08-14 13:47:32 +02:00 · d71237fc8b
parent 09eca64227
commit d71237fc8b
1 changed files with 61 additions and 4 deletions
--- a/docs/source/quicktour.md
+++ b/docs/source/quicktour.md
@ -17,15 +17,72 @@ To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvi
 </Tip>
-Once TGI is running, you can use the `generate` endpoint by doing requests. To learn more about how to query the endpoints, check the [Consuming TGI](./basic_tutorials/consuming_tgi) section.
+Once TGI is running, you can use the `generate` endpoint by doing requests. To learn more about how to query the endpoints, check the [Consuming TGI](./basic_tutorials/consuming_tgi) section, where we show examples with utility libraries and UIs. Below you can see a simple snippet to query the endpoint.
-```bash
+
-curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
+<inferencesnippet>
 <python>
 ```python
 import requests
 headers = {
    "Content-Type": "application/json",
 }
 data = {
    'inputs': 'What is Deep Learning?',
    'parameters': {
        'max_new_tokens': 20,
    },
 }
 response = requests.post('http://127.0.0.1:8080/generate', headers=headers, json=data)
 print(response.json())
 # {'generated_text': '\n\nDeep Learning is a subset of Machine Learning that is concerned with the development of algorithms that can'}
 ```
 </python>
 <js>
 ```js
 async function query() {
    const response = await fetch(
        'http://127.0.0.1:8080/generate', 
        {
            method: 'POST',
            headers: { 'Content-Type': 'application/json'},
            body: JSON.stringify({
                'inputs': 'What is Deep Learning?',
                'parameters': {
                    'max_new_tokens': 20
                }
            })
        }
    );
 }
 query().then((response) => {
    console.log(JSON.stringify(response));
 });
 /// {"generated_text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with the development of algorithms that can"}
 ```
 </js>
 <curl>
 ```curl
 curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
 ```
 </curl>
 </inferencesnippet>
 <Tip>
-To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
+To see all possible deploy flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
 ```bash
 docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help