# Querying the Models

After the launch, query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
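
The `/generate` route returns the completion as a single JSON object; the generated text is in its `generated_text` field.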
```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
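
The `/generate_stream` route instead streams server-sent events, emitting one JSON payload per generated token rather than waiting for the full completion.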

Alternatively, query the model through Python. First, install the `text-generation` client library:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Generate the full completion in a single call
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

# Stream the completion token by token, skipping special tokens
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
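
The `text-generation` package also provides an `AsyncClient` with the same `generate` and `generate_stream` methods for non-blocking use. A minimal sketch, assuming the same server address as above:

```python
import asyncio

from text_generation import AsyncClient


async def main():
    client = AsyncClient("http://127.0.0.1:8080")

    # Same call as above, but awaitable
    response = await client.generate("What is Deep Learning?", max_new_tokens=20)
    print(response.generated_text)

    # Stream tokens asynchronously, skipping special tokens
    text = ""
    async for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
        if not response.token.special:
            text += response.token.text
    print(text)


asyncio.run(main())
```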

## API documentation

You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).