# Querying the Models

After the launch, query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
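
The `/generate` route returns the completion as a single JSON object; the generated text is in its `generated_text` field.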
```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
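
The `/generate_stream` route instead streams server-sent events, emitting one JSON payload per generated token rather than waiting for the full completion.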

Alternatively, query the model through Python. First, install the `text-generation` client library:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Generate the full completion in a single call
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

# Stream the completion token by token, skipping special tokens
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```
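
The `text-generation` package also provides an `AsyncClient` with the same `generate` and `generate_stream` methods for non-blocking use. A minimal sketch, assuming the same server address as above:

```python
import asyncio

from text_generation import AsyncClient


async def main():
    client = AsyncClient("http://127.0.0.1:8080")

    # Same call as above, but awaitable
    response = await client.generate("What is Deep Learning?", max_new_tokens=20)
    print(response.generated_text)

    # Stream tokens asynchronously, skipping special tokens
    text = ""
    async for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
        if not response.token.special:
            text += response.token.text
    print(text)


asyncio.run(main())
```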

## API documentation

You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).