# Querying the Models

Once the server is launched, query the model through either the `/generate` or the `/generate_stream` route. `/generate` returns the full completion in a single response, while `/generate_stream` streams tokens back as they are produced:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

You can also query the model from Python. First install the client:

```shell
pip install text-generation
```

Then run:

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Single request: the full completion is returned at once
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

# Streaming request: accumulate tokens as they arrive, skipping special tokens
text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

## API documentation

You can consult the OpenAPI documentation of the `text-generation-inference` REST API through the `/docs` route. The Swagger UI is also available [online](https://huggingface.github.io/text-generation-inference).
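If installing the `text-generation` package is not an option, the same two routes can be called with Python's standard library alone. The following is a minimal sketch, not part of the official client: `build_request`, `parse_stream_lines`, `generate`, and `generate_stream` are hypothetical helper names, and the code assumes that (a) `/generate` replies with a JSON object carrying a `generated_text` field, as the client's `.generated_text` attribute suggests, and (b) `/generate_stream` emits server-sent events whose `data:` lines each hold a JSON object with a `token` entry that has `text` and `special` fields, mirroring the client's `response.token`.

```python
import json
import urllib.request

# Assumed server address, matching the examples above.
BASE_URL = "http://127.0.0.1:8080"


def build_request(route: str, prompt: str, max_new_tokens: int = 20) -> urllib.request.Request:
    """Build a POST request with the same JSON body the curl examples send."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{BASE_URL}{route}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_stream_lines(lines) -> str:
    """Concatenate non-special token texts from server-sent event lines.

    Assumption: each event line looks like
    `data:{"token": {"text": ..., "special": ...}, ...}`.
    """
    text = ""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other event fields
        event = json.loads(line[len("data:"):])
        token = event["token"]
        if not token["special"]:
            text += token["text"]
    return text


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Call /generate and return the completion (assumes a `generated_text` field)."""
    req = build_request("/generate", prompt, max_new_tokens)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]


def generate_stream(prompt: str, max_new_tokens: int = 20) -> str:
    """Call /generate_stream and join the streamed token texts."""
    req = build_request("/generate_stream", prompt, max_new_tokens)
    with urllib.request.urlopen(req) as resp:
        # The HTTP response object can be iterated line by line.
        return parse_stream_lines(resp)
```

With a server running at the assumed address, `print(generate("What is Deep Learning?"))` mirrors the first curl call and `print(generate_stream("What is Deep Learning?"))` mirrors the streaming loop. Keeping the event parsing in its own function lets it be checked against sample lines without a live server.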