diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 1f0ff37d..540f4b13 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -75,6 +75,81 @@ To serve both ChatUI and TGI in same environment, simply add your own endpoints

 ![ChatUI](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chatui_screen.png)

+## Gradio
+
+Gradio is a Python library that helps you build web applications for your machine learning models with a few lines of code. It has a `ChatInterface` wrapper that helps create neat UIs for chatbots. Let's take a look at how to create a chatbot with streaming mode using TGI and Gradio. Let's install Gradio and the Hub Python library first.
+
+```bash
+pip install huggingface-hub gradio
+```
+
+Assuming you are serving your model on port 8080, we will query it through [InferenceClient](consuming_tgi#inference-client).
+
+```python
+import gradio as gr
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="http://127.0.0.1:8080")
+
+def inference(message, history):
+    partial_message = ""
+    for token in client.text_generation(message, max_new_tokens=20, stream=True):
+        partial_message += token
+        yield partial_message
+
+gr.ChatInterface(
+    inference,
+    chatbot=gr.Chatbot(height=300),
+    textbox=gr.Textbox(placeholder="Chat with me!", container=False, scale=7),
+    description="This is the demo for Gradio UI consuming TGI endpoint with LLaMA 7B-Chat model.",
+    title="Gradio 🤝 TGI",
+    examples=["Are tomatoes vegetables?"],
+    retry_btn="Retry",
+    undo_btn="Undo",
+    clear_btn="Clear",
+).queue().launch()
+```
+
+The UI looks like this 👇
+
+[Image: screenshot of the Gradio chat UI]
+
+You can try the demo directly here 👇
+
+[Embedded Gradio demo]
+
+You can disable streaming mode using `return` instead of `yield` in your inference function, like below.
+
+```python
+def inference(message, history):
+    return client.text_generation(message, max_new_tokens=20)
+```
+
+You can read more about how to customize a `ChatInterface` [here](https://www.gradio.app/guides/creating-a-chatbot-fast).
+
 ## API documentation

 You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
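
The streaming `inference` function added in the Gradio section accumulates tokens and yields the running text, and it never reads its `history` argument. The sketch below isolates both ideas so they can be tried without a running TGI server. The helper names `accumulate` and `build_prompt` are hypothetical, the `User:`/`Assistant:` prompt layout is only an illustrative assumption (a real chat model may expect its own template), and `history` is assumed to arrive as a list of `(user, assistant)` pairs.

```python
from typing import Iterable, Iterator, List, Tuple


def accumulate(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the text generated so far after each incoming token."""
    partial_message = ""
    for token in tokens:
        partial_message += token
        yield partial_message


def build_prompt(message: str, history: List[Tuple[str, str]]) -> str:
    """Fold earlier turns into one prompt string.

    Assumption: history is a list of (user, assistant) pairs, and the
    "User:"/"Assistant:" layout is illustrative, not a TGI requirement.
    """
    lines = []
    for user_turn, bot_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {bot_turn}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for the token stream that a streaming client would return.
    for partial in accumulate(["Hel", "lo", "!"]):
        print(partial)
    print(build_prompt("And fruit?", [("Are tomatoes vegetables?", "Botanically, no.")]))
```

Under these assumptions, the streaming function could delegate with `yield from accumulate(client.text_generation(build_prompt(message, history), max_new_tokens=20, stream=True))`, keeping the UI callback itself to a single line.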