diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 1f0ff37d..540f4b13 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -75,6 +75,81 @@ To serve both ChatUI and TGI in same environment, simply add your own endpoints
 
 ![ChatUI](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chatui_screen.png)
 
+## Gradio
+
+Gradio is a Python library that helps you build web applications for your machine learning models with a few lines of code. It has a `ChatInterface` wrapper that helps create neat UIs for chatbots. Let's take a look at how to create a chatbot in streaming mode using TGI and Gradio. First, install Gradio and the Hugging Face Hub Python library:
+
+```bash
+pip install huggingface-hub gradio
+```
+
+Assuming you are serving your model on port 8080, we can query it through [InferenceClient](consuming_tgi#inference-client).
+
+```python
+import gradio as gr
+from huggingface_hub import InferenceClient
+
+# Point the client at the local TGI endpoint
+client = InferenceClient(model="http://127.0.0.1:8080")
+
+def inference(message, history):
+    partial_message = ""
+    # Stream tokens from TGI and yield the growing reply so the UI updates live
+    for token in client.text_generation(message, max_new_tokens=20, stream=True):
+        partial_message += token
+        yield partial_message
+
+gr.ChatInterface(
+    inference,
+    chatbot=gr.Chatbot(height=300),
+    textbox=gr.Textbox(placeholder="Chat with me!", container=False, scale=7),
+    description="This is a demo of a Gradio UI consuming a TGI endpoint with the LLaMA 7B-Chat model.",
+    title="Gradio 🤝 TGI",
+    examples=["Are tomatoes vegetables?"],
+    retry_btn="Retry",
+    undo_btn="Undo",
+    clear_btn="Clear",
+).queue().launch()
+```
+
+The UI looks like this 👇
+
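+Note that the `inference` function above ignores the `history` argument that `ChatInterface` passes in, so each message is answered without conversational context. Below is a minimal sketch of folding the history into the prompt; the plain-text `User:`/`Assistant:` template and the `inference_with_history` name are assumptions for illustration, and the format should be adapted to whatever chat template your model expects.
+
+```python
+def inference_with_history(message, history):
+    # history is a list of [user_message, bot_message] pairs kept by ChatInterface
+    # Flatten it into a plain-text transcript; adapt this template to your model
+    prompt = ""
+    for user_turn, bot_turn in history:
+        prompt += f"User: {user_turn}\nAssistant: {bot_turn}\n"
+    prompt += f"User: {message}\nAssistant:"
+
+    partial_message = ""
+    for token in client.text_generation(prompt, max_new_tokens=100, stream=True):
+        partial_message += token
+        yield partial_message
+```
+
+If you want to share your demo outside your machine, you can pass `share=True` to `launch()` to get a temporary public link.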