Added gradio example to docs (#867)
cc @osanseviero --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
![ChatUI](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chatui_screen.png)
## Gradio
Gradio is a Python library that helps you build web applications for your machine learning models with a few lines of code. It has a `ChatInterface` wrapper that helps create neat UIs for chatbots. Let's take a look at how to create a chatbot in streaming mode using TGI and Gradio. First, install Gradio and the Hub Python library.
```bash
pip install huggingface-hub gradio
```
Assuming you are serving your model on port 8080, we will query it through the [InferenceClient](consuming_tgi#inference-client).
```python
import gradio as gr
from huggingface_hub import InferenceClient

# Point the client at the local TGI server
client = InferenceClient(model="http://127.0.0.1:8080")


def inference(message, history):
    partial_message = ""
    # Stream tokens from TGI, yielding the accumulated text so far
    for token in client.text_generation(message, max_new_tokens=20, stream=True):
        partial_message += token
        yield partial_message


gr.ChatInterface(
    inference,
    chatbot=gr.Chatbot(height=300),
    textbox=gr.Textbox(placeholder="Chat with me!", container=False, scale=7),
    description="This is the demo for Gradio UI consuming TGI endpoint with LLaMA 7B-Chat model.",
    title="Gradio 🤝 TGI",
    examples=["Are tomatoes vegetables?"],
    retry_btn="Retry",
    undo_btn="Undo",
    clear_btn="Clear",
).queue().launch()
```
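Note that the function yields the accumulated `partial_message` rather than each individual token: `ChatInterface` replaces the displayed response with every yielded value, so yielding the running string is what produces the token-by-token typewriter effect.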
The UI looks like this 👇
<div class="flex justify-center">
|
||||
<img
|
||||
class="block dark:hidden"
|
||||
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/gradio-tgi.png"
|
||||
/>
|
||||
<img
|
||||
class="hidden dark:block"
|
||||
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/gradio-tgi-dark.png"
|
||||
/>
|
||||
</div>
You can try the demo directly here 👇
<div class="block dark:hidden">
|
||||
<iframe
|
||||
src="https://merve-gradio-tgi-2.hf.space?__theme=light"
|
||||
width="850"
|
||||
height="750"
|
||||
></iframe>
|
||||
</div>
|
||||
<div class="hidden dark:block">
|
||||
<iframe
|
||||
src="https://merve-gradio-tgi-2.hf.space?__theme=dark"
|
||||
width="850"
|
||||
height="750"
|
||||
></iframe>
|
||||
</div>
You can disable streaming mode by using `return` instead of `yield` in your inference function, like below.
```python
def inference(message, history):
    return client.text_generation(message, max_new_tokens=20)
```
You can read more about how to customize a `ChatInterface` [here](https://www.gradio.app/guides/creating-a-chatbot-fast).
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
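For instance, assuming the server from the examples above is still running on port 8080, here is a minimal sketch of calling the `/generate` endpoint directly with `requests` (the full set of endpoints and parameters is described in the OpenAPI spec):

```python
import requests

# Ask TGI for a completion in one shot; /generate returns the full text at once
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "Are tomatoes vegetables?", "parameters": {"max_new_tokens": 20}},
)
print(response.json()["generated_text"])
```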