Add TGI monitoring guide through Grafana and Prometheus (#1908)

As per title. It is very useful.
This commit is contained in:
fxmarty 2024-05-17 16:34:44 +02:00 committed by GitHub
parent 232e8d5227
commit c4cf8b49d1
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 4051 additions and 1 deletions

3973
assets/tgi_grafana.json Normal file

File diff suppressed because it is too large Load Diff

View File

@ -28,7 +28,7 @@
- local: basic_tutorials/using_cli - local: basic_tutorials/using_cli
title: Using TGI CLI title: Using TGI CLI
- local: basic_tutorials/launcher - local: basic_tutorials/launcher
title: All TGI CLI options title: All TGI CLI options
- local: basic_tutorials/non_core_models - local: basic_tutorials/non_core_models
title: Non-core Model Serving title: Non-core Model Serving
- local: basic_tutorials/safety - local: basic_tutorials/safety
@ -37,6 +37,8 @@
title: Using Guidance, JSON, tools title: Using Guidance, JSON, tools
- local: basic_tutorials/visual_language_models - local: basic_tutorials/visual_language_models
title: Visual Language Models title: Visual Language Models
- local: basic_tutorials/monitoring
title: Monitoring TGI with Prometheus and Grafana
title: Tutorials title: Tutorials
- sections: - sections:
- local: conceptual/streaming - local: conceptual/streaming

View File

@ -0,0 +1,75 @@
# Monitoring TGI server with Prometheus and Grafana dashboard
TGI server deployment can easily be monitored through a Grafana dashboard, consuming a Prometheus data collection. Example of inspectable metrics are statistics on the effective batch sizes used by TGI, prefill/decode latencies, number of generated tokens, etc.
In this tutorial, we look at how to set up a local Grafana dashboard to monitor TGI usage.
![Grafana dashboard for TGI](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/grafana.png)
## Setup on the server machine
First, on your server machine, TGI needs to be launched as usual. TGI exposes [multiple](https://github.com/huggingface/text-generation-inference/discussions/1127#discussioncomment-7240527) metrics that can be collected by Prometheus monitoring server.
In the rest of this tutorial, we assume that TGI was launched through Docker with `--network host`.
On the server where TGI is hosted, a Prometheus server needs to be installed and launched. To do so, please follow [Prometheus installation instructions](https://prometheus.io/download/#prometheus). For example, at the time of writing on a Linux machine:
```
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus
```
Prometheus needs to be configured to listen on TGI's port. To do so, in Prometheus configuration file `prometheus.yml`, one needs to edit the lines:
```
static_configs:
- targets: ["0.0.0.0:80"]
```
to use the correct IP address and port.
We suggest to try `curl 0.0.0.0:80/generate -X POST -d '{"inputs":"hey chatbot, how are","parameters":{"max_new_tokens":15}}' -H 'Content-Type: application/json'` on the server side to make sure to configure the correct IP and port.
Once Prometheus is configured, Prometheus server can be launched on the same machine where TGI is launched:
```
./prometheus --config.file="prometheus.yml"
```
In this guide, Prometheus monitoring data will be consumed on a local computer. Hence, we need to forward Prometheus port (by default 9090) to the local computer. To do so, we can for example:
* Use ssh [local port forwarding](https://www.ssh.com/academy/ssh/tunneling-example)
* Use ngrok port tunneling
For simplicity, we will use [Ngrok](https://ngrok.com/docs/) in this guide to tunnel Prometheus port from the TGI server to the outside word.
For that, you should follow the steps at https://dashboard.ngrok.com/get-started/setup/linux, and once Ngrok is installed, use:
```bash
ngrok http http://0.0.0.0:9090
```
As a sanity check, one can make sure that Prometheus server can be accessed at the URL given by Ngrok (in the style of https://d661-4-223-164-145.ngrok-free.app) from a local machine.
## Setup on the monitoring machine
Monitoring is typically done on an other machine than the server one. We use a Grafana dashboard to monitor TGI's server usage.
Two options are available:
* Use Grafana Cloud for an hosted dashboard solution (https://grafana.com/products/cloud/).
* Self-host a grafana dashboard.
In this tutorial, for simplicity, we will self host the dashbard. We recommend installing Grafana Open-source edition following [the official install instructions](https://grafana.com/grafana/download?platform=linux&edition=oss), using the available Linux binaries. For example:
```bash
wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz
cd grafana-11.0.0
./bin/grafana-server
```
Once the Grafana server is launched, the Grafana interface is available at http://localhost:3000. One needs to log in with the `admin` username and `admin` password.
Once logged in, the Prometheus data source for Grafana needs to be configured, in the option `Add your first data source`. There, a Prometheus data source needs to be added with the Ngrok address we got earlier, that exposes Prometheus port (example: https://d661-4-223-164-145.ngrok-free.app).
Once Prometheus data source is configured, we can finally create our dashboard! From home, go to `Create your first dashboard` and then `Import dashboard`. There, we will use the recommended dashboard template [tgi_grafana.json](https://github.com/huggingface/text-generation-inference/blob/main/assets/grafana.json) for a dashboard ready to be used, but you may configure your own dashboard as you like.
Community contributed dashboard templates are also available, for example [here](https://grafana.com/grafana/dashboards/19831-text-generation-inference-dashboard/) or [here](https://grafana.com/grafana/dashboards/20246-text-generation-inference/).
Load your dashboard configuration, and your TGI dashboard should be ready to go!