3.8 KiB
3.8 KiB
Metrics
TGI exposes multiple metrics that can be collected via the /metrics
Prometheus endpoint.
These metrics can be used to monitor the performance of TGI, autoscale deployment and to help identify bottlenecks.
The following metrics are exposed:
Metric Name | Description | Type | Unit |
---|---|---|---|
tgi_batch_current_max_tokens |
Maximum tokens for the current batch | Gauge | Count |
tgi_batch_current_size |
Current batch size | Gauge | Count |
tgi_batch_decode_duration |
Time spent decoding a batch per method (prefill or decode) | Histogram | Seconds |
tgi_batch_filter_duration |
Time spent filtering batches and sending generated tokens per method (prefill or decode) | Histogram | Seconds |
tgi_batch_forward_duration |
Batch forward duration per method (prefill or decode) | Histogram | Seconds |
tgi_batch_inference_count |
Inference calls per method (prefill or decode) | Counter | Count |
tgi_batch_inference_duration |
Batch inference duration | Histogram | Seconds |
tgi_batch_inference_success |
Number of successful inference calls per method (prefill or decode) | Counter | Count |
tgi_batch_next_size |
Batch size of the next batch | Histogram | Count |
tgi_queue_size |
Current queue size | Gauge | Count |
tgi_request_count |
Total number of requests | Counter | Count |
tgi_request_duration |
Total time spent processing the request (e2e latency) | Histogram | Seconds |
tgi_request_generated_tokens |
Generated tokens per request | Histogram | Count |
tgi_request_inference_duration |
Request inference duration | Histogram | Seconds |
tgi_request_input_length |
Input token length per request | Histogram | Count |
tgi_request_max_new_tokens |
Maximum new tokens per request | Histogram | Count |
tgi_request_mean_time_per_token_duration |
Mean time per token per request (inter-token latency) | Histogram | Seconds |
tgi_request_queue_duration |
Time spent in the queue per request | Histogram | Seconds |
tgi_request_skipped_tokens |
Speculated tokens per request | Histogram | Count |
tgi_request_success |
Number of successful requests | Counter | |
tgi_request_validation_duration |
Time spent validating the request | Histogram | Seconds |