From a70dd2998b3403da3d1ce9f3d5a27bd42db2d5c7 Mon Sep 17 00:00:00 2001
From: Nicolas Patry
Date: Tue, 10 Dec 2024 01:20:07 +0530
Subject: [PATCH] Hotfixing the link. (#2811)

---
 docs/source/conceptual/chunking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/chunking.md b/docs/source/conceptual/chunking.md
index f6489afd..9c4cbcdd 100644
--- a/docs/source/conceptual/chunking.md
+++ b/docs/source/conceptual/chunking.md
@@ -72,7 +72,7 @@ Long: `MODEL_ID=$MODEL_ID HOST=localhost:8000 k6 run load_tests/long.js`
 
 ### Results
 
-![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/main/assets/v3_benchmarks.png)
+![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/042791fbd5742b1644d42c493db6bec669df6537/assets/v3_benchmarks.png)
 
 Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.
 