
# Text Generation Inference

Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.

![Text Generation Inference](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/TGI.png)

Text Generation Inference implements many optimizations and features, such as:

- Simple launcher to serve most popular LLMs
- Production ready (distributed tracing with OpenTelemetry, Prometheus metrics)
- Tensor Parallelism for faster inference on multiple GPUs
- Token streaming using Server-Sent Events (SSE)
- Continuous batching of incoming requests for increased total throughput
- Optimized transformers code for inference using [Flash Attention](https://github.com/HazyResearch/flash-attention) and [Paged Attention](https://github.com/vllm-project/vllm) on the most popular architectures
- Quantization with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) and [GPTQ](https://arxiv.org/abs/2210.17323)
- [Safetensors](https://github.com/huggingface/safetensors) weight loading
- Watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226)
- Logits warper (temperature scaling, top-p, top-k, repetition penalty)
- Stop sequences
- Log probabilities
- Fine-tuning support: serve fine-tuned models for specific tasks to achieve higher accuracy and performance
- [Guidance](../conceptual/guidance): Enable function calling and tool use by constraining the model to generate structured outputs that follow your own predefined schemas
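To make a few of these features concrete (sampling parameters, stop sequences, log probabilities, and SSE token streaming), here is a minimal Python sketch. It only builds a request body and decodes streamed lines, so it runs without a server. The field names (`inputs`, `parameters`, `token.text`, `token.special`) reflect the request and event shape of the `/generate` and `/generate_stream` routes as commonly documented; treat them as assumptions and check them against the API reference of the TGI version you run. The helper names and the sample event lines are hypothetical and exist only for illustration.

```python
import json


def build_generate_payload(prompt, max_new_tokens=20, temperature=0.7,
                           top_p=0.9, top_k=50, repetition_penalty=1.1,
                           stop=None):
    """Build a JSON-serializable request body (hypothetical helper).

    The keys are assumed to match the TGI generate routes; verify them
    against your server's API documentation.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Logits warper settings from the feature list above.
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            "repetition_penalty": repetition_penalty,
            # Generation halts when one of these strings is produced.
            "stop": list(stop or []),
            # Ask for per-token details such as log probabilities.
            "details": True,
        },
    }


def parse_sse_line(line):
    """Decode one Server-Sent Events line; return None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def collect_text(lines):
    """Join the text of all non-special tokens found in a stream of SSE lines."""
    pieces = []
    for line in lines:
        event = parse_sse_line(line)
        if event is None:
            continue
        token = event.get("token", {})
        if not token.get("special", False):
            pieces.append(token.get("text", ""))
    return "".join(pieces)


if __name__ == "__main__":
    payload = build_generate_payload("What is deep learning?", stop=["\n\n"])
    print(json.dumps(payload, indent=2))

    # Made-up sample events that imitate the assumed stream format.
    sample_stream = [
        'data:{"token":{"id":1,"text":"Deep","logprob":-0.1,"special":false}}',
        "",
        'data:{"token":{"id":2,"text":" learning","logprob":-0.2,"special":false}}',
        'data:{"token":{"id":3,"text":"</s>","logprob":-0.3,"special":true}}',
    ]
    print(collect_text(sample_stream))
```

In practice the payload would be sent with an HTTP client of your choice to a server you launched yourself, and the response lines would be passed to `collect_text` as they arrive instead of being read from a fixed list.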

Text Generation Inference is used in production by multiple projects, such as:

- [HuggingChat](https://github.com/huggingface/chat-ui), an open-source interface for open-access models, such as OpenAssistant and Llama
- [OpenAssistant](https://open-assistant.io/), an open-source community effort to train LLMs in the open
- [nat.dev](http://nat.dev/), a playground to explore and compare LLMs