

# Supported Models and Hardware

Text Generation Inference enables serving optimized models on specific hardware for the highest performance. The following sections list which models and hardware are supported.

## Supported Models

- [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
- [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
- [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
- [Llama](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224)
- [Gemma2](https://huggingface.co/google/gemma-2-9b)
- [Cohere](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
- [Dbrx](https://huggingface.co/databricks/dbrx-instruct)
- [Mamba](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
- [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Mixtral](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
- [Gpt Bigcode](https://huggingface.co/bigcode/gpt_bigcode-santacoder)
- [Phi](https://huggingface.co/microsoft/phi-1_5)
- [Baichuan](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
- [Falcon](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [StarCoder 2](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1)
- [Qwen 2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
- [Opt](https://huggingface.co/facebook/opt-6.7b)
- [T5](https://huggingface.co/google/flan-t5-xxl)
- [Galactica](https://huggingface.co/facebook/galactica-120b)
- [SantaCoder](https://huggingface.co/bigcode/santacoder)
- [Bloom](https://huggingface.co/bigscience/bloom-560m)
- [Mpt](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [Gpt2](https://huggingface.co/openai-community/gpt2)
- [Gpt Neox](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)


If the model you would like to serve is not in the list above, you can still try to initialize and serve it, depending on the model's pipeline type, to see how well it performs. Performance isn't guaranteed for non-optimized models:

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# for causal LMs/text-generation models
AutoModelForCausalLM.from_pretrained("<model>", device_map="auto")
# or, for text-to-text generation models
AutoModelForSeq2SeqLM.from_pretrained("<model>", device_map="auto")
```
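If you are unsure which of the two classes applies, one rough heuristic is to look at the `is_encoder_decoder` field in the model's `config.json`. The sketch below assumes that field is a reliable signal; it is not necessarily how Text Generation Inference chooses its fallback path. The helper names `pick_auto_class_name` and `load_fallback_model` are hypothetical and exist only for illustration.

```python
import json
from pathlib import Path


def pick_auto_class_name(config: dict) -> str:
    """Heuristic (assumption): encoder-decoder configs are text-to-text models,
    everything else is treated as a causal LM."""
    if config.get("is_encoder_decoder", False):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"


def load_fallback_model(model_dir: str):
    """Load a non-optimized model from a local folder with the matching Auto class."""
    # Imported lazily so the heuristic above can be used without transformers installed.
    import transformers

    config = json.loads((Path(model_dir) / "config.json").read_text())
    auto_cls = getattr(transformers, pick_auto_class_name(config))
    return auto_cls.from_pretrained(model_dir, device_map="auto")
```

For example, a T5-style config with `"is_encoder_decoder": true` selects `AutoModelForSeq2SeqLM`, while a config that omits the field falls back to `AutoModelForCausalLM`.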

If you wish to serve a supported model that already exists in a local folder, just point to the local folder.

```bash
text-generation-launcher --model-id <PATH-TO-LOCAL-BLOOM>
```
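Before launching, it can help to confirm that the folder looks like a complete model checkpoint. The sketch below is a minimal pre-flight check, assuming a Hugging Face style layout (a `config.json` plus `*.safetensors` or `*.bin` weight files); the exact files the launcher needs may differ. The helpers `check_local_model_dir` and `launcher_command` are hypothetical and not part of Text Generation Inference.

```python
from pathlib import Path


def check_local_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the folder looks servable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    # Assumption: a Hugging Face style checkpoint ships a config.json at the top level.
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Assumption: weights are stored as *.safetensors or *.bin files at the top level.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems


def launcher_command(model_dir: str) -> list[str]:
    """Build the launcher invocation shown above as an argument list."""
    return ["text-generation-launcher", "--model-id", model_dir]
```

If `check_local_model_dir` returns an empty list, the result of `launcher_command` can be passed to `subprocess.run` or printed and run by hand.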