From 5a1cccbb98e08feaacb06484dcfd6b1230819b7b Mon Sep 17 00:00:00 2001
From: regisss <15324346+regisss@users.noreply.github.com>
Date: Fri, 28 Jul 2023 09:14:03 +0200
Subject: [PATCH] Add section about TGI on other AI hardware accelerators in
 README (#715)

# What does this PR do?

As per title.

## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
---
 README.md | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index effab42..c078360 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@
-A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co) 
+A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co)
 to power LLMs api-inference widgets.
 
 ## Table of contents
@@ -135,7 +135,7 @@ The Swagger UI is also available at: [https://huggingface.github.io/text-generat
 
 ### Using a private or gated model
 
-You have the option to utilize the `HUGGING_FACE_HUB_TOKEN` environment variable for configuring the token employed by 
+You have the option to utilize the `HUGGING_FACE_HUB_TOKEN` environment variable for configuring the token employed by
 `text-generation-inference`. This allows you to gain access to protected resources.
 
 For example, if you want to serve the gated Llama V2 model variants:
@@ -146,7 +146,7 @@ For example, if you want to serve the gated Llama V2 model variants:
 
 or with Docker:
 
-```shell 
+```shell
 model=meta-llama/Llama-2-7b-chat-hf
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 token=<your cli READ token>
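The hunk above stops at the variable setup; for context, the README snippet it patches continues with the `docker run` call that forwards the token into the container, roughly as sketched below (the image tag is illustrative, not part of this patch, and should match the current TGI release):

```shell
# Forward the Hub token so gated weights can be downloaded inside the container
docker run --gpus all --shm-size 1g \
    -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:0.9.3 \
    --model-id $model
```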
@@ -195,7 +195,7 @@ Python 3.9, e.g. using `conda`:
 
 ```shell
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
-conda create -n text-generation-inference python=3.9 
+conda create -n text-generation-inference python=3.9
 conda activate text-generation-inference
 ```
@@ -221,7 +221,7 @@ Then run:
 
 ```shell
 BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
-make run-falcon-7b-instruct 
+make run-falcon-7b-instruct
 ```
 
 **Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
@@ -232,7 +232,7 @@ sudo apt-get install libssl-dev gcc -y
 
 ### CUDA Kernels
 
-The custom CUDA kernels are only tested on NVIDIA A100s. If you have any installation or runtime issues, you can remove 
+The custom CUDA kernels are only tested on NVIDIA A100s. If you have any installation or runtime issues, you can remove
 the kernels by using the `DISABLE_CUSTOM_KERNELS=True` environment variable.
 
 Be aware that the official Docker image has them enabled by default.
@@ -242,7 +242,7 @@ Be aware that the official Docker image has them enabled by default.
 ### Run
 
 ```shell
-make run-falcon-7b-instruct 
+make run-falcon-7b-instruct
 ```
 
 ### Quantization
@@ -273,3 +273,9 @@ make rust-tests
 # integration tests
 make integration-tests
 ```
+
+
+## Other supported hardware
+
+TGI is also supported on the following AI hardware accelerators:
+- *Habana first-gen Gaudi and Gaudi2:* check out [here](https://github.com/huggingface/optimum-habana/tree/main/text-generation-inference) how to serve models with TGI on Gaudi and Gaudi2 with [Optimum Habana](https://huggingface.co/docs/optimum/habana/index)
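For a concrete picture of the Gaudi path added above: serving still goes through the same launcher and `--model-id` flag, but the container is started with Habana's Docker runtime instead of `--gpus`. A minimal sketch, assuming a locally built `tgi_gaudi` image; the image name and exact flags are assumptions here, and the Optimum Habana link in the new section is the authoritative reference:

```shell
model=meta-llama/Llama-2-7b-chat-hf
volume=$PWD/data  # share a volume with the container to cache downloaded weights

# tgi_gaudi is a placeholder for an image built per the Optimum Habana
# text-generation-inference instructions; --runtime=habana, HABANA_VISIBLE_DEVICES,
# --cap-add=sys_nice and --ipc=host are the usual Habana container settings
docker run -p 8080:80 -v $volume:/data \
    --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
    --cap-add=sys_nice --ipc=host \
    tgi_gaudi --model-id $model
```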