Add Google Cloud in `docs/source/references/api_reference.md`
parent 47c01cb048 · commit 74489227e0
- [Synchronous](#synchronous)
- [Hugging Face Inference Endpoints](#hugging-face-inference-endpoints)
- [Cloud Providers](#cloud-providers)
  - [Amazon SageMaker](#amazon-sagemaker)
  - [Google Cloud](#google-cloud)
The HTTP API is a RESTful API that allows you to interact with the text-generation-inference component. Two endpoints are available:

- Text Generation Inference [custom API](https://huggingface.github.io/text-generation-inference/)
- OpenAI's [Messages API](#openai-messages-api)
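As a hedged illustration, the request bodies for these two endpoints can be sketched as follows; the local URL, the model placeholder, and all field values here are assumptions for a locally running TGI server, not part of this document:

```python
import json

# Sketch of the request bodies for the two TGI endpoints, assuming a server
# at http://localhost:8080 (URL and values are illustrative assumptions).

# 1) TGI custom API: POST /generate takes an "inputs" string plus optional
#    generation "parameters".
generate_payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 50},
}

# 2) OpenAI-compatible Messages API: POST /v1/chat/completions takes a
#    "messages" list of role/content pairs.
messages_payload = {
    "model": "tgi",  # TGI serves a single model; "tgi" is a common placeholder
    "messages": [{"role": "user", "content": "What is Deep Learning?"}],
    "stream": False,
}

# To actually send a request (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/generate",
#     data=json.dumps(generate_payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Both payloads are plain JSON, so any HTTP client works; the Messages API shape also lets existing OpenAI client libraries target a TGI server by overriding their base URL.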
## Text Generation Inference custom API
## Cloud Providers
TGI can be deployed on various cloud providers for scalable and robust text generation. Among these providers, both Amazon SageMaker and Google Cloud offer TGI integrations as part of their cloud offerings.
### Amazon SageMaker

```
predictor.predict({
    ...
    ]
})
```
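The body of the `predict` call is elided in this excerpt; as a hedged sketch, a TGI endpoint on SageMaker can be sent a Messages-API-style payload such as the following, where the `messages` key and its contents are illustrative assumptions (only the closing brackets survive above):

```python
import json

# Hedged sketch of a request dict of the shape predictor.predict(...) receives.
# The "messages" key and its contents are illustrative assumptions.
request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Deep Learning?"},
    ]
}

# The payload must be JSON-serializable, since SageMaker transmits it as JSON.
encoded = json.dumps(request)
# predictor.predict(request)  # requires a deployed SageMaker endpoint
```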
### Google Cloud
A collection of publicly available Deep Learning Containers (DLCs) is available for TGI on Google Cloud, covering services such as Google Kubernetes Engine (GKE), Vertex AI, and Cloud Run.
The TGI DLCs are built with the `--features google` flag and include the Google SDK installation, so that they fit naturally into the Google Cloud environment and integrate seamlessly with Vertex AI, including support for its custom I/O formatting.
The DLCs are listed in the [Google Cloud Deep Learning Containers documentation for TGI](https://cloud.google.com/deep-learning-containers/docs/choosing-container#text-generation-inference) and in the [Google Cloud Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io); alternatively, you can use the `gcloud` command to list the available containers tagged `huggingface-text-generation-inference`:
```bash
gcloud container images list --repository="us-docker.pkg.dev/deeplearning-platform-release/gcr.io" | grep "huggingface-text-generation-inference"
```
The containers can be used within any Google Cloud service; you can find some examples below:
- [Deploy Meta Llama 3 8B with TGI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-deployment)
- [Deploy Gemma 7B with TGI DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb)
- [Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/tgi-deployment)
More information and examples are available in [the Google-Cloud-Containers repository](https://github.com/huggingface/Google-Cloud-Containers).