From 5e5e9d4bbd51fafcca3cf4da39170e40951ca638 Mon Sep 17 00:00:00 2001
From: lewtun
Date: Thu, 23 Mar 2023 18:03:45 +0100
Subject: [PATCH] feat: Add note about NVIDIA drivers (#64)

Co-authored-by: OlivierDehaene
---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 5cd26a0e..ee724487 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ volume=$PWD/data # share a volume with the Docker container to avoid downloading
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard
 ```
+**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
 
 You can then query the model using either the `/generate` or `/generate_stream` routes:
 
 ```shell
@@ -119,8 +120,6 @@ for response in client.generate_stream("What is Deep Learning?", max_new_tokens=
     print(text)
 ```
 
-**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
-
 ### API documentation
 
 You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route.