**A Docker container for running VLLM on Paperspace Gradient notebooks.**

1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`
2. Place your Rathole client config at `./rathole-client.toml`
3. `docker build . -t "paperspace-vllm"`
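The three steps above can be sketched as one shell session. The `~/.jupyter` location is an assumption (check `jupyter --config-dir` on your machine), and the sketch only prints the build command rather than running it:

```shell
# Step 1: copy your Jupyter config into the build context
# (run `jupyter server --generate-config` and `jupyter server password` first).
mkdir -p ./jupyter
cp -r "$HOME/.jupyter/." ./jupyter/ 2>/dev/null || true   # no-op if ~/.jupyter doesn't exist yet

# Step 2: placeholder for your Rathole client config.
: > ./rathole-client.toml

# Step 3: the build command (printed here, not executed in this sketch).
echo 'docker build . -t "paperspace-vllm"'
```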
### Running
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
   - Container Name: `cyberes/vllm-paperspace:latest`
   - Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
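A minimal `rathole-client.toml` sketch, following rathole's documented client format. The server address, service name, and token below are placeholders you must replace with your own, and the service name has to match the one in your rathole server config:

```toml
[client]
remote_addr = "your-server.example.com:2333"   # rathole server's public address (placeholder)

[client.services.vllm]
token = "use_a_long_random_secret"             # must match the server side (placeholder)
local_addr = "127.0.0.1:7000"                  # where VLLM listens inside the container
```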
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
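The `INFO`/`ERROR` check above can be scripted. This sketch runs against a sample line standing in for `/var/log/app/rathole.log` (the exact log format is an assumption based on rathole's default output):

```shell
# Create a sample log in place of /var/log/app/rathole.log so the sketch is self-contained.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2023-10-01T12:00:00Z  INFO rathole::client: Control channel established
EOF

# Success line present -> rathole is connected; otherwise surface any ERROR lines.
if grep -q 'Control channel established' "$LOG"; then
    echo "rathole connected"
else
    grep 'ERROR' "$LOG"
fi
```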
9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
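One way to pull a quant down, assuming the `huggingface-cli` tool from `huggingface_hub` is available; the sketch only prints the command so nothing is actually downloaded:

```shell
# Derive the destination under /storage/vllm/models/ from the repo name.
MODEL_REPO="TheBloke/MythoMax-L2-13B-AWQ"        # example AWQ repo; pick your own
DEST="/storage/vllm/models/${MODEL_REPO##*/}"    # strips the "TheBloke/" owner prefix
echo huggingface-cli download "$MODEL_REPO" --local-dir "$DEST"
```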
10. Enter your VLLM commandline args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
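An example of what `cmd.txt` might hold, and how a wrapper script could splice it into the VLLM launch. How `start.sh` actually consumes the file is an assumption; the flags themselves are standard VLLM arguments:

```shell
# Stand-in for /storage/vllm/cmd.txt so the sketch runs anywhere.
CMD_TXT=$(mktemp)
cat > "$CMD_TXT" <<'EOF'
--model /storage/vllm/models/MythoMax-L2-13B-AWQ --quantization awq --gpu-memory-utilization 0.95
EOF

# Read the args and show the launch line a wrapper could exec.
ARGS=$(cat "$CMD_TXT")
echo "python -m vllm.entrypoints.api_server $ARGS"
```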
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:

```
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
```

VLLM is running and ready for queries.

To test on your local machine, run this command:

```bash
docker run --shm-size 14g --gpus all \
    -v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
    -p 7000:7000 -p 8888:8888 \
    -e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
    vllm-cloud
```
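Once the server logs the Uvicorn line, you can poke it from another terminal. The `/generate` endpoint and body shape below match VLLM's bundled demo API server at the time of writing and may differ across versions; this sketch only validates the request body rather than hitting a live server:

```shell
# Request body for VLLM's demo api_server (endpoint shape may vary by version).
BODY='{"prompt": "Hello, my name is", "max_tokens": 16}'
echo "$BODY" | python3 -c 'import json, sys; json.load(sys.stdin); print("valid JSON")'

# The query you would run against the live container:
echo curl -s http://127.0.0.1:7000/generate -H "Content-Type: application/json" -d "$BODY"
```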
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.

### Building

You **must** have a GPU attached to your system when building the container.

1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
2. `sudo docker build .`
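`docker build` itself has no `--gpus` flag, so for the build to see the GPU the NVIDIA runtime usually has to be Docker's default runtime. A common `/etc/docker/daemon.json` for that, assuming a standard toolkit install (restart the Docker daemon after editing):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
```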