diff --git a/other/vllm/Docker/DOCKER.md b/other/vllm/Docker/DOCKER.md
index 6abf6bf..3713b18 100644
--- a/other/vllm/Docker/DOCKER.md
+++ b/other/vllm/Docker/DOCKER.md
@@ -1,15 +1,33 @@
 **A Docker container for running VLLM on Paperspace Gradient notebooks.**
 
-1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`
-2. Place your Rathole client config at `./rathole-client.toml`
-3. `docker build . -t "paperspace-vllm"`
+### Running
 
-To test on your local machine, run this command:
+1. In Paperspace, create a new notebook.
+2. Click `Start from Scratch`.
+3. Select your GPU and set the auto-shutdown timeout to 6 hours.
+4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
+   - Container Name: `cyberes/vllm-paperspace:latest`
+   - Container Command: `/app/start.sh`
+5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
+6. Once the container has started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM caused by the blank config files. The container will create a new directory in your mounted
+   storage: `/storage/vllm/`.
+7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
+8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages begin
+   with `ERROR`.
+9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
+10. Enter your VLLM command-line args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
+11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
+    ```
+    INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
+    ```
+    VLLM is running and ready for queries.
 
-```bash
-docker run --shm-size 14g --gpus all \
-    -v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
-    -p 7000:7000 -p 8888:8888 \
-    -e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
-    vllm-cloud
-```
\ No newline at end of file
+12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will be created automatically. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a
+    browser tab.
+
+### Building
+
+You **must** have a GPU attached to your system when building the container.
+
+1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
+2. `sudo docker build .`
\ No newline at end of file
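As a reviewer note on step 7 of the new instructions: a minimal `rathole-client.toml` for this setup might look like the sketch below, which forwards the local VLLM port (7000) out through a rathole server. The server address, service name, and token are placeholders, not values from this repo — they must match whatever your rathole server is configured with.

```toml
# /storage/vllm/rathole-client.toml -- sketch only.
# Replace remote_addr and token with the values from your rathole server.
[client]
remote_addr = "your-rathole-server.example.com:2333"  # placeholder address

[client.services.vllm]                                # service name must match the server's
token = "replace-with-your-secret-token"              # placeholder token
local_addr = "127.0.0.1:7000"                         # VLLM's local port inside the container
```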
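And on step 10: the removed `API_SERVER_ARGS` example in the old README suggests what `cmd.txt` should contain — a single line of VLLM API-server arguments. A sketch, assuming a MythoMax AWQ model was downloaded per step 9 (the model directory name is an example, not something this repo ships):

```
--model /storage/vllm/models/MythoMax-L2-13B-AWQ --quantization awq --gpu-memory-utilization 0.9
```

`--model` must point at the actual directory under `/storage/vllm/models/`; the other flags are tuning choices carried over from the old local-test command.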