Update 'other/vllm/Docker/DOCKER.md'

Cyberes 2023-09-30 13:32:46 -06:00
parent d3f529ca8b
commit c65b722211
1 changed files with 29 additions and 11 deletions

@@ -1,15 +1,33 @@
**A Docker container for running VLLM on Paperspace Gradient notebooks.**
1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`
2. Place your Rathole client config at `./rathole-client.toml`
3. `docker build . -t "paperspace-vllm"`
### Running
To test on your local machine, run this command:
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
- Container Name: `cyberes/vllm-paperspace:latest`
- Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container has started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
9. Download an AWQ-quantized model from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
10. Enter your VLLM command-line arguments in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
```
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
```
       VLLM is running and ready for queries.
```bash
docker run --shm-size 14g --gpus all \
-v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
-p 7000:7000 -p 8888:8888 \
-e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
vllm-cloud
```
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.
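
Instead of watching `tail -f` by hand, the restart-and-check steps above could be scripted. The following is a minimal sketch, not part of the image: the helper names `log_has` and `wait_for_log` are hypothetical, and the only things taken from the steps above are the log paths and the two marker strings.

```bash
#!/usr/bin/env bash
# Sketch: poll a log file until a readiness marker appears.
# log_has and wait_for_log are hypothetical helper names, not part of the image.

# Return 0 if the log file exists and contains the given fixed string.
log_has() {
    local logfile=$1 marker=$2
    [ -f "$logfile" ] && grep -qF -- "$marker" "$logfile"
}

# Poll once per second until the marker shows up or the timeout (seconds) expires.
wait_for_log() {
    local logfile=$1 marker=$2 timeout=${3:-180}
    local waited=0
    while ! log_has "$logfile" "$marker"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for: $marker" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example usage inside the container (commented out so the helpers can be
# sourced without side effects):
#   supervisorctl restart rathole
#   wait_for_log /var/log/app/rathole.log 'Control channel established' 60
#   supervisorctl restart vllm
#   wait_for_log /var/log/app/vllm.log 'Uvicorn running on' 180
```

If the log files are appended to across restarts, a marker left over from an earlier run would match immediately, so treat a positive result as a hint and confirm with the most recent log lines.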
### Building
You **must** have a GPU attached to your system when building the container.
1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
2. `sudo docker build .`
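
Before building, it may help to confirm that Docker and the NVIDIA driver tools are actually available on the host. The sketch below is only an illustration: `have_cmd` and `preflight` are hypothetical helper names, and it assumes `nvidia-smi` is installed alongside the driver.

```bash
#!/usr/bin/env bash
# Sketch: check build prerequisites before running `docker build`.
# have_cmd and preflight are hypothetical helper names.

# Return 0 if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report every missing prerequisite on stderr; return non-zero if any is missing.
preflight() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! have_cmd "$cmd"; then
            echo "missing prerequisite: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage on the build host (commented out to avoid side effects).
# `nvidia-smi -L` lists the GPUs the driver can see.
#   preflight docker nvidia-smi && nvidia-smi -L && sudo docker build .
```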