**A Docker container for running VLLM on Paperspace Gradient notebooks.**

1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`
2. Place your Rathole client config at `./rathole-client.toml`
3. `docker build . -t "paperspace-vllm"`
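The three steps above can be sketched as one shell session. The `~/.jupyter` location is an assumption (check `jupyter --config-dir` on your machine), and the sketch only prints the build command rather than running it:

```shell
# Step 1: copy your Jupyter config into the build context
# (run `jupyter server --generate-config` and `jupyter server password` first).
mkdir -p ./jupyter
cp -r "$HOME/.jupyter/." ./jupyter/ 2>/dev/null || true   # no-op if ~/.jupyter doesn't exist yet

# Step 2: placeholder for your Rathole client config.
: > ./rathole-client.toml

# Step 3: the build command (printed here, not executed in this sketch).
echo 'docker build . -t "paperspace-vllm"'
```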
### Running
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
   - Container Name: `cyberes/vllm-paperspace:latest`
   - Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
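A minimal `rathole-client.toml` sketch, following rathole's documented client format. The server address, service name, and token below are placeholders you must replace with your own, and the service name has to match the one in your rathole server config:

```toml
[client]
remote_addr = "your-server.example.com:2333"   # rathole server's public address (placeholder)

[client.services.vllm]
token = "use_a_long_random_secret"             # must match the server side (placeholder)
local_addr = "127.0.0.1:7000"                  # where VLLM listens inside the container
```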
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
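The `INFO`/`ERROR` check above can be scripted. This sketch runs against a sample line standing in for `/var/log/app/rathole.log` (the exact log format is an assumption based on rathole's default output):

```shell
# Create a sample log in place of /var/log/app/rathole.log so the sketch is self-contained.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2023-10-01T12:00:00Z  INFO rathole::client: Control channel established
EOF

# Success line present -> rathole is connected; otherwise surface any ERROR lines.
if grep -q 'Control channel established' "$LOG"; then
    echo "rathole connected"
else
    grep 'ERROR' "$LOG"
fi
```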
9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
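One way to pull a quant down, assuming the `huggingface-cli` tool from `huggingface_hub` is available; the sketch only prints the command so nothing is actually downloaded:

```shell
# Derive the destination under /storage/vllm/models/ from the repo name.
MODEL_REPO="TheBloke/MythoMax-L2-13B-AWQ"        # example AWQ repo; pick your own
DEST="/storage/vllm/models/${MODEL_REPO##*/}"    # strips the "TheBloke/" owner prefix
echo huggingface-cli download "$MODEL_REPO" --local-dir "$DEST"
```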
10. Enter your VLLM commandline args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
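An example of what `cmd.txt` might hold, and how a wrapper script could splice it into the VLLM launch. How `start.sh` actually consumes the file is an assumption; the flags themselves are standard VLLM arguments:

```shell
# Stand-in for /storage/vllm/cmd.txt so the sketch runs anywhere.
CMD_TXT=$(mktemp)
cat > "$CMD_TXT" <<'EOF'
--model /storage/vllm/models/MythoMax-L2-13B-AWQ --quantization awq --gpu-memory-utilization 0.95
EOF

# Read the args and show the launch line a wrapper could exec.
ARGS=$(cat "$CMD_TXT")
echo "python -m vllm.entrypoints.api_server $ARGS"
```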
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:

```
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
```

VLLM is running and ready for queries.

To test on your local machine, run this command:

```bash
docker run --shm-size 14g --gpus all \
    -v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
    -p 7000:7000 -p 8888:8888 \
    -e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
    vllm-cloud
```
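Once the server logs the Uvicorn line, you can poke it from another terminal. The `/generate` endpoint and body shape below match VLLM's bundled demo API server at the time of writing and may differ across versions; this sketch only validates the request body rather than hitting a live server:

```shell
# Request body for VLLM's demo api_server (endpoint shape may vary by version).
BODY='{"prompt": "Hello, my name is", "max_tokens": 16}'
echo "$BODY" | python3 -c 'import json, sys; json.load(sys.stdin); print("valid JSON")'

# The query you would run against the live container:
echo curl -s http://127.0.0.1:7000/generate -H "Content-Type: application/json" -d "$BODY"
```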
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.

### Building

You **must** have a GPU attached to your system when building the container.

1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
2. `sudo docker build .`
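`docker build` itself has no `--gpus` flag, so for the build to see the GPU the NVIDIA runtime usually has to be Docker's default runtime. A common `/etc/docker/daemon.json` for that, assuming a standard toolkit install (restart the Docker daemon after editing):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
```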