Update 'other/vllm/Docker/DOCKER.md'
This commit is contained in: parent d3f529ca8b, commit c65b722211
**A Docker container for running VLLM on Paperspace Gradient notebooks.**
### Running
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
   - Container Name: `cyberes/vllm-paperspace:latest`
   - Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml` (a minimal sketch appears after this list). If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/` (see the download example after this list).
10. Enter your VLLM command-line args in `/storage/vllm/cmd.txt` (a sample is sketched after this list). You need to set `--model` to the path of the model you want to load.
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
    ```
    INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
    ```

    VLLM is running and ready for queries (see the example query after this list).
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.
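
For step 7, here is a minimal sketch based on rathole's documented client file format; the server address, service name, and token are placeholders you must replace with your own values, and writing the file via a heredoc is just one convenient option:

```bash
# Hypothetical starting point for /storage/vllm/rathole-client.toml.
# remote_addr, the service name, and the token below are placeholders;
# use the values from your own rathole server.
cat > /storage/vllm/rathole-client.toml <<'EOF'
[client]
remote_addr = "my-server.example.com:2333"

[client.services.vllm]
token = "replace-with-your-secret-token"
local_addr = "127.0.0.1:7000"
EOF
```

Port 7000 matches the port VLLM's API server listens on (see the Uvicorn line in step 11).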
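
For step 9, one way to fetch a model, assuming `git-lfs` is available in the container (any other Hugging Face download method works too); the repo name is only an example:

```bash
# Example model download; pick whichever AWQ quantization you want to run.
cd /storage/vllm/models/
git lfs install
git clone https://huggingface.co/TheBloke/MythoMax-L2-13B-AWQ
```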
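
For step 10, a sample `cmd.txt`. The flags mirror the local-test command this repo used previously; this assumes `cmd.txt` holds the argument string on a single line, and the `--model` path must match the model you downloaded:

```bash
# Hypothetical cmd.txt contents -- adjust the paths and values to your model.
cat > /storage/vllm/cmd.txt <<'EOF'
--model /storage/vllm/models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1
EOF
```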
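
Once step 11 shows the Uvicorn line, you can smoke-test the server from a terminal inside the container. This assumes the image runs VLLM's plain `api_server`, which serves `POST /generate`; if it runs the OpenAI-compatible server instead, query `POST /v1/completions` with a `model` field:

```bash
# Quick local smoke test against the port from the Uvicorn log line.
curl -s http://127.0.0.1:7000/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, my name is", "max_tokens": 32}'
```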
### Building
You **must** have a GPU attached to your system when building the container.
1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
2. `sudo docker build .` (if the build cannot see your GPU, see the runtime note below)
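
Note that `docker build` itself has no `--gpus` flag, so a GPU-dependent build usually relies on the NVIDIA runtime being Docker's default. A sketch of one common host setup (an assumption about your environment, not something this repo mandates; back up any existing `daemon.json` first):

```bash
# Make the NVIDIA runtime the default so build steps can see the GPU.
# These paths/names are the standard ones from the NVIDIA Container Toolkit.
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker
sudo docker build .
```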