**A Docker container for running VLLM on Paperspace Gradient notebooks.**
### Running
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
- Container Name: `cyberes/vllm-paperspace:latest`
- Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
10. Enter your VLLM commandline args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
```
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
```
VLLM is running and ready for queries.
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.
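The rathole client config from step 7 might look like the following sketch. The server address, service name, and token below are placeholders; use the values that match your own rathole server config:

```toml
# Hypothetical example — replace every value with your own.
[client]
remote_addr = "your-server.example.com:2333"  # public rathole server

[client.services.vllm]
token = "use_a_long_random_token"   # must match the token on the server side
local_addr = "127.0.0.1:7000"       # where VLLM listens inside the container
```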
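For step 10, the contents of `/storage/vllm/cmd.txt` might look something like this sketch. The model path is a placeholder, and the flags besides `--model` are illustrative — only `--model` is required by these instructions, and the start script may already set the host and port:

```
--model /storage/vllm/models/your-model-AWQ --quantization awq --host 0.0.0.0 --port 7000
```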
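Once VLLM reports that Uvicorn is running, you can send it requests through your rathole tunnel. A minimal client sketch, assuming the container runs VLLM's native API server (with a `/generate` endpoint) and that `your-host:PORT` stands in for whatever public address your rathole server exposes — both are placeholders:

```python
import json
import urllib.request


def build_payload(prompt, max_tokens=128, temperature=0.7):
    """Build the JSON body for a /generate request (sampling params are examples)."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}


def generate(url, prompt):
    """POST a prompt to the server and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Requires a running server, e.g.:
# print(generate("http://your-host:PORT/generate", "Once upon a time"))
```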
### Building
You **must** have a GPU attached to your system when building the container (required for building VLLM).
1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and CUDA 11.8.
2. `bash build-docker.sh`
To run the container on your local machine:
```bash
sudo docker run -it --shm-size 14g --gpus all \
  -v /home/user/testing123/notebooks:/notebooks \
  -v /home/user/testing123/storage:/storage \
  -p 8888:8888 \
  cyberes/vllm-paperspace:latest
```
You will need to create a directory to mount inside the container (for example: `/home/user/testing123/`). Within this should be the folder `models` that holds the model to load, `rathole-client.toml`, and `cmd.txt`.
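For example, a local layout matching the mounts above might look like this (the base path is illustrative, and the `vllm` subfolder mirrors the `/storage/vllm/` directory the container creates):

```
/home/user/testing123/
├── notebooks/
└── storage/
    └── vllm/
        ├── models/
        ├── rathole-client.toml
        └── cmd.txt
```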
If you need to debug something, you can start a shell inside the container:
```bash
sudo docker run -it --shm-size 14g --gpus all \
  -v /home/user/testing123/notebooks:/notebooks \
  -v /home/user/testing123/storage:/storage \
  -p 8888:8888 \
  --entrypoint bash \
  cyberes/vllm-paperspace:latest
```