A Docker container for running VLLM on Paperspace Gradient notebooks.
Running
- In Paperspace, create a new notebook.
- Click `Start from Scratch`.
- Select your GPU and set the auto-shutdown timeout to 6 hours.
- Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
  - Container Name: `cyberes/vllm-paperspace:latest`
  - Container Command: `/app/start.sh`
- Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
- Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
- Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
- Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
- Download an AWQ quantization from TheBloke to `/storage/vllm/models/`.
- Enter your VLLM command-line args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
- Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
  `INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)`
  VLLM is running and ready for queries.
- In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You must keep the running notebook open in a browser tab.
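
If you would rather not watch `tail -f` by hand, the log checks in the steps above can be scripted. This is a minimal sketch, not part of the container: `wait_for_log` is a hypothetical helper name, and the timeouts in the usage comments are only guesses based on the "up to three minutes" figure above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the container):
# block until a log file contains a given string, or give up after a timeout.
# Usage: wait_for_log <file> <fixed-string> [timeout-seconds]
wait_for_log() {
  local file="$1" pattern="$2" timeout="${3:-180}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -F matches the string literally; -q prints nothing and only sets the exit status
    if [ -f "$file" ] && grep -qF -- "$pattern" "$file"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example calls, using the log paths and messages from the steps above:
#   wait_for_log /var/log/app/rathole.log "Control channel established" 60
#   wait_for_log /var/log/app/vllm.log "Uvicorn running on" 180 && echo "VLLM is ready"
```

Note that the helper matches anywhere in the file, so if the log still holds lines from an earlier run it may return immediately. Checking the tail of the log by hand is still the safer confirmation after a restart.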
Building
You must have a GPU attached to your system when building the container (required for building VLLM).
- Install the NVIDIA Container Toolkit and CUDA 11.8.
- Run `sudo docker build .`

If you want to build the latest VLLM, add `--no-cache`.
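
The two build variants can be put side by side as below. This is only a sketch: `build_cmd` and its `fresh` argument are made up for illustration, and tagging the image with `-t` is an assumption so that the result matches the image name used in the `docker run` commands. The repository's own `build-docker.sh` may already do something different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker build command for this image.
# IMAGE_TAG is copied from the Running section; -t is an assumption, not from the README.
IMAGE_TAG="cyberes/vllm-paperspace:latest"

build_cmd() {
  if [ "${1:-}" = "fresh" ]; then
    # --no-cache rebuilds every layer instead of reusing cached ones
    echo "sudo docker build --no-cache -t $IMAGE_TAG ."
  else
    echo "sudo docker build -t $IMAGE_TAG ."
  fi
}

# Example: print the command for a full rebuild, then copy and run it
#   build_cmd fresh
```

The function only prints the command, so nothing is built until you run the printed line yourself.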
To run the container on your local machine:
sudo docker run -it --shm-size 14g --gpus all -v /home/user/testing123/notebooks:/notebooks -v /home/user/testing123/storage:/storage -p 8888:8888 cyberes/vllm-paperspace:latest
You will need to create a directory to mount inside the container (for example: `/home/user/testing123/`). Within this should be the folder `models` that holds the model to load, `rathole-client.toml`, and `cmd.txt`.
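
One way to prepare that directory is sketched below. It assumes the host layout should mirror the `/storage/vllm/` layout described under Running, since the `docker run` command mounts `storage` at `/storage`; that mapping is an inference, and `scaffold_vllm_dirs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the host directories mounted by the docker run command.
# Assumption: the container reads its files from /storage/vllm/, as under Running.
# Usage: scaffold_vllm_dirs <base-directory>
scaffold_vllm_dirs() {
  local base="$1"
  # notebooks/ is mounted at /notebooks, storage/ at /storage
  mkdir -p "$base/notebooks" "$base/storage/vllm/models" || return 1
  # touch creates the files if missing and leaves existing contents untouched
  touch "$base/storage/vllm/rathole-client.toml" "$base/storage/vllm/cmd.txt"
}

# Example, using the path from the docker run command:
#   scaffold_vllm_dirs /home/user/testing123
```

The files are created empty. You still need to fill in the rathole client config and put your VLLM arguments (at least `--model`) in `cmd.txt`.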
If you need to debug something, you can start a shell inside the container:
sudo docker run -it --shm-size 14g --gpus all -v /home/user/testing123/notebooks:/notebooks -v /home/user/testing123/storage:/storage -p 8888:8888 --entrypoint bash cyberes/vllm-paperspace:latest