This repository has been archived on 2024-10-27. You can view files and clone it, but cannot push or open issues or pull requests.

A Docker container for running VLLM on Paperspace Gradient notebooks.

Running

  1. In Paperspace, create a new notebook.
  2. Click Start from Scratch.
  3. Select your GPU and set the auto-shutdown timeout to 6 hours.
  4. Click the View Advanced Options button at the bottom of the page. Enter these details in the form that appears:
    • Container Name: cyberes/vllm-paperspace:latest
    • Container Command: /app/start.sh
  5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
  6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: /storage/vllm/.
  7. Enter your rathole client config in /storage/vllm/rathole-client.toml. If you need a visual text editor, first link the directory back to the Jupyter home: ln -s /storage/vllm /notebooks
  8. Restart rathole with supervisorctl restart rathole and then view the log: tail -f /var/log/app/rathole.log. If you see lines that start with INFO and end with Control channel established, rathole has connected and is working. Error messages will begin with ERROR.
  9. Download an AWQ quantization from TheBloke to /storage/vllm/models/.
  10. Enter your VLLM command-line arguments in /storage/vllm/cmd.txt. You must set --model to the path of the model you want to load.
  11. Restart VLLM with supervisorctl restart vllm and then view the log: tail -f /var/log/app/vllm.log. It may take up to three minutes to load. When you see the line:
INFO:     Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)

       VLLM is running and ready for queries.

  12. In /notebooks (the home directory of Jupyter), the notebook idle.ipynb will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You must keep the running notebook open in a browser tab.
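
The manual checks in steps 8, 10 and 11 can be scripted from a terminal inside the container. The following is a minimal sketch, not part of the image: the function names are hypothetical, and it assumes cmd.txt holds the arguments on a single line and that the log lines contain the text quoted in the steps above.

```shell
#!/bin/sh
# Hypothetical helpers; none of these functions ship with the container.

# Step 10: write a minimal argument file containing only --model.
# Assumption: cmd.txt is read as a single line of arguments.
write_vllm_args() {
    model_path="$1"
    args_file="${2:-/storage/vllm/cmd.txt}"
    printf '%s %s\n' '--model' "$model_path" > "$args_file"
}

# Step 8: succeeds if the rathole log reports an established control channel.
rathole_connected() {
    grep -q 'Control channel established' "$1"
}

# Step 11: succeeds once the VLLM log contains the Uvicorn startup line.
vllm_ready() {
    grep -q 'Uvicorn running on' "$1"
}

# Polls a check function against a log file once per second until it
# succeeds or the timeout (in seconds, default 180) has elapsed.
wait_for() {
    check="$1"
    log_file="$2"
    timeout="${3:-180}"
    elapsed=0
    while :; do
        if [ -f "$log_file" ] && "$check" "$log_file"; then
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage inside the container (commands and paths from the steps above):
#   supervisorctl restart rathole && wait_for rathole_connected /var/log/app/rathole.log 30
#   supervisorctl restart vllm && wait_for vllm_ready /var/log/app/vllm.log 180
```

Polling the log file rather than following it with tail -f lets the script exit with a status code, so it can be chained with other commands.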

Building

You must have a GPU attached to your system when building the container (required for building VLLM).

  1. Install the NVIDIA Container Toolkit and CUDA 11.8.
  2. Run sudo docker build . from this directory. To build the latest VLLM, add --no-cache.
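
As a sketch of the two build variants, the helper below prints the command rather than running it. The function name is hypothetical, and tagging the image with -t is an assumption added for convenience (the default tag is the container name from the Running section); the README itself only specifies sudo docker build . and --no-cache.

```shell
#!/bin/sh
# Hypothetical helper: prints the build command so it can be reviewed first.
# $1: image tag (assumption; defaults to the name used in the Running section)
# $2: pass "fresh" to add --no-cache, which forces the latest VLLM to be rebuilt
vllm_build_cmd() {
    tag="${1:-cyberes/vllm-paperspace:latest}"
    if [ "${2:-}" = "fresh" ]; then
        printf 'sudo docker build --no-cache -t %s .\n' "$tag"
    else
        printf 'sudo docker build -t %s .\n' "$tag"
    fi
}

# Example: print the uncached build command for a local tag.
#   vllm_build_cmd my-vllm:test fresh
```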