**A Docker container for running VLLM on Paperspace Gradient notebooks.**
### Running
1. In Paperspace, create a new notebook.
2. Click `Start from Scratch`.
3. Select your GPU and set the auto-shutdown timeout to 6 hours.
4. Click the `View Advanced Options` button at the bottom of the page. Enter these details in the form that appears:
- Container Name: `cyberes/vllm-paperspace:latest`
- Container Command: `/app/start.sh`
5. Start the notebook. It may take up to five minutes for Paperspace to pull and start the custom image.
6. Once the container is started, open the log viewer by clicking the icon in the bottom left of the screen. You should see errors from rathole and VLLM as a result of the blank config files. The container will create a new directory in your mounted storage: `/storage/vllm/`.
7. Enter your rathole client config in `/storage/vllm/rathole-client.toml`. If you need a visual text editor, first link the directory back to the Jupyter home: `ln -s /storage/vllm /notebooks`
8. Restart rathole with `supervisorctl restart rathole` and then view the log: `tail -f /var/log/app/rathole.log`. If you see lines that start with `INFO` and end with `Control channel established`, rathole has connected and is working. Error messages will begin with `ERROR`.
9. Download an AWQ quantization from [TheBloke](https://huggingface.co/TheBloke) to `/storage/vllm/models/`.
10. Enter your VLLM commandline args in `/storage/vllm/cmd.txt`. You need to set `--model` to the path of the model you want to load.
11. Restart VLLM with `supervisorctl restart vllm` and then view the log: `tail -f /var/log/app/vllm.log`. It may take up to three minutes to load. When you see the line:
```
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
```
VLLM is running and ready for queries.
12. In `/notebooks` (the home directory of Jupyter), the notebook `idle.ipynb` will automatically be created. Run this notebook so Paperspace does not shut down your machine due to "inactivity". You **must** keep the running notebook open in a browser tab.
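The rathole client config from step 7 might look like the following sketch. The server address, service name, and token below are placeholders; use the values that match your own rathole server config:

```toml
# Hypothetical example — replace every value with your own.
[client]
remote_addr = "your-server.example.com:2333"  # public rathole server

[client.services.vllm]
token = "use_a_long_random_token"   # must match the token on the server side
local_addr = "127.0.0.1:7000"       # where VLLM listens inside the container
```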
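For step 10, the contents of `/storage/vllm/cmd.txt` might look something like this sketch. The model path is a placeholder, and the flags besides `--model` are illustrative — only `--model` is required by these instructions, and the start script may already set the host and port:

```
--model /storage/vllm/models/your-model-AWQ --quantization awq --host 0.0.0.0 --port 7000
```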
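Once VLLM reports that Uvicorn is running, you can send it requests through your rathole tunnel. A minimal client sketch, assuming the container runs VLLM's native API server (with a `/generate` endpoint) and that `your-host:PORT` stands in for whatever public address your rathole server exposes — both are placeholders:

```python
import json
import urllib.request


def build_payload(prompt, max_tokens=128, temperature=0.7):
    """Build the JSON body for a /generate request (sampling params are examples)."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}


def generate(url, prompt):
    """POST a prompt to the server and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Requires a running server, e.g.:
# print(generate("http://your-host:PORT/generate", "Once upon a time"))
```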
### Building
You **must** have a GPU attached to your system when building the container (required for building VLLM).
1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and CUDA 11.8.
2. `bash build-docker.sh`
To run the container on your local machine:
```bash
sudo docker run -it --shm-size 14g --gpus all \
  -v /home/user/testing123/notebooks:/notebooks \
  -v /home/user/testing123/storage:/storage \
  -p 8888:8888 \
  cyberes/vllm-paperspace:latest
```
You will need to create a directory to mount inside the container (for example: `/home/user/testing123/`). Within this should be the folder `models` that holds the model to load, `rathole-client.toml`, and `cmd.txt`.
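For example, a local layout matching the mounts above might look like this (the base path is illustrative, and the `vllm` subfolder mirrors the `/storage/vllm/` directory the container creates):

```
/home/user/testing123/
├── notebooks/
└── storage/
    └── vllm/
        ├── models/
        ├── rathole-client.toml
        └── cmd.txt
```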
If you need to debug something, you can start a shell inside the container:
```bash
sudo docker run -it --shm-size 14g --gpus all \
  -v /home/user/testing123/notebooks:/notebooks \
  -v /home/user/testing123/storage:/storage \
  -p 8888:8888 \
  --entrypoint bash \
  cyberes/vllm-paperspace:latest
```