**A Docker container for running VLLM on Paperspace Gradient notebooks.**
1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`
2. Place your Rathole client config at `./rathole-client.toml`
3. `docker build . -t "paperspace-vllm"`
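For reference, a minimal `rathole-client.toml` might look like the sketch below. The relay address, service name, and token are placeholders, not values from this repo; see the Rathole documentation for the full option set:

```toml
[client]
# Public host running the matching rathole server (placeholder address).
remote_addr = "example.com:2333"

[client.services.jupyter]
# Must match the token configured on the server side.
token = "use_a_long_random_token"
# Jupyter listening inside the container.
local_addr = "127.0.0.1:8888"
```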
To test on your local machine, run this command:
```bash
docker run --shm-size 14g --gpus all \
  -v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
  -p 7000:7000 -p 8888:8888 \
  -e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
  paperspace-vllm
```
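Once the container is up, the API server listens on port 7000. Here is a hedged sketch of a test request, assuming the container launches vLLM's demo `api_server` (which exposes a `POST /generate` route); if your Dockerfile starts the OpenAI-compatible server instead, the path and payload shape will differ. The prompt and sampling parameters are arbitrary examples:

```python
import json
from urllib import request

# Hypothetical prompt and sampling parameters.
payload = {"prompt": "Hello, my name is", "max_tokens": 64, "temperature": 0.8}

req = request.Request(
    "http://localhost:7000/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires the container to be running; uncomment to send the request.
# with request.urlopen(req) as resp:
#     print(json.load(resp)["text"])
```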