local-llm-server/DOCKER.md at f88e2362c57c352e7529ebb0e3c04ba1e3d1ef3d

This repository has been archived on 2024-10-27. You can view files and clone it, but cannot push or open issues or pull requests.

A Docker container for running vLLM on Paperspace Gradient notebooks.

1. Run `jupyter server --generate-config` and `jupyter server password` on your local machine, then copy Jupyter's config directory to `./jupyter`.
2. Place your Rathole client config at `./rathole-client.toml`.
3. Build the image: `docker build . -t "paperspace-vllm"`
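
The build depends on two inputs from the steps above being present in the build context. A minimal sketch of a pre-build check follows; the function name `check_build_inputs` is hypothetical and not part of this repository, and it only verifies that `./jupyter` and `./rathole-client.toml` exist, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical pre-build check (not part of the repository): confirms that
# the Jupyter config directory and the Rathole client config exist in the
# build context before docker build is run.
check_build_inputs() {
    ctx="${1:-.}"   # build context directory, defaults to the current one
    status=0
    if [ ! -d "$ctx/jupyter" ]; then
        echo "missing directory: $ctx/jupyter" >&2
        status=1
    fi
    if [ ! -f "$ctx/rathole-client.toml" ]; then
        echo "missing file: $ctx/rathole-client.toml" >&2
        status=1
    fi
    return "$status"
}

# Usage (commented out so sourcing this file has no side effects):
# check_build_inputs . && docker build . -t "paperspace-vllm"
```

The function reports every missing input rather than stopping at the first one, so both problems can be fixed in a single pass.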

To test on your local machine, run this command:

docker run --shm-size 14g --gpus all \
  -v /storage/models/awq/MythoMax-L2-13B-AWQ:/models/MythoMax-L2-13B-AWQ \
  -p 7000:7000 -p 8888:8888 \
  -e API_SERVER_ARGS="--model /models/MythoMax-L2-13B-AWQ --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1" \
  paperspace-vllm
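
To test a different model, the model name has to change in both the volume mount and `API_SERVER_ARGS`. The sketch below builds the argument string from a single model directory name. The helper `build_api_server_args` is hypothetical, and it assumes an AWQ-quantized model mounted under `/models`, with the same flags as the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): builds the
# API_SERVER_ARGS value for a model directory mounted under /models.
# Assumes an AWQ-quantized model and reuses the flags from the command above.
build_api_server_args() {
    model_name="$1"
    printf '%s' "--model /models/${model_name} --quantization awq --max-num-batched-tokens 99999 --gpu-memory-utilization 1"
}

# Usage (commented out so sourcing this file does not start a container).
# IMAGE must be the tag that was passed to docker build.
# MODEL=MythoMax-L2-13B-AWQ
# IMAGE=paperspace-vllm
# docker run --shm-size 14g --gpus all \
#   -v "/storage/models/awq/${MODEL}:/models/${MODEL}" \
#   -p 7000:7000 -p 8888:8888 \
#   -e API_SERVER_ARGS="$(build_api_server_args "$MODEL")" \
#   "$IMAGE"
```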
