This repository has been archived on 2024-10-27.
local-llm-server/other/vllm

README.md

Nginx

Make sure your proxies all have a long timeout:

proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;

The LLM middleware has a request timeout of 95 seconds, so these longer 300-second proxy timeouts keep Nginx from closing a connection before the middleware has finished responding.
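As a minimal sketch, the three directives above sit inside the location block that proxies to the middleware. The listen address, port, and location path here are assumptions, not values from this repository:

```nginx
location / {
    # Outlast the middleware's 95-second request timeout
    proxy_read_timeout 300;
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
    proxy_pass http://127.0.0.1:5000;  # assumed middleware address
}
```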

Model Preparation

Make sure your model's tokenizer_config.json has its maximum length (e.g. 4096) set equal to or greater than your token limit.
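A quick sanity check can be scripted. This sketch assumes the limit is stored under the standard `model_max_length` key of tokenizer_config.json; adjust if your model uses a different field:

```python
import json

def check_token_limit(config_path: str, token_limit: int) -> bool:
    """Return True if the tokenizer's configured max length covers token_limit."""
    with open(config_path) as f:
        config = json.load(f)
    # Hugging Face tokenizer configs store the context length as model_max_length
    max_len = config.get("model_max_length", 0)
    return max_len >= token_limit
```

Run it against your model directory before starting the server to catch a mismatch early.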