Nginx
Make sure your proxies all have a long timeout:
proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;
The LLM middleware has a request timeout of 95 seconds, so this longer timeout avoids any issues.
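For context, a minimal reverse-proxy block with these timeouts might look like the following (the listen port, server_name, and upstream address are assumptions; substitute your own):
server {
    listen 80;
    server_name _;  # replace with your domain

    location / {
        # point this at your middleware / vLLM API server (address is an example)
        proxy_pass http://127.0.0.1:8000;

        # long timeouts so long generations and streaming responses are not cut off
        proxy_read_timeout 300;
        proxy_connect_timeout 300;
        proxy_send_timeout 300;
    }
}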
Model Preparation
Make sure your model's tokenizer_config.json has its maximum length (typically the model_max_length field) set equal to or greater than your token limit, e.g. 4096.
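For example, a 4096-token limit would look like this in tokenizer_config.json (only the relevant field is shown; leave the file's other fields as they are):
{
    "model_max_length": 4096
}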