README.md

Nginx

Make sure your proxies all have a long timeout:

proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;

The LLM middleware has a request timeout of 95 seconds, so these longer proxy timeouts ensure nginx does not cut off a request before the middleware's own timeout fires.

Model Preparation

Make sure the maximum sequence length in your model's tokenizer_config.json is set equal to or greater than your token limit (e.g. 4096).