Nginx
Make sure your proxies all have a long timeout:
proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;
The LLM middleware has a request timeout of 95 seconds, so this longer timeout avoids any issues.
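For context, a minimal reverse-proxy block with these timeouts might look like the following (the listen port, server_name, and upstream address are assumptions; substitute your own):
server {
    listen 80;
    server_name _;  # replace with your domain

    location / {
        # point this at your middleware / vLLM API server (address is an example)
        proxy_pass http://127.0.0.1:8000;

        # long timeouts so long generations and streaming responses are not cut off
        proxy_read_timeout 300;
        proxy_connect_timeout 300;
        proxy_send_timeout 300;
    }
}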
Model Preparation
Make sure your model's tokenizer_config.json has its maximum length (typically the model_max_length field) set equal to or greater than your token limit, e.g. 4096.
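For example, a 4096-token limit would look like this in tokenizer_config.json (only the relevant field is shown; leave the file's other fields as they are):
{
    "model_max_length": 4096
}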