21 Commits

Author SHA1 Message Date
Cyberes 94141b8ecf fix processing not being decremented on streaming, fix confusion over queue, adjust stop sequences 2023-10-02 20:53:08 -06:00
Cyberes b0089859d7 fix ratelimiting 2023-10-02 02:05:15 -06:00
Cyberes 114f36e709 functional 2023-09-30 19:41:50 -06:00
Cyberes 624ca74ce5 mvp 2023-09-29 00:09:44 -06:00
Cyberes e7b57cad7b set up cluster config and basic background workers 2023-09-28 18:40:24 -06:00
Cyberes e42f2b6819 fix negative queue on stats 2023-09-28 08:47:39 -06:00
Cyberes 59f2aac8ad rewrite redis usage 2023-09-28 03:44:30 -06:00
Cyberes a4a1d6cce6 fix double logging 2023-09-28 01:34:15 -06:00
Cyberes e5fbc9545d add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer 2023-09-27 21:15:54 -06:00
Cyberes 43299b32ad clean up background threads 2023-09-27 19:39:04 -06:00
Cyberes 35e9847b27 set inference workers to daemon, add finally to inference worker, hide estimated avg tps 2023-09-27 18:36:51 -06:00
Cyberes 52e6965b5e don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens 2023-09-25 13:00:39 -06:00
Cyberes 3100b0a924 set up queue to work with gunicorn processes, other improvements 2023-09-14 17:38:20 -06:00
Cyberes 4c9d543eab implement vllm backend 2023-09-11 20:47:19 -06:00
Cyberes ba0bc87434 add HF text-generation-inference backend 2023-08-29 13:46:41 -06:00
Cyberes 6c0e60135d exclude tokens with priority 0 from simultaneous requests ratelimit 2023-08-28 00:03:25 -06:00
Cyberes c16d70a24d limit amount of simultaneous requests an IP can make 2023-08-27 23:48:10 -06:00
Cyberes 11a0b6541f fix some stuff related to gunicorn workers 2023-08-23 22:01:06 -06:00
Cyberes de19af900f add estimated wait time and other time tracking stats 2023-08-23 21:33:52 -06:00
Cyberes 0aa52863bc forgot to start workers 2023-08-23 20:33:49 -06:00
Cyberes 6f8b70df54 add a queue system 2023-08-23 20:12:38 -06:00