Commit Graph

134 Commits

Author SHA1 Message Date
Cyberes 93d19fb95b fix exception 2023-10-01 10:25:32 -06:00
Cyberes d203973e80 fix routes 2023-10-01 01:13:13 -06:00
Cyberes 25ec56a5ef get streaming working, remove /v2/ 2023-10-01 00:20:00 -06:00
Cyberes b10d22ca0d cache the home page in the background 2023-09-30 23:03:42 -06:00
Cyberes 9235725bdd adjust message 2023-09-30 21:35:55 -06:00
Cyberes 61856b4383 adjust message 2023-09-30 21:34:32 -06:00
Cyberes 7af3dbd76b add message about settings 2023-09-30 21:31:25 -06:00
Cyberes 592eb08cb1 add message for /v1/ 2023-09-30 21:07:12 -06:00
Cyberes 166b2316e8 depricate v1 2023-09-30 20:59:24 -06:00
Cyberes e0f86d053a reorganize to api v2 2023-09-30 19:42:41 -06:00
Cyberes 114f36e709 functional 2023-09-30 19:41:50 -06:00
Cyberes 624ca74ce5 mvp 2023-09-29 00:09:44 -06:00
Cyberes e7b57cad7b set up cluster config and basic background workers 2023-09-28 18:40:24 -06:00
Cyberes e1d3fca6d3 try to cancel inference if disconnected from client 2023-09-28 09:55:31 -06:00
Cyberes e42f2b6819 fix negative queue on stats 2023-09-28 08:47:39 -06:00
Cyberes 347a82b7e1 avoid sending to backend to tokenize if it's greater than our specified context size 2023-09-28 03:54:20 -06:00
Cyberes 59f2aac8ad rewrite redis usage 2023-09-28 03:44:30 -06:00
Cyberes a4a1d6cce6 fix double logging 2023-09-28 01:34:15 -06:00
Cyberes e5fbc9545d add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer 2023-09-27 21:15:54 -06:00
Cyberes 43299b32ad clean up background threads 2023-09-27 19:39:04 -06:00
Cyberes 35e9847b27 set inference workers to daemon, add finally to inference worker, hide estimated avg tps 2023-09-27 18:36:51 -06:00
Cyberes 105b66d5e2 unify error message handling 2023-09-27 14:48:47 -06:00
Cyberes aba2e5b9c0 don't use db pooling, add LLM-ST-Errors header to disable formatted errors 2023-09-26 23:59:22 -06:00
Cyberes 048e5a8060 fix API key handling 2023-09-26 22:49:53 -06:00
Cyberes d9bbcc42e6 more work on openai endpoint 2023-09-26 22:09:11 -06:00
Cyberes e0af2ea9c5 convert to gunicorn 2023-09-26 13:32:33 -06:00
Cyberes 0eb901cb52 don't log entire request on failure 2023-09-26 12:32:19 -06:00
Cyberes 11e84db59c update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming, 2023-09-25 22:32:48 -06:00
Cyberes 8240a1ebbb fix background log not doing anything 2023-09-25 18:18:29 -06:00
Cyberes 8184e24bff fix sending error messages when streaming 2023-09-25 17:37:58 -06:00
Cyberes 7ce60079d7 fix typo 2023-09-25 17:24:51 -06:00
Cyberes 30282479a0 fix flask exception 2023-09-25 17:22:28 -06:00
Cyberes 135bd743bb fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper, 2023-09-25 17:20:21 -06:00
Cyberes 52e6965b5e don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens 2023-09-25 13:00:39 -06:00
Cyberes 3eaabc8c35 fix copied code 2023-09-25 12:38:02 -06:00
Cyberes 1646a00987 implement streaming on openai, improve streaming, run DB logging in background thread 2023-09-25 12:30:40 -06:00
Cyberes 6459a1c91b allow setting simultaneous IP limit per-token, fix token use tracker, fix tokens on streaming 2023-09-25 00:55:20 -06:00
Cyberes 320f51e01c further align openai endpoint with expected responses 2023-09-24 21:45:30 -06:00
Cyberes 8d6b2ce49c minor changes, add admin token auth system, add route to get backend info 2023-09-24 15:54:35 -06:00
Cyberes 2678102153 handle error while streaming 2023-09-24 13:27:27 -06:00
Cyberes 0015e653b2 adjust a few final things 2023-09-23 22:30:59 -06:00
Cyberes fab7b7ccdd active gen workers wait 2023-09-23 21:17:13 -06:00
Cyberes 7ee2311183 whats going on 2023-09-23 21:10:14 -06:00
Cyberes 94e845cd1a if there's less than num concurrent wait time is 0 2023-09-23 21:09:21 -06:00
Cyberes 41e622d19c fix two exceptions 2023-09-23 20:55:49 -06:00
Cyberes f67ac8175b fix wrong approach for streaming 2023-09-23 18:44:07 -06:00
Cyberes 8a4de7df44 oops 2023-09-23 18:01:12 -06:00
Cyberes 76a1428ba0 implement streaming for vllm 2023-09-23 17:57:23 -06:00
Cyberes f9a80f3028 change proompters 1 min to 5 min 2023-09-20 21:21:22 -06:00
Cyberes 03e3ec5490 port to mysql, use vllm tokenizer endpoint 2023-09-20 20:30:31 -06:00