Commit Graph

242 Commits

Author SHA1 Message Date
Cyberes 6af5365015 c 2023-10-04 12:45:20 -06:00
Cyberes f3a13fcda8 c 2023-10-04 12:44:33 -06:00
Cyberes a15b5465df c 2023-10-04 12:44:09 -06:00
Cyberes 95d781725e t 2023-10-04 12:42:18 -06:00
Cyberes 1b21cb69c1 test 2023-10-04 12:40:29 -06:00
Cyberes 4deb32bf1c test 2023-10-04 10:32:11 -06:00
Cyberes 7e3af3599d test 2023-10-04 10:29:58 -06:00
Cyberes 4634e36eeb text 2023-10-04 10:26:39 -06:00
Cyberes b76e77a66a fix exception 2023-10-04 10:24:28 -06:00
Cyberes 5f4e4710c1 option to prioritize by parameter count 2023-10-04 10:19:44 -06:00
Cyberes 6dc3529190 show online status on stats page 2023-10-03 23:39:25 -06:00
Cyberes 1a7f22ec55 adjust again 2023-10-03 20:47:37 -06:00
Cyberes 67f5df9bb9 fix stats page 2023-10-03 20:42:53 -06:00
Cyberes 33b4b8404b clean up streaming 2023-10-03 14:10:50 -06:00
Cyberes e16f415749 fix 2023-10-03 13:49:00 -06:00
Cyberes 581a0fec99 fix exception 2023-10-03 13:47:18 -06:00
Cyberes 32ad97e57c do default model rather than default backend, adjust moderation endpoint logic and add timeout, exclude system tokens from recent proompters, calculate number of moderators from endpoint concurrent gens, adjust homepage 2023-10-03 13:40:08 -06:00
Cyberes 63c12ea830 fix 2023-10-03 01:25:43 -06:00
Cyberes f6acd67738 t 2023-10-03 00:05:32 -06:00
Cyberes 07d6f6d8e9 test 2023-10-03 00:03:39 -06:00
Cyberes cd325216e2 test 2023-10-02 22:45:07 -06:00
Cyberes 94141b8ecf fix processing not being decremented on streaming, fix confusion over queue, adjust stop sequences 2023-10-02 20:53:08 -06:00
Cyberes b0089859d7 fix ratelimiting 2023-10-02 02:05:15 -06:00
Cyberes d1c4e68f8b fix openai models response 2023-10-01 23:07:49 -06:00
Cyberes 21da2f6373 fix openai error message 2023-10-01 22:58:08 -06:00
Cyberes f7e9687527 finish openai endpoints 2023-10-01 16:04:53 -06:00
Cyberes 2a3ff7e21e update openai endpoints 2023-10-01 14:15:01 -06:00
Cyberes 93d19fb95b fix exception 2023-10-01 10:25:32 -06:00
Cyberes d203973e80 fix routes 2023-10-01 01:13:13 -06:00
Cyberes 25ec56a5ef get streaming working, remove /v2/ 2023-10-01 00:20:00 -06:00
Cyberes b10d22ca0d cache the home page in the background 2023-09-30 23:03:42 -06:00
Cyberes 9235725bdd adjust message 2023-09-30 21:35:55 -06:00
Cyberes 61856b4383 adjust message 2023-09-30 21:34:32 -06:00
Cyberes 7af3dbd76b add message about settings 2023-09-30 21:31:25 -06:00
Cyberes 592eb08cb1 add message for /v1/ 2023-09-30 21:07:12 -06:00
Cyberes 166b2316e8 deprecate v1 2023-09-30 20:59:24 -06:00
Cyberes 1151bb5475 adjust stats 2023-09-30 20:42:48 -06:00
Cyberes e0f86d053a reorganize to api v2 2023-09-30 19:42:41 -06:00
Cyberes 114f36e709 functional 2023-09-30 19:41:50 -06:00
Cyberes 624ca74ce5 mvp 2023-09-29 00:09:44 -06:00
Cyberes e7b57cad7b set up cluster config and basic background workers 2023-09-28 18:40:24 -06:00
Cyberes e1d3fca6d3 try to cancel inference if disconnected from client 2023-09-28 09:55:31 -06:00
Cyberes e42f2b6819 fix negative queue on stats 2023-09-28 08:47:39 -06:00
Cyberes 347a82b7e1 avoid sending to backend to tokenize if it's greater than our specified context size 2023-09-28 03:54:20 -06:00
Cyberes 59f2aac8ad rewrite redis usage 2023-09-28 03:44:30 -06:00
Cyberes a4a1d6cce6 fix double logging 2023-09-28 01:34:15 -06:00
Cyberes ecdf819088 fix try/finally with continue, fix wrong subclass signature 2023-09-28 00:11:34 -06:00
Cyberes e86a5182eb redo background processes, reorganize server.py 2023-09-27 23:36:44 -06:00
Cyberes e5fbc9545d add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer 2023-09-27 21:15:54 -06:00
Cyberes 43299b32ad clean up background threads 2023-09-27 19:39:04 -06:00
Cyberes 35e9847b27 set inference workers to daemon, add finally to inference worker, hide estimated avg tps 2023-09-27 18:36:51 -06:00
Cyberes 105b66d5e2 unify error message handling 2023-09-27 14:48:47 -06:00
Cyberes 957a6cd092 fix error handling 2023-09-27 14:36:49 -06:00
Cyberes aba2e5b9c0 don't use db pooling, add LLM-ST-Errors header to disable formatted errors 2023-09-26 23:59:22 -06:00
Cyberes 048e5a8060 fix API key handling 2023-09-26 22:49:53 -06:00
Cyberes d9bbcc42e6 more work on openai endpoint 2023-09-26 22:09:11 -06:00
Cyberes e0af2ea9c5 convert to gunicorn 2023-09-26 13:32:33 -06:00
Cyberes 0eb901cb52 don't log entire request on failure 2023-09-26 12:32:19 -06:00
Cyberes bbdb9c9d55 try to prevent "### XXX" responses on openai 2023-09-25 23:14:35 -06:00
Cyberes 11e84db59c update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming, 2023-09-25 22:32:48 -06:00
Cyberes 2d299dbae5 openai_force_no_hashes 2023-09-25 22:01:57 -06:00
Cyberes 8240a1ebbb fix background log not doing anything 2023-09-25 18:18:29 -06:00
Cyberes 8184e24bff fix sending error messages when streaming 2023-09-25 17:37:58 -06:00
Cyberes 7ce60079d7 fix typo 2023-09-25 17:24:51 -06:00
Cyberes 30282479a0 fix flask exception 2023-09-25 17:22:28 -06:00
Cyberes 135bd743bb fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper, 2023-09-25 17:20:21 -06:00
Cyberes 52e6965b5e don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens 2023-09-25 13:00:39 -06:00
Cyberes 3eaabc8c35 fix copied code 2023-09-25 12:38:02 -06:00
Cyberes 44e692c9cf remove debug print 2023-09-25 12:35:36 -06:00
Cyberes 1646a00987 implement streaming on openai, improve streaming, run DB logging in background thread 2023-09-25 12:30:40 -06:00
Cyberes bbe5d5a8fe improve openai endpoint, exclude system tokens more places 2023-09-25 09:32:23 -06:00
Cyberes 6459a1c91b allow setting simultaneous IP limit per-token, fix token use tracker, fix tokens on streaming 2023-09-25 00:55:20 -06:00
Cyberes 320f51e01c further align openai endpoint with expected responses 2023-09-24 21:45:30 -06:00
Cyberes 84ea2f8891 handle when auth token is not enabled 2023-09-24 15:57:39 -06:00
Cyberes 8d6b2ce49c minor changes, add admin token auth system, add route to get backend info 2023-09-24 15:54:35 -06:00
Cyberes 2678102153 handle error while streaming 2023-09-24 13:27:27 -06:00
Cyberes cb99c3490e rewrite tokenizer, restructure validation 2023-09-24 13:02:30 -06:00
Cyberes 62412f4873 add config setting for hostname 2023-09-23 23:24:08 -06:00
Cyberes 84a1fcfdd8 don't store host if it's an IP 2023-09-23 23:14:22 -06:00
Cyberes 0015e653b2 adjust a few final things 2023-09-23 22:30:59 -06:00
Cyberes fab7b7ccdd active gen workers wait 2023-09-23 21:17:13 -06:00
Cyberes 7ee2311183 whats going on 2023-09-23 21:10:14 -06:00
Cyberes 94e845cd1a if there's less than num concurrent wait time is 0 2023-09-23 21:09:21 -06:00
Cyberes 41e622d19c fix two exceptions 2023-09-23 20:55:49 -06:00
Cyberes f67ac8175b fix wrong approach for streaming 2023-09-23 18:44:07 -06:00
Cyberes 8a4de7df44 oops 2023-09-23 18:01:12 -06:00
Cyberes 76a1428ba0 implement streaming for vllm 2023-09-23 17:57:23 -06:00
Cyberes f9a80f3028 change proompters 1 min to 5 min 2023-09-20 21:21:22 -06:00
Cyberes 8593198216 close mysql cursor 2023-09-20 21:19:26 -06:00
Cyberes 03e3ec5490 port to mysql, use vllm tokenizer endpoint 2023-09-20 20:30:31 -06:00
Cyberes 2d390e6268 *blushes* oopsie daisy 2023-09-17 20:22:17 -06:00
Cyberes eb3179cfff fix recent proompters to work with gunicorn 2023-09-17 19:06:53 -06:00
Cyberes 3c1254d3bf cache stats in background 2023-09-17 18:55:36 -06:00
Cyberes edf13db324 calculate estimated wait time better 2023-09-17 18:33:57 -06:00
Cyberes 7434ae1b5b openai: improve moderation checking 2023-09-17 17:40:05 -06:00
Cyberes 354ad8192d fix division by 0, prettify /stats json, add js var to home 2023-09-16 17:37:43 -06:00
Cyberes 77edbe779c actually validate prompt length lol 2023-09-14 18:31:13 -06:00
Cyberes 3100b0a924 set up queue to work with gunicorn processes, other improvements 2023-09-14 17:38:20 -06:00
Cyberes 5d03f875cb adjust prompt 2023-09-14 15:43:04 -06:00
Cyberes 1cf4c95ba2 ah, oops 2023-09-14 15:14:59 -06:00