48 Commits

Author SHA1 Message Date
Cyberes ff82add09e redo database connection, add pooling, minor logging changes, other clean up 2024-05-07 09:48:51 -06:00
Cyberes 4b3e0671c6 clean some stuff up, bump VLLM version 2024-01-10 15:01:26 -07:00
Cyberes 0059e7956c Merge cluster to master (#3) (Co-authored-by: Cyberes <cyberes@evulid.cc>; Reviewed-on: #3) 2023-10-27 19:19:22 -06:00
Cyberes 347a82b7e1 avoid sending to backend to tokenize if it's greater than our specified context size 2023-09-28 03:54:20 -06:00
Cyberes e5fbc9545d add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer 2023-09-27 21:15:54 -06:00
Cyberes 957a6cd092 fix error handling 2023-09-27 14:36:49 -06:00
Cyberes aba2e5b9c0 don't use db pooling, add LLM-ST-Errors header to disable formatted errors 2023-09-26 23:59:22 -06:00
Cyberes d9bbcc42e6 more work on openai endpoint 2023-09-26 22:09:11 -06:00
Cyberes e0af2ea9c5 convert to gunicorn 2023-09-26 13:32:33 -06:00
Cyberes 11e84db59c update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming, 2023-09-25 22:32:48 -06:00
Cyberes 8240a1ebbb fix background log not doing anything 2023-09-25 18:18:29 -06:00
Cyberes 1646a00987 implement streaming on openai, improve streaming, run DB logging in background thread 2023-09-25 12:30:40 -06:00
Cyberes 320f51e01c further align openai endpoint with expected responses 2023-09-24 21:45:30 -06:00
Cyberes cb99c3490e rewrite tokenizer, restructure validation 2023-09-24 13:02:30 -06:00
Cyberes 76a1428ba0 implement streaming for vllm 2023-09-23 17:57:23 -06:00
Cyberes 81452ec643 adjust vllm info 2023-09-21 20:13:29 -06:00
Cyberes 03e3ec5490 port to mysql, use vllm tokenizer endpoint 2023-09-20 20:30:31 -06:00
Cyberes 354ad8192d fix division by 0, prettify /stats json, add js var to home 2023-09-16 17:37:43 -06:00
Cyberes 77edbe779c actually validate prompt length lol 2023-09-14 18:31:13 -06:00
Cyberes 3100b0a924 set up queue to work with gunicorn processes, other improvements 2023-09-14 17:38:20 -06:00
Cyberes 79b1e01b61 option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup 2023-09-14 14:05:50 -06:00
Cyberes c45e68a8c8 adjust requests timeout, add service file 2023-09-14 01:32:49 -06:00
Cyberes 05a45e6ac6 didnt test anything 2023-09-13 11:51:46 -06:00
Cyberes bcedd2ab3d adjust logging, add more vllm stuff 2023-09-13 11:22:33 -06:00
Cyberes 9740df07c7 add openai-compatible backend 2023-09-12 16:40:09 -06:00
Cyberes 1d9f40765e remove text-generation-inference backend 2023-09-12 13:09:47 -06:00
Cyberes 6152b1bb66 fix invalid param error, add manual model name 2023-09-12 10:30:45 -06:00
Cyberes 40ac84aa9a actually we don't want to emulate openai 2023-09-12 01:04:11 -06:00
Cyberes 747d838138 move where the vllm model is set 2023-09-11 21:05:22 -06:00
Cyberes 4c9d543eab implement vllm backend 2023-09-11 20:47:19 -06:00
Cyberes c14cc51f09 get working with ooba again, give up on dockerfile 2023-09-11 09:51:01 -06:00
Cyberes bf39b8da63 still having issues 2023-08-31 09:24:37 -06:00
Cyberes 47887c3925 missed a spot, clean up json error handling 2023-08-30 20:19:23 -06:00
Cyberes 2816c01902 refactor generation route 2023-08-30 18:53:26 -06:00
Cyberes f9b9051bad update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen 2023-08-29 15:46:56 -06:00
Cyberes 23f3fcf579 log errors to database 2023-08-29 14:48:33 -06:00
Cyberes b44dfa2471 update info page 2023-08-29 14:00:35 -06:00
Cyberes ba0bc87434 add HF text-generation-inference backend 2023-08-29 13:46:41 -06:00
Cyberes 0aa52863bc forgot to start workers 2023-08-23 20:33:49 -06:00
Cyberes 6f8b70df54 add a queue system 2023-08-23 20:12:38 -06:00
Cyberes 7eb930fafd crap 2023-08-23 16:12:25 -06:00
Cyberes 9fc674878d allow disabling ssl verification 2023-08-23 16:11:32 -06:00
Cyberes 508089ce11 model info timeout and additional info 2023-08-23 16:07:43 -06:00
Cyberes 1f5e2da637 print fetch model error message 2023-08-23 16:02:57 -06:00
Cyberes a525093c75 rename, more stats 2023-08-22 20:42:38 -06:00
Cyberes 0d32db2dbd prototype hf-textgen and adjust logging 2023-08-22 19:58:31 -06:00
Cyberes a59dcea2da more proxy stats 2023-08-22 16:50:49 -06:00
Cyberes 8cbf643fd3 MVP 2023-08-21 21:28:52 -06:00