Commit Graph

175 Commits

Author SHA1 Message Date
Cyberes 3eaabc8c35 fix copied code 2023-09-25 12:38:02 -06:00
Cyberes 44e692c9cf remove debug print 2023-09-25 12:35:36 -06:00
Cyberes 1646a00987 implement streaming on openai, improve streaming, run DB logging in background thread 2023-09-25 12:30:40 -06:00
Cyberes bbe5d5a8fe improve openai endpoint, exclude system tokens more places 2023-09-25 09:32:23 -06:00
Cyberes 6459a1c91b allow setting simultaneous IP limit per-token, fix token use tracker, fix tokens on streaming 2023-09-25 00:55:20 -06:00
Cyberes 320f51e01c further align openai endpoint with expected responses 2023-09-24 21:45:30 -06:00
Cyberes 84ea2f8891 handle when auth token is not enabled 2023-09-24 15:57:39 -06:00
Cyberes 8d6b2ce49c minor changes, add admin token auth system, add route to get backend info 2023-09-24 15:54:35 -06:00
Cyberes 2678102153 handle error while streaming 2023-09-24 13:27:27 -06:00
Cyberes cb99c3490e rewrite tokenizer, restructure validation 2023-09-24 13:02:30 -06:00
Cyberes 62412f4873 add config setting for hostname 2023-09-23 23:24:08 -06:00
Cyberes 84a1fcfdd8 don't store host if it's an IP 2023-09-23 23:14:22 -06:00
Cyberes 0015e653b2 adjust a few final things 2023-09-23 22:30:59 -06:00
Cyberes fab7b7ccdd active gen workers wait 2023-09-23 21:17:13 -06:00
Cyberes 7ee2311183 whats going on 2023-09-23 21:10:14 -06:00
Cyberes 94e845cd1a if there's less than num concurrent wait time is 0 2023-09-23 21:09:21 -06:00
Cyberes 41e622d19c fix two exceptions 2023-09-23 20:55:49 -06:00
Cyberes f67ac8175b fix wrong approach for streaming 2023-09-23 18:44:07 -06:00
Cyberes 8a4de7df44 oops 2023-09-23 18:01:12 -06:00
Cyberes 76a1428ba0 implement streaming for vllm 2023-09-23 17:57:23 -06:00
Cyberes f9a80f3028 change proompters 1 min to 5 min 2023-09-20 21:21:22 -06:00
Cyberes 8593198216 close mysql cursor 2023-09-20 21:19:26 -06:00
Cyberes 03e3ec5490 port to mysql, use vllm tokenizer endpoint 2023-09-20 20:30:31 -06:00
Cyberes 2d390e6268 *blushes* oopsie daisy 2023-09-17 20:22:17 -06:00
Cyberes eb3179cfff fix recent proompters to work with gunicorn 2023-09-17 19:06:53 -06:00
Cyberes 3c1254d3bf cache stats in background 2023-09-17 18:55:36 -06:00
Cyberes edf13db324 calculate estimated wait time better 2023-09-17 18:33:57 -06:00
Cyberes 7434ae1b5b openai: improve moderation checking 2023-09-17 17:40:05 -06:00
Cyberes 354ad8192d fix division by 0, prettify /stats json, add js var to home 2023-09-16 17:37:43 -06:00
Cyberes 77edbe779c actually validate prompt length lol 2023-09-14 18:31:13 -06:00
Cyberes 3100b0a924 set up queue to work with gunicorn processes, other improvements 2023-09-14 17:38:20 -06:00
Cyberes 5d03f875cb adjust prompt 2023-09-14 15:43:04 -06:00
Cyberes 1cf4c95ba2 ah, oops 2023-09-14 15:14:59 -06:00
Cyberes a89295193f add moderation endpoint to openai api, update config 2023-09-14 15:07:17 -06:00
Cyberes 8f4f17166e adjust 2023-09-14 14:36:22 -06:00
Cyberes 93a344f4c5 check if the backend crapped out, print some more stuff 2023-09-14 14:26:25 -06:00
Cyberes 79b1e01b61 option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup 2023-09-14 14:05:50 -06:00
Cyberes e79b206e1a rename average_tps to estimated_avg_tps 2023-09-14 01:35:25 -06:00
Cyberes 12e894032e show the openai system prompt 2023-09-13 20:25:56 -06:00
Cyberes 3d40ed4cfb shit code 2023-09-13 11:58:38 -06:00
Cyberes 1582625e09 how did this get broken 2023-09-13 11:56:30 -06:00
Cyberes 05a45e6ac6 didnt test anything 2023-09-13 11:51:46 -06:00
Cyberes bcedd2ab3d adjust logging, add more vllm stuff 2023-09-13 11:22:33 -06:00
Cyberes 9740df07c7 add openai-compatible backend 2023-09-12 16:40:09 -06:00
Cyberes 1d9f40765e remove text-generation-inference backend 2023-09-12 13:09:47 -06:00
Cyberes 6152b1bb66 fix invalid param error, add manual model name 2023-09-12 10:30:45 -06:00
Cyberes 5dd95875dd oops 2023-09-12 01:12:50 -06:00
Cyberes 40ac84aa9a actually we don't want to emulate openai 2023-09-12 01:04:11 -06:00
Cyberes 747d838138 move where the vllm model is set 2023-09-11 21:05:22 -06:00
Cyberes 4c9d543eab implement vllm backend 2023-09-11 20:47:19 -06:00
Cyberes c14cc51f09 get working with ooba again, give up on dockerfile 2023-09-11 09:51:01 -06:00
Cyberes 2d8812a6cd fix crash again 2023-08-31 09:31:16 -06:00
Cyberes 4b32401542 oops wrong data structure 2023-08-30 20:24:55 -06:00
Cyberes 47887c3925 missed a spot, clean up json error handling 2023-08-30 20:19:23 -06:00
Cyberes 8c04238e04 disable stream for now 2023-08-30 19:58:59 -06:00
Cyberes 2816c01902 refactor generation route 2023-08-30 18:53:26 -06:00
Cyberes bf648f605f implement streaming for hf-textgen 2023-08-29 17:56:12 -06:00
Cyberes 26b04f364c remove old code 2023-08-29 15:57:28 -06:00
Cyberes cef88b866a fix wrong response status code 2023-08-29 15:52:58 -06:00
Cyberes f9b9051bad update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen 2023-08-29 15:46:56 -06:00
Cyberes 2d9ec15302 I swear I know what I'm doing 2023-08-29 14:57:49 -06:00
Cyberes 06b52c7648 forgot to remove a snippet 2023-08-29 14:53:03 -06:00
Cyberes 23f3fcf579 log errors to database 2023-08-29 14:48:33 -06:00
Cyberes ba0bc87434 add HF text-generation-inference backend 2023-08-29 13:46:41 -06:00
Cyberes 6c0e60135d exclude tokens with priority 0 from simultaneous requests ratelimit 2023-08-28 00:03:25 -06:00
Cyberes c16d70a24d limit amount of simultaneous requests an IP can make 2023-08-27 23:48:10 -06:00
Cyberes 1a4cb5f786 reorganize stats page again 2023-08-27 22:24:44 -06:00
Cyberes f43336c92c adjust estimated wait time calculations 2023-08-27 22:17:21 -06:00
Cyberes 6a09ffc8a4 log model used in request so we can pull the correct averages when we change models 2023-08-26 00:30:59 -06:00
Cyberes d64152587c reorganize nvidia stats 2023-08-25 15:02:40 -06:00
Cyberes 0e6aadf5e1 fix missing empty strings logged when errors 2023-08-25 13:44:41 -06:00
Cyberes 839bb115c6 reorganize stats, add 24 hr proompters, adjust logging when error 2023-08-25 12:20:16 -06:00
Cyberes 26a0a13aa7 actually we want this 2023-08-24 23:57:46 -06:00
Cyberes 0b4da89de2 fix exception 2023-08-24 23:57:25 -06:00
Cyberes 25e3255c9b fix issue with tokenizer 2023-08-24 23:13:07 -06:00
Cyberes 77fe1e237e also handle when no response 2023-08-24 22:53:54 -06:00
Cyberes e5aca7b09d adjust netdata json, don't log error messages during generation 2023-08-24 22:53:06 -06:00
Cyberes 0230ddda17 dynamically fetch GPUs for netdata 2023-08-24 21:56:15 -06:00
Cyberes 16b986c206 track nvidia power states through netdata 2023-08-24 21:36:00 -06:00
Cyberes 01b8442b95 update current model when we generate_stats() 2023-08-24 21:10:00 -06:00
Cyberes ec3fe2c2ac show total output tokens on stats 2023-08-24 20:43:11 -06:00
Cyberes 9b7bf490a1 sort keys of stats dict 2023-08-24 18:59:52 -06:00
Cyberes 763dd832cc update home, update readme, calculate estimated wait based on database stats 2023-08-24 16:47:14 -06:00
Cyberes 21174750ea update readme 2023-08-24 12:19:59 -06:00
Cyberes afc138c743 update readme 2023-08-24 00:09:57 -06:00
Cyberes f3fe514c11 add home template 2023-08-23 23:11:12 -06:00
Cyberes cdda2c840c dont test code, don't care 2023-08-23 22:24:32 -06:00
Cyberes 1eb8e885d0 am dumb 2023-08-23 22:22:38 -06:00
Cyberes e52acb03a4 log gen time to DB, also keep generation_elapsed under 3 min 2023-08-23 22:20:39 -06:00
Cyberes 3317bd5f1a allow hiding of more variables 2023-08-23 22:08:10 -06:00
Cyberes 11a0b6541f fix some stuff related to gunicorn workers 2023-08-23 22:01:06 -06:00
Cyberes 02c07bbd53 PyCharm deleted import 2023-08-23 21:34:27 -06:00
Cyberes de19af900f add estimated wait time and other time tracking stats 2023-08-23 21:33:52 -06:00
Cyberes 0aa52863bc forgot to start workers 2023-08-23 20:33:49 -06:00
Cyberes 6f8b70df54 add a queue system 2023-08-23 20:12:38 -06:00
Cyberes a79d67adbb do caching ourself on /model 2023-08-23 16:40:20 -06:00
Cyberes 64e1b1654f more cloudflare finicky stuff 2023-08-23 16:32:13 -06:00
Cyberes f76d7bbc5d more caching stuff 2023-08-23 16:23:24 -06:00
Cyberes a6b0bb0183 actually we want 500 2023-08-23 16:09:36 -06:00
Cyberes fd5796ed07 oops 2023-08-23 16:08:52 -06:00