Commit Graph

49 Commits

Author SHA1 Message Date
Cyberes 32ad97e57c do default model rather than default backend, adjust moderation endpoint logic and add timeout, exclude system tokens from recent proompters, calculate number of moderators from endpoint concurrent gens, adjust homepage 2023-10-03 13:40:08 -06:00
Cyberes 2a3ff7e21e update openai endpoints 2023-10-01 14:15:01 -06:00
Cyberes 25ec56a5ef get streaming working, remove /v2/ 2023-10-01 00:20:00 -06:00
Cyberes e0f86d053a reorganize to api v2 2023-09-30 19:42:41 -06:00
Cyberes 114f36e709 functional 2023-09-30 19:41:50 -06:00
Cyberes 624ca74ce5 mvp 2023-09-29 00:09:44 -06:00
Cyberes e7b57cad7b set up cluster config and basic background workers 2023-09-28 18:40:24 -06:00
Cyberes 347a82b7e1 avoid sending to backend to tokenize if it's greater than our specified context size 2023-09-28 03:54:20 -06:00
Cyberes 59f2aac8ad rewrite redis usage 2023-09-28 03:44:30 -06:00
Cyberes 43299b32ad clean up background threads 2023-09-27 19:39:04 -06:00
Cyberes 35e9847b27 set inference workers to daemon, add finally to inference worker, hide estimated avg tps 2023-09-27 18:36:51 -06:00
Cyberes e0af2ea9c5 convert to gunicorn 2023-09-26 13:32:33 -06:00
Cyberes 7ce60079d7 fix typo 2023-09-25 17:24:51 -06:00
Cyberes 135bd743bb fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper 2023-09-25 17:20:21 -06:00
Cyberes 52e6965b5e don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens 2023-09-25 13:00:39 -06:00
Cyberes 8d6b2ce49c minor changes, add admin token auth system, add route to get backend info 2023-09-24 15:54:35 -06:00
Cyberes fab7b7ccdd active gen workers wait 2023-09-23 21:17:13 -06:00
Cyberes 94e845cd1a if there's less than num concurrent wait time is 0 2023-09-23 21:09:21 -06:00
Cyberes f9a80f3028 change proompters 1 min to 5 min 2023-09-20 21:21:22 -06:00
Cyberes 03e3ec5490 port to mysql, use vllm tokenizer endpoint 2023-09-20 20:30:31 -06:00
Cyberes 2d390e6268 *blushes* oopsie daisy 2023-09-17 20:22:17 -06:00
Cyberes eb3179cfff fix recent proompters to work with gunicorn 2023-09-17 19:06:53 -06:00
Cyberes 3c1254d3bf cache stats in background 2023-09-17 18:55:36 -06:00
Cyberes edf13db324 calculate estimated wait time better 2023-09-17 18:33:57 -06:00
Cyberes 79b1e01b61 option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup 2023-09-14 14:05:50 -06:00
Cyberes e79b206e1a rename average_tps to estimated_avg_tps 2023-09-14 01:35:25 -06:00
Cyberes 9740df07c7 add openai-compatible backend 2023-09-12 16:40:09 -06:00
Cyberes 1d9f40765e remove text-generation-inference backend 2023-09-12 13:09:47 -06:00
Cyberes 6152b1bb66 fix invalid param error, add manual model name 2023-09-12 10:30:45 -06:00
Cyberes 5dd95875dd oops 2023-09-12 01:12:50 -06:00
Cyberes 40ac84aa9a actually we don't want to emulate openai 2023-09-12 01:04:11 -06:00
Cyberes 4c9d543eab implement vllm backend 2023-09-11 20:47:19 -06:00
Cyberes bf648f605f implement streaming for hf-textgen 2023-08-29 17:56:12 -06:00
Cyberes f9b9051bad update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen 2023-08-29 15:46:56 -06:00
Cyberes ba0bc87434 add HF text-generation-inference backend 2023-08-29 13:46:41 -06:00
Cyberes 6c0e60135d exclude tokens with priority 0 from simultaneous requests ratelimit 2023-08-28 00:03:25 -06:00
Cyberes 1a4cb5f786 reorganize stats page again 2023-08-27 22:24:44 -06:00
Cyberes f43336c92c adjust estimated wait time calculations 2023-08-27 22:17:21 -06:00
Cyberes 6a09ffc8a4 log model used in request so we can pull the correct averages when we change models 2023-08-26 00:30:59 -06:00
Cyberes d64152587c reorganize nvidia stats 2023-08-25 15:02:40 -06:00
Cyberes 839bb115c6 reorganize stats, add 24 hr proompters, adjust logging when error 2023-08-25 12:20:16 -06:00
Cyberes 0230ddda17 dynamically fetch GPUs for netdata 2023-08-24 21:56:15 -06:00
Cyberes 16b986c206 track nvidia power states through netdata 2023-08-24 21:36:00 -06:00
Cyberes 01b8442b95 update current model when we generate_stats() 2023-08-24 21:10:00 -06:00
Cyberes ec3fe2c2ac show total output tokens on stats 2023-08-24 20:43:11 -06:00
Cyberes 9b7bf490a1 sort keys of stats dict 2023-08-24 18:59:52 -06:00
Cyberes 763dd832cc update home, update readme, calculate estimated wait based on database stats 2023-08-24 16:47:14 -06:00
Cyberes 21174750ea update readme 2023-08-24 12:19:59 -06:00
Cyberes f3fe514c11 add home template 2023-08-23 23:11:12 -06:00