Cyberes
|
32ad97e57c
|
do default model rather than default backend, adjust moderation endpoint logic and add timeout, exclude system tokens from recent proompters, calculate number of moderators from endpoint concurrent gens, adjust homepage
|
2023-10-03 13:40:08 -06:00 |
Cyberes
|
2a3ff7e21e
|
update openai endpoints
|
2023-10-01 14:15:01 -06:00 |
Cyberes
|
25ec56a5ef
|
get streaming working, remove /v2/
|
2023-10-01 00:20:00 -06:00 |
Cyberes
|
e0f86d053a
|
reorganize to api v2
|
2023-09-30 19:42:41 -06:00 |
Cyberes
|
114f36e709
|
functional
|
2023-09-30 19:41:50 -06:00 |
Cyberes
|
624ca74ce5
|
mvp
|
2023-09-29 00:09:44 -06:00 |
Cyberes
|
e7b57cad7b
|
set up cluster config and basic background workers
|
2023-09-28 18:40:24 -06:00 |
Cyberes
|
347a82b7e1
|
avoid sending to backend to tokenize if it's greater than our specified context size
|
2023-09-28 03:54:20 -06:00 |
Cyberes
|
59f2aac8ad
|
rewrite redis usage
|
2023-09-28 03:44:30 -06:00 |
Cyberes
|
43299b32ad
|
clean up background threads
|
2023-09-27 19:39:04 -06:00 |
Cyberes
|
35e9847b27
|
set inference workers to daemon, add finally to inference worker, hide estimated avg tps
|
2023-09-27 18:36:51 -06:00 |
Cyberes
|
e0af2ea9c5
|
convert to gunicorn
|
2023-09-26 13:32:33 -06:00 |
Cyberes
|
7ce60079d7
|
fix typo
|
2023-09-25 17:24:51 -06:00 |
Cyberes
|
135bd743bb
|
fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper
|
2023-09-25 17:20:21 -06:00 |
Cyberes
|
52e6965b5e
|
don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens
|
2023-09-25 13:00:39 -06:00 |
Cyberes
|
8d6b2ce49c
|
minor changes, add admin token auth system, add route to get backend info
|
2023-09-24 15:54:35 -06:00 |
Cyberes
|
fab7b7ccdd
|
active gen workers wait
|
2023-09-23 21:17:13 -06:00 |
Cyberes
|
94e845cd1a
|
if there are fewer than num concurrent, wait time is 0
|
2023-09-23 21:09:21 -06:00 |
Cyberes
|
f9a80f3028
|
change proompters 1 min to 5 min
|
2023-09-20 21:21:22 -06:00 |
Cyberes
|
03e3ec5490
|
port to mysql, use vllm tokenizer endpoint
|
2023-09-20 20:30:31 -06:00 |
Cyberes
|
2d390e6268
|
*blushes* oopsie daisy
|
2023-09-17 20:22:17 -06:00 |
Cyberes
|
eb3179cfff
|
fix recent proompters to work with gunicorn
|
2023-09-17 19:06:53 -06:00 |
Cyberes
|
3c1254d3bf
|
cache stats in background
|
2023-09-17 18:55:36 -06:00 |
Cyberes
|
edf13db324
|
calculate estimated wait time better
|
2023-09-17 18:33:57 -06:00 |
Cyberes
|
79b1e01b61
|
option to disable streaming, improve timeout on requests to backend, fix error handling, reduce duplicate code, misc other cleanup
|
2023-09-14 14:05:50 -06:00 |
Cyberes
|
e79b206e1a
|
rename average_tps to estimated_avg_tps
|
2023-09-14 01:35:25 -06:00 |
Cyberes
|
9740df07c7
|
add openai-compatible backend
|
2023-09-12 16:40:09 -06:00 |
Cyberes
|
1d9f40765e
|
remove text-generation-inference backend
|
2023-09-12 13:09:47 -06:00 |
Cyberes
|
6152b1bb66
|
fix invalid param error, add manual model name
|
2023-09-12 10:30:45 -06:00 |
Cyberes
|
5dd95875dd
|
oops
|
2023-09-12 01:12:50 -06:00 |
Cyberes
|
40ac84aa9a
|
actually we don't want to emulate openai
|
2023-09-12 01:04:11 -06:00 |
Cyberes
|
4c9d543eab
|
implement vllm backend
|
2023-09-11 20:47:19 -06:00 |
Cyberes
|
bf648f605f
|
implement streaming for hf-textgen
|
2023-08-29 17:56:12 -06:00 |
Cyberes
|
f9b9051bad
|
update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen
|
2023-08-29 15:46:56 -06:00 |
Cyberes
|
ba0bc87434
|
add HF text-generation-inference backend
|
2023-08-29 13:46:41 -06:00 |
Cyberes
|
6c0e60135d
|
exclude tokens with priority 0 from simultaneous requests ratelimit
|
2023-08-28 00:03:25 -06:00 |
Cyberes
|
1a4cb5f786
|
reorganize stats page again
|
2023-08-27 22:24:44 -06:00 |
Cyberes
|
f43336c92c
|
adjust estimated wait time calculations
|
2023-08-27 22:17:21 -06:00 |
Cyberes
|
6a09ffc8a4
|
log model used in request so we can pull the correct averages when we change models
|
2023-08-26 00:30:59 -06:00 |
Cyberes
|
d64152587c
|
reorganize nvidia stats
|
2023-08-25 15:02:40 -06:00 |
Cyberes
|
839bb115c6
|
reorganize stats, add 24 hr proompters, adjust logging when error
|
2023-08-25 12:20:16 -06:00 |
Cyberes
|
0230ddda17
|
dynamically fetch GPUs for netdata
|
2023-08-24 21:56:15 -06:00 |
Cyberes
|
16b986c206
|
track nvidia power states through netdata
|
2023-08-24 21:36:00 -06:00 |
Cyberes
|
01b8442b95
|
update current model when we generate_stats()
|
2023-08-24 21:10:00 -06:00 |
Cyberes
|
ec3fe2c2ac
|
show total output tokens on stats
|
2023-08-24 20:43:11 -06:00 |
Cyberes
|
9b7bf490a1
|
sort keys of stats dict
|
2023-08-24 18:59:52 -06:00 |
Cyberes
|
763dd832cc
|
update home, update readme, calculate estimated wait based on database stats
|
2023-08-24 16:47:14 -06:00 |
Cyberes
|
21174750ea
|
update readme
|
2023-08-24 12:19:59 -06:00 |
Cyberes
|
f3fe514c11
|
add home template
|
2023-08-23 23:11:12 -06:00 |