local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	1670594908	fix import error	2023-10-04 16:29:19 -06:00
Cyberes	6dc3529190	show online status on stats page	2023-10-03 23:39:25 -06:00
Cyberes	1a7f22ec55	adjust again	2023-10-03 20:47:37 -06:00
Cyberes	67f5df9bb9	fix stats page	2023-10-03 20:42:53 -06:00
Cyberes	32ad97e57c	do default model rather than default backend, adjust moderation endpoint logic and add timeout, exclude system tokens from recent proompters, calculate number of moderators from endpoint concurrent gens, adjust homepage	2023-10-03 13:40:08 -06:00
Cyberes	2a3ff7e21e	update openai endpoints	2023-10-01 14:15:01 -06:00
Cyberes	25ec56a5ef	get streaming working, remove /v2/	2023-10-01 00:20:00 -06:00
Cyberes	e0f86d053a	reorganize to api v2	2023-09-30 19:42:41 -06:00
Cyberes	114f36e709	functional	2023-09-30 19:41:50 -06:00
Cyberes	624ca74ce5	mvp	2023-09-29 00:09:44 -06:00
Cyberes	e7b57cad7b	set up cluster config and basic background workers	2023-09-28 18:40:24 -06:00
Cyberes	347a82b7e1	avoid sending to backend to tokenize if it's greater than our specified context size	2023-09-28 03:54:20 -06:00
Cyberes	59f2aac8ad	rewrite redis usage	2023-09-28 03:44:30 -06:00
Cyberes	43299b32ad	clean up background threads	2023-09-27 19:39:04 -06:00
Cyberes	35e9847b27	set inference workers to daemon, add finally to inference worker, hide estimated avg tps	2023-09-27 18:36:51 -06:00
Cyberes	e0af2ea9c5	convert to gunicorn	2023-09-26 13:32:33 -06:00
Cyberes	7ce60079d7	fix typo	2023-09-25 17:24:51 -06:00
Cyberes	135bd743bb	fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper,	2023-09-25 17:20:21 -06:00
Cyberes	52e6965b5e	don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens	2023-09-25 13:00:39 -06:00
Cyberes	8d6b2ce49c	minor changes, add admin token auth system, add route to get backend info	2023-09-24 15:54:35 -06:00
Cyberes	fab7b7ccdd	active gen workers wait	2023-09-23 21:17:13 -06:00
Cyberes	94e845cd1a	if there's less than num concurrent wait time is 0	2023-09-23 21:09:21 -06:00
Cyberes	f9a80f3028	change proompters 1 min to 5 min	2023-09-20 21:21:22 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	2d390e6268	blushes oopsie daisy	2023-09-17 20:22:17 -06:00
Cyberes	eb3179cfff	fix recent proompters to work with gunicorn	2023-09-17 19:06:53 -06:00
Cyberes	3c1254d3bf	cache stats in background	2023-09-17 18:55:36 -06:00
Cyberes	edf13db324	calculate estimateed wate time better	2023-09-17 18:33:57 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	e79b206e1a	rename average_tps to estimated_avg_tps	2023-09-14 01:35:25 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	1d9f40765e	remove text-generation-inference backend	2023-09-12 13:09:47 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	5dd95875dd	oops	2023-09-12 01:12:50 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00
Cyberes	bf648f605f	implement streaming for hf-textgen	2023-08-29 17:56:12 -06:00
Cyberes	f9b9051bad	update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen	2023-08-29 15:46:56 -06:00
Cyberes	ba0bc87434	add HF text-generation-inference backend	2023-08-29 13:46:41 -06:00
Cyberes	6c0e60135d	exclude tokens with priority 0 from simultaneous requests ratelimit	2023-08-28 00:03:25 -06:00
Cyberes	1a4cb5f786	reorganize stats page again	2023-08-27 22:24:44 -06:00
Cyberes	f43336c92c	adjust estimated wait time calculations	2023-08-27 22:17:21 -06:00
Cyberes	6a09ffc8a4	log model used in request so we can pull the correct averages when we change models	2023-08-26 00:30:59 -06:00
Cyberes	d64152587c	reorganize nvidia stats	2023-08-25 15:02:40 -06:00
Cyberes	839bb115c6	reorganize stats, add 24 hr proompters, adjust logging when error	2023-08-25 12:20:16 -06:00
Cyberes	0230ddda17	dynamically fetch GPUs for netdata	2023-08-24 21:56:15 -06:00
Cyberes	16b986c206	track nvidia power states through netdata	2023-08-24 21:36:00 -06:00
Cyberes	01b8442b95	update current model when we generate_stats()	2023-08-24 21:10:00 -06:00
Cyberes	ec3fe2c2ac	show total output tokens on stats	2023-08-24 20:43:11 -06:00
Cyberes	9b7bf490a1	sort keys of stats dict	2023-08-24 18:59:52 -06:00

53 Commits