local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	fab7b7ccdd	active gen workers wait	2023-09-23 21:17:13 -06:00
Cyberes	7ee2311183	whats going on	2023-09-23 21:10:14 -06:00
Cyberes	94e845cd1a	if there's less than num concurrent wait time is 0	2023-09-23 21:09:21 -06:00
Cyberes	41e622d19c	fix two exceptions	2023-09-23 20:55:49 -06:00
Cyberes	f67ac8175b	fix wrong approach for streaming	2023-09-23 18:44:07 -06:00
Cyberes	8a4de7df44	oops	2023-09-23 18:01:12 -06:00
Cyberes	76a1428ba0	implement streaming for vllm	2023-09-23 17:57:23 -06:00
Cyberes	f9a80f3028	change proompters 1 min to 5 min	2023-09-20 21:21:22 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	2d390e6268	blushes oopsie daisy	2023-09-17 20:22:17 -06:00
Cyberes	eb3179cfff	fix recent proompters to work with gunicorn	2023-09-17 19:06:53 -06:00
Cyberes	3c1254d3bf	cache stats in background	2023-09-17 18:55:36 -06:00
Cyberes	edf13db324	calculate estimateed wate time better	2023-09-17 18:33:57 -06:00
Cyberes	354ad8192d	fix division by 0, prettify /stats json, add js var to home	2023-09-16 17:37:43 -06:00
Cyberes	a89295193f	add moderation endpoint to openai api, update config	2023-09-14 15:07:17 -06:00
Cyberes	8f4f17166e	adjust	2023-09-14 14:36:22 -06:00
Cyberes	93a344f4c5	check if the backend crapped out, print some more stuff	2023-09-14 14:26:25 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	e79b206e1a	rename average_tps to estimated_avg_tps	2023-09-14 01:35:25 -06:00
Cyberes	12e894032e	show the openai system prompt	2023-09-13 20:25:56 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	1d9f40765e	remove text-generation-inference backend	2023-09-12 13:09:47 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	5dd95875dd	oops	2023-09-12 01:12:50 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00
Cyberes	4b32401542	oops wrong data strucutre	2023-08-30 20:24:55 -06:00
Cyberes	47887c3925	missed a spot, clean up json error handling	2023-08-30 20:19:23 -06:00
Cyberes	8c04238e04	disable stream for now	2023-08-30 19:58:59 -06:00
Cyberes	2816c01902	refactor generation route	2023-08-30 18:53:26 -06:00
Cyberes	bf648f605f	implement streaming for hf-textgen	2023-08-29 17:56:12 -06:00
Cyberes	26b04f364c	remove old code	2023-08-29 15:57:28 -06:00
Cyberes	cef88b866a	fix wrong response status code	2023-08-29 15:52:58 -06:00
Cyberes	f9b9051bad	update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen	2023-08-29 15:46:56 -06:00
Cyberes	2d9ec15302	I swear I know what I'm doing	2023-08-29 14:57:49 -06:00
Cyberes	06b52c7648	forgot to remove a snippet	2023-08-29 14:53:03 -06:00
Cyberes	23f3fcf579	log errors to database	2023-08-29 14:48:33 -06:00
Cyberes	ba0bc87434	add HF text-generation-inference backend	2023-08-29 13:46:41 -06:00
Cyberes	6c0e60135d	exclude tokens with priority 0 from simultaneous requests ratelimit	2023-08-28 00:03:25 -06:00
Cyberes	c16d70a24d	limit amount of simultaneous requests an IP can make	2023-08-27 23:48:10 -06:00
Cyberes	1a4cb5f786	reorganize stats page again	2023-08-27 22:24:44 -06:00
Cyberes	f43336c92c	adjust estimated wait time calculations	2023-08-27 22:17:21 -06:00
Cyberes	6a09ffc8a4	log model used in request so we can pull the correct averages when we change models	2023-08-26 00:30:59 -06:00
Cyberes	d64152587c	reorganize nvidia stats	2023-08-25 15:02:40 -06:00
Cyberes	0e6aadf5e1	fix missing empty strings logged when errors	2023-08-25 13:44:41 -06:00
Cyberes	839bb115c6	reorganize stats, add 24 hr proompters, adjust logging when error	2023-08-25 12:20:16 -06:00
Cyberes	26a0a13aa7	actually we want this	2023-08-24 23:57:46 -06:00
Cyberes	0b4da89de2	fix exception	2023-08-24 23:57:25 -06:00
Cyberes	25e3255c9b	fix issue with tokenizer	2023-08-24 23:13:07 -06:00
Cyberes	77fe1e237e	also handle when no response	2023-08-24 22:53:54 -06:00
Cyberes	e5aca7b09d	adjust netdata json, don't log error messages during generationg	2023-08-24 22:53:06 -06:00
Cyberes	0230ddda17	dynamically fetch GPUs for netdata	2023-08-24 21:56:15 -06:00
Cyberes	16b986c206	track nvidia power states through netdata	2023-08-24 21:36:00 -06:00
Cyberes	01b8442b95	update current model when we generate_stats()	2023-08-24 21:10:00 -06:00
Cyberes	ec3fe2c2ac	show total output tokens on stats	2023-08-24 20:43:11 -06:00
Cyberes	9b7bf490a1	sort keys of stats dict	2023-08-24 18:59:52 -06:00
Cyberes	763dd832cc	update home, update readme, calculate estimated wait based on database stats	2023-08-24 16:47:14 -06:00
Cyberes	21174750ea	update readme	2023-08-24 12:19:59 -06:00
Cyberes	afc138c743	update readme	2023-08-24 00:09:57 -06:00
Cyberes	f3fe514c11	add home template	2023-08-23 23:11:12 -06:00
Cyberes	cdda2c840c	dont test code, don't care	2023-08-23 22:24:32 -06:00
Cyberes	1eb8e885d0	am dumb	2023-08-23 22:22:38 -06:00
Cyberes	e52acb03a4	log gen time to DB, also keep generation_elapsed under 3 min	2023-08-23 22:20:39 -06:00
Cyberes	3317bd5f1a	allow hiding of more variables	2023-08-23 22:08:10 -06:00
Cyberes	11a0b6541f	fix some stuff related to gunicorn workers	2023-08-23 22:01:06 -06:00
Cyberes	02c07bbd53	pycarm deeleted import	2023-08-23 21:34:27 -06:00
Cyberes	de19af900f	add estimated wait time and other time tracking stats	2023-08-23 21:33:52 -06:00
Cyberes	6f8b70df54	add a queue system	2023-08-23 20:12:38 -06:00
Cyberes	a79d67adbb	do caching ourself on /model	2023-08-23 16:40:20 -06:00
Cyberes	64e1b1654f	more cloudflare finicky stuff	2023-08-23 16:32:13 -06:00
Cyberes	f76d7bbc5d	more caching stuff	2023-08-23 16:23:24 -06:00
Cyberes	a6b0bb0183	actually we want 500	2023-08-23 16:09:36 -06:00
Cyberes	fd5796ed07	oops	2023-08-23 16:08:52 -06:00
Cyberes	508089ce11	model info timeout and additional info	2023-08-23 16:07:43 -06:00
Cyberes	1f5e2da637	print fetch model error message	2023-08-23 16:02:57 -06:00
Cyberes	806073ee4c	update config	2023-08-23 15:23:06 -06:00
Cyberes	ba063f7f1b	caching	2023-08-23 12:40:13 -06:00
Cyberes	33190e3cfe	fix stats for real	2023-08-23 01:14:19 -06:00
Cyberes	3bb27d6900	track IPs for last min proompters	2023-08-22 23:37:39 -06:00
Cyberes	61b9e313d2	cache again	2023-08-22 23:14:56 -06:00
Cyberes	36b793e8a2	fix proompters_1_min again	2023-08-22 23:01:09 -06:00
Cyberes	b051f8dd6b	remove caching on stats route	2023-08-22 22:42:40 -06:00
Cyberes	9f14b166dd	fix proompters_1_min, other minor changes	2023-08-22 22:32:29 -06:00
Cyberes	06ae8adf0d	add backend name to error messages	2023-08-22 21:14:12 -06:00
Cyberes	a525093c75	rename, more stats	2023-08-22 20:42:38 -06:00
Cyberes	a9b7a7a2c7	display error messages in sillytavern	2023-08-22 20:28:41 -06:00
Cyberes	0d32db2dbd	prototype hf-textgen and adjust logging	2023-08-22 19:58:31 -06:00
Cyberes	a59dcea2da	more proxy stats	2023-08-22 16:50:49 -06:00
Cyberes	d8d5514aea	add mode to stats	2023-08-22 16:41:55 -06:00
Cyberes	ad9a91f1b5	concurrent gens setting, online status	2023-08-22 00:26:46 -06:00
Cyberes	f767e4b076	stats: prompters 1 min	2023-08-21 23:48:06 -06:00
Cyberes	e04d6a8a13	minor adjustments	2023-08-21 22:49:44 -06:00
Cyberes	8cbf643fd3	MVP	2023-08-21 21:28:52 -06:00


143 Commits