local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	e0f86d053a	reorganize to api v2	2023-09-30 19:42:41 -06:00
Cyberes	114f36e709	functional	2023-09-30 19:41:50 -06:00
Cyberes	624ca74ce5	mvp	2023-09-29 00:09:44 -06:00
Cyberes	e7b57cad7b	set up cluster config and basic background workers	2023-09-28 18:40:24 -06:00
Cyberes	e1d3fca6d3	try to cancel inference if disconnected from client	2023-09-28 09:55:31 -06:00
Cyberes	e42f2b6819	fix negative queue on stats	2023-09-28 08:47:39 -06:00
Cyberes	347a82b7e1	avoid sending to backend to tokenize if it's greater than our specified context size	2023-09-28 03:54:20 -06:00
Cyberes	59f2aac8ad	rewrite redis usage	2023-09-28 03:44:30 -06:00
Cyberes	a4a1d6cce6	fix double logging	2023-09-28 01:34:15 -06:00
Cyberes	ecdf819088	fix try/finally with continue, fix wrong subclass signature	2023-09-28 00:11:34 -06:00
Cyberes	e86a5182eb	redo background processes, reorganize server.py	2023-09-27 23:36:44 -06:00
Cyberes	e5fbc9545d	add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer	2023-09-27 21:15:54 -06:00
Cyberes	43299b32ad	clean up background threads	2023-09-27 19:39:04 -06:00
Cyberes	35e9847b27	set inference workers to daemon, add finally to inference worker, hide estimated avg tps	2023-09-27 18:36:51 -06:00
Cyberes	105b66d5e2	unify error message handling	2023-09-27 14:48:47 -06:00
Cyberes	957a6cd092	fix error handling	2023-09-27 14:36:49 -06:00
Cyberes	aba2e5b9c0	don't use db pooling, add LLM-ST-Errors header to disable formatted errors	2023-09-26 23:59:22 -06:00
Cyberes	048e5a8060	fix API key handling	2023-09-26 22:49:53 -06:00
Cyberes	d9bbcc42e6	more work on openai endpoint	2023-09-26 22:09:11 -06:00
Cyberes	e0af2ea9c5	convert to gunicorn	2023-09-26 13:32:33 -06:00
Cyberes	0eb901cb52	don't log entire request on failure	2023-09-26 12:32:19 -06:00
Cyberes	bbdb9c9d55	try to prevent "### XXX" responses on openai	2023-09-25 23:14:35 -06:00
Cyberes	11e84db59c	update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming,	2023-09-25 22:32:48 -06:00
Cyberes	2d299dbae5	openai_force_no_hashes	2023-09-25 22:01:57 -06:00
Cyberes	8240a1ebbb	fix background log not doing anything	2023-09-25 18:18:29 -06:00
Cyberes	8184e24bff	fix sending error messages when streaming	2023-09-25 17:37:58 -06:00
Cyberes	7ce60079d7	fix typo	2023-09-25 17:24:51 -06:00
Cyberes	30282479a0	fix flask exception	2023-09-25 17:22:28 -06:00
Cyberes	135bd743bb	fix homepage slowness, fix incorrect 24 hr prompters, fix redis wrapper,	2023-09-25 17:20:21 -06:00
Cyberes	52e6965b5e	don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens	2023-09-25 13:00:39 -06:00
Cyberes	3eaabc8c35	fix copied code	2023-09-25 12:38:02 -06:00
Cyberes	44e692c9cf	remove debug print	2023-09-25 12:35:36 -06:00
Cyberes	1646a00987	implement streaming on openai, improve streaming, run DB logging in background thread	2023-09-25 12:30:40 -06:00
Cyberes	bbe5d5a8fe	improve openai endpoint, exclude system tokens more places	2023-09-25 09:32:23 -06:00
Cyberes	6459a1c91b	allow setting simultaneous IP limit per-token, fix token use tracker, fix tokens on streaming	2023-09-25 00:55:20 -06:00
Cyberes	320f51e01c	further align openai endpoint with expected responses	2023-09-24 21:45:30 -06:00
Cyberes	84ea2f8891	handle when auth token is not enabled	2023-09-24 15:57:39 -06:00
Cyberes	8d6b2ce49c	minor changes, add admin token auth system, add route to get backend info	2023-09-24 15:54:35 -06:00
Cyberes	2678102153	handle error while streaming	2023-09-24 13:27:27 -06:00
Cyberes	cb99c3490e	rewrite tokenizer, restructure validation	2023-09-24 13:02:30 -06:00
Cyberes	62412f4873	add config setting for hostname	2023-09-23 23:24:08 -06:00
Cyberes	84a1fcfdd8	don't store host if it's an IP	2023-09-23 23:14:22 -06:00
Cyberes	0015e653b2	adjust a few final things	2023-09-23 22:30:59 -06:00
Cyberes	fab7b7ccdd	active gen workers wait	2023-09-23 21:17:13 -06:00
Cyberes	7ee2311183	whats going on	2023-09-23 21:10:14 -06:00
Cyberes	94e845cd1a	if there's less than num concurrent wait time is 0	2023-09-23 21:09:21 -06:00
Cyberes	41e622d19c	fix two exceptions	2023-09-23 20:55:49 -06:00
Cyberes	f67ac8175b	fix wrong approach for streaming	2023-09-23 18:44:07 -06:00
Cyberes	8a4de7df44	oops	2023-09-23 18:01:12 -06:00
Cyberes	76a1428ba0	implement streaming for vllm	2023-09-23 17:57:23 -06:00
Cyberes	f9a80f3028	change proompters 1 min to 5 min	2023-09-20 21:21:22 -06:00
Cyberes	8593198216	close mysql cursor	2023-09-20 21:19:26 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	2d390e6268	blushes oopsie daisy	2023-09-17 20:22:17 -06:00
Cyberes	eb3179cfff	fix recent proompters to work with gunicorn	2023-09-17 19:06:53 -06:00
Cyberes	3c1254d3bf	cache stats in background	2023-09-17 18:55:36 -06:00
Cyberes	edf13db324	calculate estimated wait time better	2023-09-17 18:33:57 -06:00
Cyberes	7434ae1b5b	openai: improve moderation checking	2023-09-17 17:40:05 -06:00
Cyberes	354ad8192d	fix division by 0, prettify /stats json, add js var to home	2023-09-16 17:37:43 -06:00
Cyberes	77edbe779c	actually validate prompt length lol	2023-09-14 18:31:13 -06:00
Cyberes	3100b0a924	set up queue to work with gunicorn processes, other improvements	2023-09-14 17:38:20 -06:00
Cyberes	5d03f875cb	adjust prompt	2023-09-14 15:43:04 -06:00
Cyberes	1cf4c95ba2	ah, oops	2023-09-14 15:14:59 -06:00
Cyberes	a89295193f	add moderation endpoint to openai api, update config	2023-09-14 15:07:17 -06:00
Cyberes	8f4f17166e	adjust	2023-09-14 14:36:22 -06:00
Cyberes	93a344f4c5	check if the backend crapped out, print some more stuff	2023-09-14 14:26:25 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	e79b206e1a	rename average_tps to estimated_avg_tps	2023-09-14 01:35:25 -06:00
Cyberes	12e894032e	show the openai system prompt	2023-09-13 20:25:56 -06:00
Cyberes	3d40ed4cfb	shit code	2023-09-13 11:58:38 -06:00
Cyberes	1582625e09	how did this get broken	2023-09-13 11:56:30 -06:00
Cyberes	05a45e6ac6	didnt test anything	2023-09-13 11:51:46 -06:00
Cyberes	bcedd2ab3d	adjust logging, add more vllm stuff	2023-09-13 11:22:33 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	1d9f40765e	remove text-generation-inference backend	2023-09-12 13:09:47 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	5dd95875dd	oops	2023-09-12 01:12:50 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	747d838138	move where the vllm model is set	2023-09-11 21:05:22 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00
Cyberes	c14cc51f09	get working with ooba again, give up on dockerfile	2023-09-11 09:51:01 -06:00
Cyberes	2d8812a6cd	fix crash again	2023-08-31 09:31:16 -06:00
Cyberes	4b32401542	oops wrong data structure	2023-08-30 20:24:55 -06:00
Cyberes	47887c3925	missed a spot, clean up json error handling	2023-08-30 20:19:23 -06:00
Cyberes	8c04238e04	disable stream for now	2023-08-30 19:58:59 -06:00
Cyberes	2816c01902	refactor generation route	2023-08-30 18:53:26 -06:00
Cyberes	bf648f605f	implement streaming for hf-textgen	2023-08-29 17:56:12 -06:00
Cyberes	26b04f364c	remove old code	2023-08-29 15:57:28 -06:00
Cyberes	cef88b866a	fix wrong response status code	2023-08-29 15:52:58 -06:00
Cyberes	f9b9051bad	update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen	2023-08-29 15:46:56 -06:00
Cyberes	2d9ec15302	I swear I know what I'm doing	2023-08-29 14:57:49 -06:00
Cyberes	06b52c7648	forgot to remove a snippet	2023-08-29 14:53:03 -06:00
Cyberes	23f3fcf579	log errors to database	2023-08-29 14:48:33 -06:00
Cyberes	ba0bc87434	add HF text-generation-inference backend	2023-08-29 13:46:41 -06:00
Cyberes	6c0e60135d	exclude tokens with priority 0 from simultaneous requests ratelimit	2023-08-28 00:03:25 -06:00
Cyberes	c16d70a24d	limit amount of simultaneous requests an IP can make	2023-08-27 23:48:10 -06:00
Cyberes	1a4cb5f786	reorganize stats page again	2023-08-27 22:24:44 -06:00
Cyberes	f43336c92c	adjust estimated wait time calculations	2023-08-27 22:17:21 -06:00
Cyberes	6a09ffc8a4	log model used in request so we can pull the correct averages when we change models	2023-08-26 00:30:59 -06:00
Cyberes	d64152587c	reorganize nvidia stats	2023-08-25 15:02:40 -06:00

205 Commits