Author  | Commit     | Message | Date
Cyberes | 9a1d41a9b7 | get functional again | 2024-07-07 15:05:35 -06:00
Cyberes | 20366fbd08 | misc adjustments | 2024-05-07 22:56:36 -06:00
Cyberes | fe23a2282f | refactor, add Llm-Disable-Openai header | 2024-05-07 17:41:53 -06:00
Cyberes | 5bd1044fad | openai error message cleanup | 2024-05-07 17:07:34 -06:00
Cyberes | fd09c783d3 | refactor a lot of things, major cleanup, use postgresql | 2024-05-07 17:03:41 -06:00
Cyberes | ee9a0d4858 | redo config | 2024-05-07 12:20:53 -06:00
Cyberes | ff82add09e | redo database connection, add pooling, minor logging changes, other clean up | 2024-05-07 09:48:51 -06:00
Cyberes | 4b3e0671c6 | clean some stuff up, bump VLLM version | 2024-01-10 15:01:26 -07:00
Cyberes | 0059e7956c | Merge cluster to master (#3); Reviewed-on: #3; Co-authored-by: Cyberes <cyberes@evulid.cc> | 2023-10-27 19:19:22 -06:00
Cyberes | 347a82b7e1 | avoid sending to backend to tokenize if it's greater than our specified context size | 2023-09-28 03:54:20 -06:00
Cyberes | e5fbc9545d | add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer | 2023-09-27 21:15:54 -06:00
Cyberes | 957a6cd092 | fix error handling | 2023-09-27 14:36:49 -06:00
Cyberes | aba2e5b9c0 | don't use db pooling, add LLM-ST-Errors header to disable formatted errors | 2023-09-26 23:59:22 -06:00
Cyberes | d9bbcc42e6 | more work on openai endpoint | 2023-09-26 22:09:11 -06:00
Cyberes | e0af2ea9c5 | convert to gunicorn | 2023-09-26 13:32:33 -06:00
Cyberes | 11e84db59c | update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming, | 2023-09-25 22:32:48 -06:00
Cyberes | 8240a1ebbb | fix background log not doing anything | 2023-09-25 18:18:29 -06:00
Cyberes | 1646a00987 | implement streaming on openai, improve streaming, run DB logging in background thread | 2023-09-25 12:30:40 -06:00
Cyberes | 320f51e01c | further align openai endpoint with expected responses | 2023-09-24 21:45:30 -06:00
Cyberes | cb99c3490e | rewrite tokenizer, restructure validation | 2023-09-24 13:02:30 -06:00
Cyberes | 76a1428ba0 | implement streaming for vllm | 2023-09-23 17:57:23 -06:00
Cyberes | 81452ec643 | adjust vllm info | 2023-09-21 20:13:29 -06:00
Cyberes | 03e3ec5490 | port to mysql, use vllm tokenizer endpoint | 2023-09-20 20:30:31 -06:00
Cyberes | 354ad8192d | fix division by 0, prettify /stats json, add js var to home | 2023-09-16 17:37:43 -06:00
Cyberes | 77edbe779c | actually validate prompt length lol | 2023-09-14 18:31:13 -06:00
Cyberes | 3100b0a924 | set up queue to work with gunicorn processes, other improvements | 2023-09-14 17:38:20 -06:00
Cyberes | 79b1e01b61 | option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup | 2023-09-14 14:05:50 -06:00
Cyberes | c45e68a8c8 | adjust requests timeout, add service file | 2023-09-14 01:32:49 -06:00
Cyberes | 05a45e6ac6 | didnt test anything | 2023-09-13 11:51:46 -06:00
Cyberes | bcedd2ab3d | adjust logging, add more vllm stuff | 2023-09-13 11:22:33 -06:00
Cyberes | 9740df07c7 | add openai-compatible backend | 2023-09-12 16:40:09 -06:00
Cyberes | 1d9f40765e | remove text-generation-inference backend | 2023-09-12 13:09:47 -06:00
Cyberes | 6152b1bb66 | fix invalid param error, add manual model name | 2023-09-12 10:30:45 -06:00
Cyberes | 40ac84aa9a | actually we don't want to emulate openai | 2023-09-12 01:04:11 -06:00
Cyberes | 747d838138 | move where the vllm model is set | 2023-09-11 21:05:22 -06:00
Cyberes | 4c9d543eab | implement vllm backend | 2023-09-11 20:47:19 -06:00
Cyberes | c14cc51f09 | get working with ooba again, give up on dockerfile | 2023-09-11 09:51:01 -06:00
Cyberes | bf39b8da63 | still having issues | 2023-08-31 09:24:37 -06:00
Cyberes | 47887c3925 | missed a spot, clean up json error handling | 2023-08-30 20:19:23 -06:00
Cyberes | 2816c01902 | refactor generation route | 2023-08-30 18:53:26 -06:00
Cyberes | f9b9051bad | update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen | 2023-08-29 15:46:56 -06:00
Cyberes | 23f3fcf579 | log errors to database | 2023-08-29 14:48:33 -06:00
Cyberes | b44dfa2471 | update info page | 2023-08-29 14:00:35 -06:00
Cyberes | ba0bc87434 | add HF text-generation-inference backend | 2023-08-29 13:46:41 -06:00
Cyberes | 0aa52863bc | forgot to start workers | 2023-08-23 20:33:49 -06:00
Cyberes | 6f8b70df54 | add a queue system | 2023-08-23 20:12:38 -06:00
Cyberes | 7eb930fafd | crap | 2023-08-23 16:12:25 -06:00
Cyberes | 9fc674878d | allow disabling ssl verification | 2023-08-23 16:11:32 -06:00
Cyberes | 508089ce11 | model info timeout and additional info | 2023-08-23 16:07:43 -06:00
Cyberes | 1f5e2da637 | print fetch model error message | 2023-08-23 16:02:57 -06:00