local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	ee9a0d4858	redo config	2024-05-07 12:20:53 -06:00
Cyberes	ff82add09e	redo database connection, add pooling, minor logging changes, other clean up	2024-05-07 09:48:51 -06:00
Cyberes	4b3e0671c6	clean some stuff up, bump VLLM version	2024-01-10 15:01:26 -07:00
Cyberes	0059e7956c	Merge cluster to master (#3) Co-authored-by: Cyberes <cyberes@evulid.cc> Reviewed-on: #3	2023-10-27 19:19:22 -06:00
Cyberes	59f2aac8ad	rewrite redis usage	2023-09-28 03:44:30 -06:00
Cyberes	a4a1d6cce6	fix double logging	2023-09-28 01:34:15 -06:00
Cyberes	e5fbc9545d	add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer	2023-09-27 21:15:54 -06:00
Cyberes	43299b32ad	clean up background threads	2023-09-27 19:39:04 -06:00
Cyberes	105b66d5e2	unify error message handling	2023-09-27 14:48:47 -06:00
Cyberes	957a6cd092	fix error handling	2023-09-27 14:36:49 -06:00
Cyberes	aba2e5b9c0	don't use db pooling, add LLM-ST-Errors header to disable formatted errors	2023-09-26 23:59:22 -06:00
Cyberes	048e5a8060	fix API key handling	2023-09-26 22:49:53 -06:00
Cyberes	d9bbcc42e6	more work on openai endpoint	2023-09-26 22:09:11 -06:00
Cyberes	52e6965b5e	don't count SYSTEM tokens for recent prompters, fix sql exclude for SYSTEM tokens	2023-09-25 13:00:39 -06:00
Cyberes	bbe5d5a8fe	improve openai endpoint, exclude system tokens more places	2023-09-25 09:32:23 -06:00
Cyberes	6459a1c91b	allow setting simultaneous IP limit per-token, fix token use tracker, fix tokens on streaming	2023-09-25 00:55:20 -06:00
Cyberes	cb99c3490e	rewrite tokenizer, restructure validation	2023-09-24 13:02:30 -06:00
Cyberes	62412f4873	add config setting for hostname	2023-09-23 23:24:08 -06:00
Cyberes	84a1fcfdd8	don't store host if it's an IP	2023-09-23 23:14:22 -06:00
Cyberes	76a1428ba0	implement streaming for vllm	2023-09-23 17:57:23 -06:00
Cyberes	8593198216	close mysql cursor	2023-09-20 21:19:26 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	eb3179cfff	fix recent proompters to work with gunicorn	2023-09-17 19:06:53 -06:00
Cyberes	3c1254d3bf	cache stats in background	2023-09-17 18:55:36 -06:00
Cyberes	77edbe779c	actually validate prompt length lol	2023-09-14 18:31:13 -06:00
Cyberes	3100b0a924	set up queue to work with gunicorn processes, other improvements	2023-09-14 17:38:20 -06:00
Cyberes	93a344f4c5	check if the backend crapped out, print some more stuff	2023-09-14 14:26:25 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	bcedd2ab3d	adjust logging, add more vllm stuff	2023-09-13 11:22:33 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	1d9f40765e	remove text-generation-inference backend	2023-09-12 13:09:47 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	747d838138	move where the vllm model is set	2023-09-11 21:05:22 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00
Cyberes	2d8812a6cd	fix crash again	2023-08-31 09:31:16 -06:00
Cyberes	47887c3925	missed a spot, clean up json error handling	2023-08-30 20:19:23 -06:00
Cyberes	2816c01902	refactor generation route	2023-08-30 18:53:26 -06:00

38 Commits