local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	94141b8ecf	fix processing not being decremented on streaming, fix confusion over queue, adjust stop sequences	2023-10-02 20:53:08 -06:00
Cyberes	b0089859d7	fix ratelimiting	2023-10-02 02:05:15 -06:00
Cyberes	51881ae39d	fix tokenizer	2023-10-01 17:19:34 -06:00
Cyberes	f7e9687527	finish openai endpoints	2023-10-01 16:04:53 -06:00
Cyberes	2a3ff7e21e	update openai endpoints	2023-10-01 14:15:01 -06:00
Cyberes	25ec56a5ef	get streaming working, remove /v2/	2023-10-01 00:20:00 -06:00
Cyberes	114f36e709	functional	2023-09-30 19:41:50 -06:00
Cyberes	624ca74ce5	mvp	2023-09-29 00:09:44 -06:00
Cyberes	e7b57cad7b	set up cluster config and basic background workers	2023-09-28 18:40:24 -06:00
Cyberes	347a82b7e1	avoid sending to backend to tokenize if it's greater than our specified context size	2023-09-28 03:54:20 -06:00
Cyberes	957a6cd092	fix error handling	2023-09-27 14:36:49 -06:00
Cyberes	aba2e5b9c0	don't use db pooling, add LLM-ST-Errors header to disable formatted errors	2023-09-26 23:59:22 -06:00
Cyberes	d9bbcc42e6	more work on openai endpoint	2023-09-26 22:09:11 -06:00
Cyberes	e0af2ea9c5	convert to gunicorn	2023-09-26 13:32:33 -06:00
Cyberes	11e84db59c	update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming,	2023-09-25 22:32:48 -06:00
Cyberes	8240a1ebbb	fix background log not doing anything	2023-09-25 18:18:29 -06:00
Cyberes	1646a00987	implement streaming on openai, improve streaming, run DB logging in background thread	2023-09-25 12:30:40 -06:00
Cyberes	cb99c3490e	rewrite tokenizer, restructure validation	2023-09-24 13:02:30 -06:00
Cyberes	76a1428ba0	implement streaming for vllm	2023-09-23 17:57:23 -06:00
Cyberes	81452ec643	adjust vllm info	2023-09-21 20:13:29 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	77edbe779c	actually validate prompt length lol	2023-09-14 18:31:13 -06:00
Cyberes	3100b0a924	set up queue to work with gunicorn processes, other improvements	2023-09-14 17:38:20 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	c45e68a8c8	adjust requests timeout, add service file	2023-09-14 01:32:49 -06:00
Cyberes	05a45e6ac6	didnt test anything	2023-09-13 11:51:46 -06:00
Cyberes	bcedd2ab3d	adjust logging, add more vllm stuff	2023-09-13 11:22:33 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	747d838138	move where the vllm model is set	2023-09-11 21:05:22 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00

32 Commits