local-llm-server

Commit Graph

Author	SHA1	Message	Date
Cyberes	62eb0196cc	t	2023-10-03 00:13:55 -06:00
Cyberes	0f5e22191c	test	2023-10-03 00:12:37 -06:00
Cyberes	cd325216e2	test	2023-10-02 22:45:07 -06:00
Cyberes	94141b8ecf	fix processing not being decremented on streaming, fix confusion over queue, adjust stop sequences	2023-10-02 20:53:08 -06:00
Cyberes	b0089859d7	fix ratelimiting	2023-10-02 02:05:15 -06:00
Cyberes	21da2f6373	fix openai error message	2023-10-01 22:58:08 -06:00
Cyberes	a594729d00	fix keyerror	2023-10-01 22:37:13 -06:00
Cyberes	51881ae39d	fix tokenizer	2023-10-01 17:19:34 -06:00
Cyberes	f7e9687527	finish openai endpoints	2023-10-01 16:04:53 -06:00
Cyberes	2a3ff7e21e	update openai endpoints	2023-10-01 14:15:01 -06:00
Cyberes	25ec56a5ef	get streaming working, remove /v2/	2023-10-01 00:20:00 -06:00
Cyberes	114f36e709	functional	2023-09-30 19:41:50 -06:00
Cyberes	624ca74ce5	mvp	2023-09-29 00:09:44 -06:00
Cyberes	e7b57cad7b	set up cluster config and basic background workers	2023-09-28 18:40:24 -06:00
Cyberes	347a82b7e1	avoid sending to backend to tokenize if it's greater than our specified context size	2023-09-28 03:54:20 -06:00
Cyberes	e5fbc9545d	add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer	2023-09-27 21:15:54 -06:00
Cyberes	957a6cd092	fix error handling	2023-09-27 14:36:49 -06:00
Cyberes	aba2e5b9c0	don't use db pooling, add LLM-ST-Errors header to disable formatted errors	2023-09-26 23:59:22 -06:00
Cyberes	d9bbcc42e6	more work on openai endpoint	2023-09-26 22:09:11 -06:00
Cyberes	e0af2ea9c5	convert to gunicorn	2023-09-26 13:32:33 -06:00
Cyberes	11e84db59c	update database, tokenizer handle null prompt, convert top_p to vllm on openai, actually validate prompt on streaming,	2023-09-25 22:32:48 -06:00
Cyberes	8240a1ebbb	fix background log not doing anything	2023-09-25 18:18:29 -06:00
Cyberes	1646a00987	implement streaming on openai, improve streaming, run DB logging in background thread	2023-09-25 12:30:40 -06:00
Cyberes	320f51e01c	further align openai endpoint with expected responses	2023-09-24 21:45:30 -06:00
Cyberes	cb99c3490e	rewrite tokenizer, restructure validation	2023-09-24 13:02:30 -06:00
Cyberes	76a1428ba0	implement streaming for vllm	2023-09-23 17:57:23 -06:00
Cyberes	81452ec643	adjust vllm info	2023-09-21 20:13:29 -06:00
Cyberes	03e3ec5490	port to mysql, use vllm tokenizer endpoint	2023-09-20 20:30:31 -06:00
Cyberes	354ad8192d	fix division by 0, prettify /stats json, add js var to home	2023-09-16 17:37:43 -06:00
Cyberes	77edbe779c	actually validate prompt length lol	2023-09-14 18:31:13 -06:00
Cyberes	3100b0a924	set up queue to work with gunicorn processes, other improvements	2023-09-14 17:38:20 -06:00
Cyberes	79b1e01b61	option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup	2023-09-14 14:05:50 -06:00
Cyberes	c45e68a8c8	adjust requests timeout, add service file	2023-09-14 01:32:49 -06:00
Cyberes	05a45e6ac6	didnt test anything	2023-09-13 11:51:46 -06:00
Cyberes	bcedd2ab3d	adjust logging, add more vllm stuff	2023-09-13 11:22:33 -06:00
Cyberes	9740df07c7	add openai-compatible backend	2023-09-12 16:40:09 -06:00
Cyberes	1d9f40765e	remove text-generation-inference backend	2023-09-12 13:09:47 -06:00
Cyberes	6152b1bb66	fix invalid param error, add manual model name	2023-09-12 10:30:45 -06:00
Cyberes	40ac84aa9a	actually we don't want to emulate openai	2023-09-12 01:04:11 -06:00
Cyberes	747d838138	move where the vllm model is set	2023-09-11 21:05:22 -06:00
Cyberes	4c9d543eab	implement vllm backend	2023-09-11 20:47:19 -06:00
Cyberes	c14cc51f09	get working with ooba again, give up on dockerfile	2023-09-11 09:51:01 -06:00
Cyberes	bf39b8da63	still having issues	2023-08-31 09:24:37 -06:00
Cyberes	47887c3925	missed a spot, clean up json error handling	2023-08-30 20:19:23 -06:00
Cyberes	2816c01902	refactor generation route	2023-08-30 18:53:26 -06:00
Cyberes	f9b9051bad	update weighted_average_column_for_model to account for when there was an error reported, insert null for response tokens when error, correctly parse x-forwarded-for, correctly convert model reported by hf-textgen	2023-08-29 15:46:56 -06:00
Cyberes	23f3fcf579	log errors to database	2023-08-29 14:48:33 -06:00
Cyberes	b44dfa2471	update info page	2023-08-29 14:00:35 -06:00
Cyberes	ba0bc87434	add HF text-generation-inference backend	2023-08-29 13:46:41 -06:00
Cyberes	0aa52863bc	forgot to start workers	2023-08-23 20:33:49 -06:00

59 Commits