This repository was archived on 2024-10-27. It can still be viewed and cloned, but pushing and opening issues or pull requests are disabled.
Directory listing: local-llm-server/llm_server/llm
Latest commit: 347a82b7e1 by Cyberes, 2023-09-28 03:54:20 -06:00: "avoid sending to backend to tokenize if it's greater than our specified context size" (a sketch of this pre-check follows the listing below).
Name            Last commit                 Last commit message
oobabooga       2023-09-24 21:45:30 -06:00  further align openai endpoint with expected responses
openai          2023-09-26 23:59:22 -06:00  don't use db pooling, add LLM-ST-Errors header to disable formatted errors
vllm            2023-09-28 03:54:20 -06:00  avoid sending to backend to tokenize if it's greater than our specified context size
__init__.py     2023-09-26 22:09:11 -06:00  more work on openai endpoint
generator.py    2023-09-27 21:15:54 -06:00  add ratelimiting to websocket streaming endpoint, fix queue not decrementing IP requests, add console printer
info.py         2023-09-14 14:05:50 -06:00  option to disable streaming, improve timeout on requests to backend, fix error handling. reduce duplicate code, misc other cleanup
llm_backend.py  2023-09-27 14:36:49 -06:00  fix error handling
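
The latest commit message describes skipping the round-trip to the backend tokenizer when a prompt clearly exceeds the configured context size. The sketch below is a hypothetical illustration of that kind of pre-check, not the repository's actual code: the endpoint URL, setting names, response shape, and the characters-per-token heuristic are all assumptions.

```python
import requests  # assumed HTTP client; the real project may use something else

# Hypothetical configuration; names and values are illustrative only.
CONTEXT_SIZE = 4096                                        # max tokens the backend accepts
BACKEND_TOKENIZE_URL = "http://127.0.0.1:8000/tokenize"    # placeholder tokenize endpoint
CHARS_PER_TOKEN_ESTIMATE = 3                               # rough lower bound on chars per token


def count_tokens(prompt: str) -> int:
    """Return a token count, avoiding the backend call for obviously oversized prompts."""
    # Cheap character-based estimate first.
    estimated = len(prompt) // CHARS_PER_TOKEN_ESTIMATE
    if estimated > CONTEXT_SIZE:
        # The prompt cannot fit regardless of exact tokenization,
        # so skip the request to the backend tokenizer entirely.
        return estimated

    # Otherwise ask the backend for an exact count (response shape assumed).
    resp = requests.post(BACKEND_TOKENIZE_URL, json={"input": prompt}, timeout=5)
    resp.raise_for_status()
    return resp.json()["count"]
```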