local-llm-server
An HTTP API to serve local LLM models.
The purpose of this server is to abstract your LLM backend from your frontend API. This enables you to make changes to (or even switch) your backend without affecting your clients.
Install
sudo apt install redis
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 server.py
An example systemd service file is provided in other/local-llm.service.
Configure
First, set up your LLM backend. Currently, only oobabooga/text-generation-webui is supported, but eventually huggingface/text-generation-inference will be the default.
Then, configure this server. The sample config file is located at config/config.yml.sample, so copy it to config/config.yml.
- Set backend_url to the base API URL of your backend.
- Set token_limit to the configured token limit of the backend. This number is shown to clients and on the home page.
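As a rough illustration of the two settings above, the sketch below checks a configuration mapping before a server would use it. The validate_config helper is hypothetical and not part of this project. It assumes config/config.yml has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and the URL and limit shown are placeholders.

```python
from urllib.parse import urlparse


def validate_config(config: dict) -> dict:
    """Hypothetical helper: sanity-check backend_url and token_limit."""
    backend_url = config.get("backend_url")
    token_limit = config.get("token_limit")

    # backend_url should be an absolute HTTP(S) URL.
    parsed = urlparse(str(backend_url or ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("backend_url must be an absolute http(s) URL")

    # token_limit should be a positive integer (bool is excluded on purpose,
    # because bool is a subclass of int in Python).
    if isinstance(token_limit, bool) or not isinstance(token_limit, int) or token_limit <= 0:
        raise ValueError("token_limit must be a positive integer")

    # Strip a trailing slash so request paths can be appended consistently.
    return {"backend_url": str(backend_url).rstrip("/"), "token_limit": token_limit}


# Placeholder values, not defaults shipped with this project.
example = validate_config({"backend_url": "http://127.0.0.1:5000/", "token_limit": 2048})
print(example)
```

Validating at startup like this surfaces a mistyped URL or a non-numeric limit immediately, rather than on the first client request.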
To set up token auth, add rows to the token_auth table in the SQLite database. Each row has the following columns:
- token: the token/password.
- type: the type of token. Currently unused (maybe for a future web interface?) but required.
- priority: the lower this value, the higher the priority. Higher-priority tokens are moved ahead in the queue.
- uses: how many responses this token has generated. Leave empty.
- max_uses: how many responses this token is allowed to generate. Leave empty for no restriction.
- expire: UNIX timestamp of when this token expires and is no longer valid.
- disabled: mark the token as disabled.
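The sketch below shows one way rows could be added to such a table with Python's built-in sqlite3 module. The column names come from the list above, but the column types, the add_token and is_token_usable helpers, the "api" type value, and the in-memory database are all assumptions made for illustration. The server's real schema, database location, and validation logic may differ.

```python
import sqlite3
import time

# Assumed schema: column names follow the list above; types and constraints
# are guesses, not the project's real table definition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token_auth (
    token TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    priority INTEGER,
    uses INTEGER,
    max_uses INTEGER,
    expire INTEGER,
    disabled INTEGER
)
"""


def add_token(conn, token, type_, priority, max_uses=None, expire=None):
    """Hypothetical helper: insert one row, leaving uses empty (NULL)."""
    conn.execute(
        "INSERT INTO token_auth "
        "(token, type, priority, uses, max_uses, expire, disabled) "
        "VALUES (?, ?, ?, NULL, ?, ?, 0)",
        (token, type_, priority, max_uses, expire),
    )
    conn.commit()


def is_token_usable(row, now=None):
    """One plausible reading of the columns, not the server's actual check."""
    now = time.time() if now is None else now
    if row["disabled"]:
        return False
    if row["expire"] is not None and now >= row["expire"]:
        return False
    if row["max_uses"] is not None and (row["uses"] or 0) >= row["max_uses"]:
        return False
    return True


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(SCHEMA)

add_token(conn, "example-token", "api", priority=10)
add_token(conn, "expired-token", "api", priority=5, expire=1)

# Lower priority values sort first: the lower the value, the higher the priority.
rows = conn.execute("SELECT * FROM token_auth ORDER BY priority ASC").fetchall()
for row in rows:
    print(row["token"], is_token_usable(row))
```

Passing values as `?` parameters rather than formatting them into the SQL string avoids quoting problems when a token contains special characters.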