**ARCHIVED PROJECT:** this project was created before any good solution existed for managing LLM endpoints and has now been superseded by many good options. [LiteLLM](https://github.com/BerriAI/litellm) is the best replacement. If a need for an un-authenticated public model arises, check out [cyberes/litellm-public](https://git.evulid.cc/cyberes/litellm-public).
The purpose of this server is to abstract your LLM backend from your frontend API. This lets you switch backends while presenting a stable API to your frontend clients.
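For illustration, a client only ever talks to the proxy, so swapping the backend does not require client changes. The sketch below is not the project's documented API: the URL, route, auth header, and payload fields are placeholders, so substitute whatever endpoint and parameters your deployment actually exposes.

```python
import requests

# Minimal sketch of a frontend client hitting the proxy instead of the backend.
# The route, auth header, and payload fields are placeholders, not the
# project's documented API.
PROXY_URL = 'https://llm.example.com/api/v1/generate'  # hypothetical route
TOKEN = 'my-secret-token'

r = requests.post(
    PROXY_URL,
    headers={'Authorization': f'Bearer {TOKEN}'},  # header name is an assumption
    json={'prompt': 'Hello!', 'max_new_tokens': 64},
    timeout=120,
)
r.raise_for_status()
print(r.json())
```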
7. Copy the systemd service file from `other/vllm/vllm.service` to `/etc/systemd/system/` and edit the paths to point to your install location. Then enable and start the service.
10. An example nginx site is provided at `other/nginx-site.conf`. Copy this to `/etc/nginx/default`.
11. Copy the example config from `config/config.yml.sample` to `config/config.yml`. Modify the config (it's well commented).
12. Set up your MySQL server with a database and user according to what you configured in `config.yml` (a quick connectivity check is sketched after this list).
13. Install the two systemd services in `other/` and activate them.
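After steps 11 and 12, it can be worth confirming that the database and user actually match what `config.yml` expects. A rough sketch, assuming PyYAML and a MySQL client library such as `pymysql` are available; the config key names below are placeholders, so mirror whatever structure `config/config.yml.sample` actually defines.

```python
import yaml
import pymysql  # assumption: any MySQL client library will do

# Sanity check: read the database settings from config.yml and try to connect.
# The key names under 'mysql' are placeholders; match your actual config.
with open('config/config.yml') as f:
    config = yaml.safe_load(f)

db = config['mysql']  # placeholder key
conn = pymysql.connect(
    host=db['host'],
    user=db['username'],
    password=db['password'],
    database=db['database'],
)
with conn.cursor() as cursor:
    cursor.execute('SELECT VERSION()')
    print('Connected to MySQL', cursor.fetchone()[0])
conn.close()
```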
## Creating Tokens
You'll have to execute SQL queries to add tokens. phpMyAdmin makes this easy. A scripted example follows the field list below.
**Fields:**
- `token`: The authentication token. Tokens that start with `SYSTEM__` are reserved for internal use.
- `type`: The token type. For your reference only; it does not appear to be used by the system (unconfirmed).
- `priority`: The token's priority. Requests from higher-priority tokens are moved up in the queue.
- `simultaneous_ip`: How many requests from a single IP are allowed to be in the queue at once.
- `openai_moderation_enabled`: Whether moderation is applied to this token's requests. `1` means enabled, `0` means disabled.
- `uses`: How many times this token has been used. Initialize it to `0` and let the server manage it.
- `max_uses`: How many times this token may be used. Set to `NULL` to disable the restriction and allow unlimited uses.
- `expire`: A Unix timestamp after which the token is no longer accepted.
- `disabled`: Whether the token is disabled.
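As a rough sketch of the INSERT this amounts to, here is a scripted version. The table name `token_auth` and the connection details are assumptions, not taken from the project's docs, so match them to your actual schema and `config.yml`.

```python
import pymysql  # or run the equivalent INSERT from phpMyAdmin / the mysql CLI

# Add a token row. Table name and connection details are placeholders.
conn = pymysql.connect(host='localhost', user='llm', password='secret', database='llm_server')
sql = """
INSERT INTO token_auth
    (token, type, priority, simultaneous_ip, openai_moderation_enabled,
     uses, max_uses, expire, disabled)
VALUES
    (%s, %s, %s, %s, %s, %s, %s, %s, %s)
"""
with conn.cursor() as cursor:
    cursor.execute(sql, (
        'my-secret-token',  # the token string clients will send
        'user',             # free-form type label
        100,                # priority: higher moves up the queue
        3,                  # simultaneous_ip
        1,                  # openai_moderation_enabled
        0,                  # uses: start at 0
        None,               # max_uses: NULL = unlimited
        2000000000,         # expire: Unix timestamp (2033-05-18)
        0,                  # disabled
    ))
conn.commit()
conn.close()
```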
## Updating VLLM
This project is pinned to a specific VLLM version because it depends on that version's sampling parameters. When updating, make sure the parameters in the `SamplingParams` object in [llm_server/llm/vllm/vllm_backend.py](https://git.evulid.cc/cyberes/local-llm-server/src/branch/master/llm_server/llm/vllm/vllm_backend.py) match those in VLLM's [vllm/sampling_params.py](https://github.com/vllm-project/vllm/blob/93348d9458af7517bb8c114611d438a1b4a2c3be/vllm/sampling_params.py).
Additionally, make sure our VLLM API server at [other/vllm/vllm_api_server.py](https://git.evulid.cc/cyberes/local-llm-server/src/branch/master/other/vllm/vllm_api_server.py) matches [vllm/entrypoints/api_server.py](https://github.com/vllm-project/vllm/blob/93348d9458af7517bb8c114611d438a1b4a2c3be/vllm/entrypoints/api_server.py).
Then, update the VLLM version in `requirements.txt`.
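One way to see which parameters the installed VLLM build accepts, so they can be diffed against the arguments passed in `vllm_backend.py`, is to inspect `SamplingParams` directly. This is only a sketch and assumes the pinned VLLM version is importable in your environment.

```python
import inspect
from vllm import SamplingParams

# Print the keyword arguments the installed VLLM's SamplingParams accepts,
# along with their defaults, for comparison against what
# llm_server/llm/vllm/vllm_backend.py passes.
for name, param in inspect.signature(SamplingParams.__init__).parameters.items():
    if name != 'self':
        print(name, '=', param.default)
```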
## To Do
- [ ] Support the Oobabooga Text Generation WebUI as a backend
- [ ] Make the moderation apply to the non-OpenAI endpoints as well