This server abstracts your LLM backend from your frontend API, so you can change (or even switch) your backend without affecting your clients.
### Install
1. `sudo apt install redis`
2. `python3 -m venv venv`
3. `source venv/bin/activate`
4. `pip install -r requirements.txt`
5. `python3 server.py`
An example systemd service file is provided in `other/local-llm.service`.
First, set up your LLM backend. Currently, only [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) is supported, but
eventually [huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference) will be the default.
**DO NOT** lose your database. It's used to calculate the estimated wait time from the average TPS (tokens per second) and the average number of response tokens. If you lose those stats, your estimates will be inaccurate until the database fills back up again. If you change GPUs, you should probably clear the `generation_time` column in the `prompts` table, since the old timings no longer reflect your hardware.
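
To make the note above concrete, here is a minimal sketch of how a wait estimate could be derived from stored stats, and how the timings could be cleared after a GPU change. This is not the server's actual code. It assumes an SQLite database, a hypothetical `response_tokens` column next to `generation_time` in `prompts`, and a simple formula (queued requests × average response tokens ÷ average TPS). Check your real schema before running anything like this against it.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the `prompts` table (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prompts (response_tokens INTEGER, generation_time REAL)"
    )
    # Two fake generations: 100 tokens in 5 s and 300 tokens in 15 s.
    conn.executemany(
        "INSERT INTO prompts VALUES (?, ?)", [(100, 5.0), (300, 15.0)]
    )
    conn.commit()
    return conn


def average_tps(conn):
    """Tokens per second over all rows with a recorded generation time."""
    tokens, seconds = conn.execute(
        "SELECT SUM(response_tokens), SUM(generation_time) FROM prompts "
        "WHERE generation_time IS NOT NULL AND generation_time > 0"
    ).fetchone()
    if not tokens or not seconds:
        return None  # no usable stats yet
    return tokens / seconds


def estimate_wait_seconds(conn, queued_requests):
    """Assumed formula: queue length * average response tokens / average TPS."""
    tps = average_tps(conn)
    avg_tokens = conn.execute(
        "SELECT AVG(response_tokens) FROM prompts "
        "WHERE generation_time IS NOT NULL"
    ).fetchone()[0]
    if tps is None or avg_tokens is None:
        return None
    return queued_requests * avg_tokens / tps


def clear_generation_times(conn):
    """Reset timings, e.g. after a GPU change; rows themselves are kept."""
    conn.execute("UPDATE prompts SET generation_time = NULL")
    conn.commit()


if __name__ == "__main__":
    db = make_demo_db()
    print(estimate_wait_seconds(db, queued_requests=3))
    clear_generation_times(db)
    print(estimate_wait_seconds(db, queued_requests=3))
```

After the timings are cleared, the sketch returns `None` until new generations are recorded, which mirrors why the estimates stay inaccurate until the database fills back up.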