local-llm-server/llm_server/routes/v1/info.py

import time
from flask import jsonify, request
from . import bp
from ..cache import cache
from ... import opts
from ...llm.info import get_running_model

# @bp.route('/info', methods=['GET'])
# # @cache.cached(timeout=3600, query_string=True)
# def get_info():
#     # requests.get()
#     return 'yes'


@bp.route('/model', methods=['GET'])
def get_model():
    # We will manage caching ourselves since we don't want to cache
    # when the backend is down. Also, Cloudflare won't cache 500 errors.
    cache_key = 'model_cache::' + request.url
    cached_response = cache.get(cache_key)
    if cached_response:
        return cached_response
    model_name, error = get_running_model()
    if not model_name:
        response = jsonify({
            'code': 502,
            'msg': 'failed to reach backend',
            'type': error.__class__.__name__
        }), 500  # return 500 so Cloudflare doesn't intercept us
    else:
        response = jsonify({
            'result': opts.manual_model_name if opts.manual_model_name else model_name,
            'timestamp': int(time.time())
        }), 200
    # Only cache successful responses; caching the error tuple would serve
    # a stale 500 for the full timeout, contradicting the intent above.
    if model_name:
        cache.set(cache_key, response, timeout=60)
    return response
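

if __name__ == '__main__':
    # A minimal sketch of exercising this endpoint with Flask's test client.
    # Both `create_app` and the /api/v1 URL prefix are assumptions for
    # illustration only; neither is defined in this module, so adjust them
    # to match the real application setup.
    from llm_server.app import create_app  # hypothetical app factory

    app = create_app()
    with app.test_client() as client:
        resp = client.get('/api/v1/model')
        # Success -> HTTP 200 with {'result': <model name>, 'timestamp': <unix time>}
        # Backend down -> HTTP 500 with {'code': 502, 'msg': 'failed to reach backend', ...}
        print(resp.status_code, resp.get_json())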