Merge cluster to master #3
cyberes
commented 2023-10-27 19:19:13 -06:00
Owner
No description provided.
cyberes
added 163 commits 2023-10-27 19:19:14 -06:00
e7b57cad7b
set up cluster config and basic background workers
624ca74ce5
mvp
114f36e709
functional
e0f86d053a
reorganize to api v2
e6267a7d46
remove vllm from requirements.txt
11a10f85c1
adjust home page
e553fa6e9f
adjust home page fontsize
91ba2fad1b
add proompter stats back in
1151bb5475
adjust stats
166b2316e8
deprecate v1
592eb08cb1
add message for /v1/
7af3dbd76b
add message about settings
61856b4383
adjust message
9235725bdd
adjust message
bc25d92c95
reduce tokens for backend tester
c5b30d985c
adjust jinja template
3ecb7bcf88
adjust jinja template
b10d22ca0d
cache the home page in the background
25ec56a5ef
get streaming working, remove /v2/
d203973e80
fix routes
93d19fb95b
fix exception
2a3ff7e21e
update openai endpoints
f7e9687527
finish openai endpoints
51881ae39d
fix tokenizer
a594729d00
fix keyerror
21da2f6373
fix openai error message
d1c4e68f8b
fix openai models response
b0089859d7
fix ratelimiting
4f226ae38e
handle requests to offline backends
94141b8ecf
fix processing not being decremented on streaming, fix confusion over queue, adjust stop sequences
aed5db4968
trying to narrow down error
cd325216e2
test
07d6f6d8e9
test
f6acd67738
t
70126acdf2
test
0f5e22191c
test
62eb0196cc
t
ca1baa4870
test
63c12ea830
fix
32ad97e57c
do default model rather than default backend, adjust moderation endpoint logic and add timeout, exclude system tokens from recent proompters, calculate number of moderators from endpoint concurrent gens, adjust homepage
581a0fec99
fix exception
e16f415749
fix
33b4b8404b
clean up streaming
f88e2362c5
remove some debug prints
67f5df9bb9
fix stats page
1a7f22ec55
adjust again
6dc3529190
show online status on stats page
5f4e4710c1
option to prioritize by parameter count
b76e77a66a
fix exception
4634e36eeb
test
7e3af3599d
test
4deb32bf1c
test
1b21cb69c1
test
95d781725e
t
a15b5465df
c
f3a13fcda8
c
6af5365015
c
7cb624c5f5
f
364b795268
fix
77db34a6a7
g
6bad5b3fa0
t
d0eec88dbd
f
754a4cbdf3
r
5e90fa54d4
handle model offline
d78ef652fc
c
7acaa3c885
g
62d5d43da4
handle backend offline in tokenizer
09fa69e031
fix
6723dd79dc
fix exception
1670594908
fix import error
acf409abfc
fix background logger, add gradio chat example
08df52a4fd
fix exception when not valid model
27e461c76b
test
19e62be3e8
t
979a945466
t
84c1ed8737
t
a53790ee37
fix???
a229b4d6c5
c
01fb619b9b
f
3d0a5cf0a2
t
5a61bdccd4
f
64d7a9edbb
fix
10eb6269b7
t
6be1e9acd3
t
fb8bc05b4c
t
0718f10eb9
t
e07e31df0a
fix
9b819573e8
fix import error
817c454c89
t
46d44f95ac
t
a37b12a221
t
96dd62478f
fix
50992116f5
fix
9befda5acb
c
5540112607
t
0bef14ea55
t
c4cc7bbaa0
f
8df667bc0a
t
67173f30dd
t
e9f6fdf65e
fix streaming?
da20d1807b
actually wait again
ea61766838
fix
e8964fcfd2
fix the queue??
3e5feb9c97
fix stat
467e1893ea
fix issue with null data on openai
ae4d4e5ca9
fix exception
5f7bf4faca
misc changes
18e37a72ae
add model selection to openai endpoint
f4e5b5275d
test
7286e38cb0
t
78114771b0
fix oai exception
1d1c45dc1a
add length penalty param to vllm
69b8c1e35c
fix openai confusion
169e216a38
add background thread to gradio
74cf8f309b
clean up
4e3985e156
fix wrong status code on openai streaming
ca7044bc90
update gradio chat
83f3ba8919
trying to fix workers still processing after backend goes offline
b3f0c4b28f
remove debug print
3ec9b2347f
fix wrong datatype
31ab4188f1
fix issues with queue and streaming
381bdb950f
remove debug print
24aab3cd93
fix streaming disabled
151b3e4769
begin streaming rewrite
2c7773cc4f
get streaming working again
f421436048
add nginx config
19a193b792
increase tokenization chunk size
20047fa0e4
2000 chunk size
1e68e10b62
fix GeneratorExit
21755450a3
test
81baf9616f
revert
806e522d16
don't pickle streaming
70cf6843e5
update requirements
c3c053e071
test
9e3cbc9d2e
fix streaming slowdown?
6f65791795
adjust
2ed0e01db6
background thread
7998cfca87
cleanup
2fed87d340
remove timed-out items from queue
4c2c164ce1
test
90adffaec8
test
be03569165
use backend handler to build parameters when sending test prompt
92e4ecd8a1
refer to queue for tracking IP count rather than separate value
50377eca22
track lag on get_ip_request_count()
56a2ca464b
change print
b9566e9db7
docs and stuff
6e74ce7c28
fix old code in completions
4f5b2dbecb
add tests
0abd4b94fb
track down keyerror
e838f591aa
fix keyerror?
763139c949
fix keyerror
1a15232400
tests: make sure all prompts are the same
f39e976b34
daemon printer: Calculate the queue size the same way it's done on the stats
e236e93a79
clean up a bit
d43f110a14
fix redis cycle and add no reset to daemon
3cf73fec9b
fix a few exceptions when all backends go offline
0771c2325c
fix inference workers quitting when a backend is offline, start adding logging, improve tokenizer error handling
177dabd209
Give some time for the background threads to get themselves ready to go
96ba48affc
make sure to regen stats on startup
b4e01e129d
fix when all offline
563630547a
add robots.txt
28c250385d
add todo
ee44371fdf
Merge branch 'master' into cluster
cyberes
merged commit 0059e7956c into master 2023-10-27 19:19:22 -06:00
cyberes
referenced this issue from a commit 2023-10-27 19:19:23 -06:00
Merge cluster to master (#3)