Default Branch

b57f370386 · Saving some VRAM. (#2790) · Updated 2024-12-02 20:04:21 -07:00

Branches

12381b0b0e · delete the last no repeat processor from warpers · Updated 2024-07-26 03:22:46 -06:00 · 309 behind · 5 ahead

169c8c2cf5 · token.to_str() returns result · Updated 2024-07-26 02:52:55 -06:00 · 357 behind · 6 ahead

5afc98a7d7 · Snapshot update with vllm paged. · Updated 2024-07-25 04:17:40 -06:00 · 283 behind · 3 ahead

344427b6ab · feat(router): drop permit after batching · Updated 2024-07-23 14:40:14 -06:00 · 275 behind · 1 ahead

db7e043ded · New version. · Updated 2024-07-23 10:29:13 -06:00 · 276 behind · 1 ahead

0c95f7a942 · Debug softcap flash decoding activation · Updated 2024-07-23 07:12:19 -06:00 · 280 behind · 1 ahead

dee649c60c · Chore: Fix naming issues regarding head_size, there can only be one. · Updated 2024-07-23 03:26:53 -06:00 · 281 behind · 1 ahead

82fc879e17 · feat: refactor lora linear and remove adapter layers · Updated 2024-07-18 13:58:55 -06:00 · 303 behind · 1 ahead

a1b69a8cc5 · Completing development guide · Updated 2024-07-18 09:38:18 -06:00 · 386 behind · 2 ahead

959b9dc25f · Fixup constructor arguments · Updated 2024-07-17 01:42:24 -06:00 · 309 behind · 16 ahead

2967b8168c · fix post refactor · Updated 2024-07-16 07:16:27 -06:00 · 303 behind · 51 ahead

f6ad3b3585 · Some MoE exploration · Updated 2024-07-15 05:47:52 -06:00 · 309 behind · 1 ahead

5b27307438 · Don't error on OpenAI valid `top_p` values. · Updated 2024-07-12 14:22:23 -06:00 · 309 behind · 1 ahead

5c69639f74 · add condition different than PR · Updated 2024-07-12 05:19:52 -06:00 · 317 behind · 8 ahead

4dfdb481fb · Version 2.1.1 · Updated 2024-07-04 04:39:07 -06:00 · 334 behind · 1 ahead

fe3991e857 · feat: add simple ttft load_test · Updated 2024-07-02 09:57:01 -06:00 · 339 behind · 1 ahead

cb232a35a9 · feat: add test to view batch speedup amount · Updated 2024-07-02 07:33:26 -06:00 · 339 behind · 1 ahead

0a5b19a3ed · updated doc · Updated 2024-07-02 07:10:26 -06:00 · 383 behind · 22 ahead

dea9c0dc74 · Fixing rocm. (#2164) · Updated 2024-07-02 04:01:08 -06:00 · 341 behind · 0 ahead · Included

88e2a6a23a · fix: avoid loading mistral adapters in mixtral · Updated 2024-07-01 13:49:05 -06:00 · 347 behind · 1 ahead