Default Branch

88702d8763 · Fixing CI. (#1748) · Updated 2024-04-15 10:47:36 -06:00

Branches

7c8b473d58 · fix: revise temp scaling logic · Updated 2024-04-15 14:13:40 -06:00 · 0 behind, 4 ahead

7d6216d63b · Update response type for `/v1/chat/completions` and `/v1/completions` · Updated 2024-04-15 08:57:16 -06:00 · 1 behind, 1 ahead

238d2fefab · fix: skip grammar tests since they timeout · Updated 2024-04-12 15:50:38 -06:00 · 1 behind, 9 ahead

8ebb560f2f · feat: integrate triton compilations demo · Updated 2024-04-12 15:47:15 -06:00 · 1 behind, 1 ahead

0c4a634640 · fix: use get_speculate to the number of layers · Updated 2024-04-12 12:21:52 -06:00 · 1 behind, 1 ahead

fa3ec86f86 · remvoe unused kernels · Updated 2024-04-12 06:18:58 -06:00 · 7 behind, 3 ahead

10dd0150c0 · Dummy fix for medusa. · Updated 2024-04-12 04:12:09 -06:00 · 13 behind, 9 ahead

aa4f6a42b0 · fix: update tests for new behavior · Updated 2024-04-11 16:46:39 -06:00 · 17 behind, 9 ahead

b83aab9bb3 · Easier defaults for models stemmed from configs. · Updated 2024-04-11 06:48:39 -06:00 · 11 behind, 0 ahead · Included

d0bc603fe6 · feat: explore compiled MLP bench · Updated 2024-04-08 20:36:09 -06:00 · 17 behind, 1 ahead

2762e6883e · fix: include fsm_grammar_states in FlashMistralBatch from_pb · Updated 2024-04-08 11:23:46 -06:00 · 17 behind, 1 ahead

78f87d5a0c · Temporary implem of torch.compile on our stuff. · Updated 2024-03-21 12:56:40 -06:00 · 37 behind, 1 ahead

c1095bb61a · add debug · Updated 2024-03-18 04:54:31 -06:00 · 75 behind, 26 ahead

b5dcc87459 · fix: include shared python library during rust build step · Updated 2024-03-08 16:13:07 -07:00 · 44 behind, 8 ahead

a7cc4dc9da · fix: bump client version · Updated 2024-03-04 07:29:27 -07:00 · 44 behind, 1 ahead

b47b161cab · feat: update more snapshots · Updated 2024-02-29 15:06:13 -07:00 · 57 behind, 4 ahead

960cc95a0e · Update speculation.md · Updated 2024-02-27 07:55:37 -07:00 · 56 behind, 3 ahead

a42dc2027b · update commit · Updated 2024-02-27 03:24:07 -07:00 · 56 behind, 2 ahead

cd57f9c632 · fix: avoid duplicate bos token · Updated 2024-02-23 07:53:18 -07:00 · 57 behind, 1 ahead

5cdee2a591 · Merge branch 'amihalik-update-chat-completion-messages' into ci-amihalik-update-chat-completion-messages · Updated 2024-02-15 10:50:14 -07:00 · 70 behind, 3 ahead