OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool ( #149 )
2023-03-30 15:26:27 +02:00
OlivierDehaene
c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox ( #145 )
2023-03-28 16:51:41 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
Nick Hill
8e8dd984d8
feat(server): Add mypy-protobuf ( #141 )
Generates .pyi files for protobuf stubs which provide strong typing
information. Very helpful for IDE auto-completion, etc.
2023-03-27 09:25:15 +02:00
Nick Hill
462530c2b0
fix(server): Avoid using try/except to determine kind of AutoModel ( #142 )
2023-03-27 09:23:22 +02:00
OlivierDehaene
ab5fd8cf93
v0.4.1 ( #140 )
2023-03-26 16:37:51 +02:00
OlivierDehaene
678b2f3900
feat(server): cleanup flash neox loading ( #139 )
2023-03-26 16:37:21 +02:00
OlivierDehaene
d6a93fe992
fix(server): fix flash-neox scores warping ( #137 )
2023-03-24 18:21:41 +01:00
OlivierDehaene
05e9a796cc
feat(server): flash neoX ( #133 )
2023-03-24 14:02:14 +01:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt ( #128 )
2023-03-16 12:12:26 +01:00
OlivierDehaene
8ad60b752f
fix(server): add position ids to neox ( #126 )
2023-03-15 13:12:49 +01:00
OlivierDehaene
cbd36aa4d1
fix(server): revert gpt-neox optims ( #123 )
2023-03-13 22:57:08 +01:00
OlivierDehaene
411d6247f4
v0.4.0 ( #119 )
2023-03-09 16:07:01 +01:00
OlivierDehaene
c0795de2f2
fix(server): do not warp prefill logits ( #116 )
2023-03-09 13:00:10 +01:00
OlivierDehaene
1a2d68250a
feat: support typical sampling ( #114 )
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene
941cd42e0c
fix(server): fix index out of range for watermarking ( #110 )
2023-03-08 18:29:08 +01:00
OlivierDehaene
b1485e18c5
fix(server): fix galactica batch ( #106 )
closes #105
2023-03-07 20:05:21 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client ( #103 )
2023-03-07 18:52:22 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models ( #101 )
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly ( #100 )
2023-03-06 13:22:58 +01:00
OlivierDehaene
1c19b0934e
v0.3.2 ( #97 )
2023-03-03 18:42:20 +01:00
OlivierDehaene
0b6807caa4
feat(server): fix transformers commit ( #96 )
2023-03-03 17:56:27 +01:00
OlivierDehaene
2d39f199ae
feat(server): update to hf_transfer==0.1.2 ( #93 )
2023-03-03 11:26:27 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark ( #90 )
2023-03-02 12:30:41 +01:00
OlivierDehaene
65e2f1624e
fix(server): fix token_is_special ( #87 )
2023-02-24 17:20:00 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool ( #85 )
2023-02-24 15:55:57 +01:00
OlivierDehaene
4b1c9720c0
v0.3.1 ( #84 )
2023-02-24 13:27:41 +01:00
OlivierDehaene
44ce098c10
feat(server): pre-allocate max attention mask ( #75 )
2023-02-24 12:49:21 +01:00
OlivierDehaene
78063c0569
fix(server): remove position_ids from galactica forward ( #82 )
closes #80
2023-02-20 19:28:57 +01:00
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer ( #76 )
2023-02-18 14:04:11 +01:00
OlivierDehaene
c720555adc
v0.3.0 ( #72 )
2023-02-16 17:28:29 +01:00
OlivierDehaene
439fcaf810
feat(router): add prometheus metrics scrape endpoint ( #71 )
2023-02-16 17:18:53 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging ( #66 )
2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion ( #63 )
2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size ( #60 )
2023-02-08 17:53:33 +01:00
OlivierDehaene
c503a639b1
feat(server): support t5 ( #59 )
2023-02-07 18:25:17 +01:00
OlivierDehaene
2fe5e1b30e
V0.2.1 ( #58 )
2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode ( #57 )
2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text ( #50 )
@njhill, @yk FYI
generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is even
detrimental to usability.
We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
df227ac20d
fix(server): allow greedy repetition penalty ( #51 )
2023-02-02 10:34:35 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache ( #49 )
2023-02-01 16:22:10 +01:00
OlivierDehaene
313194f6d7
feat(server): support repetition penalty ( #47 )
2023-02-01 15:58:42 +01:00
OlivierDehaene
2ad895a6cc
feat(server): allow gpt-neox models with odd vocab sizes to be sharded ( #48 )
2023-02-01 14:43:59 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox ( #39 )
2023-01-31 18:53:56 +01:00
OlivierDehaene
c6e8b9442b
fix(server): fix quantization for sharded models ( #45 )
2023-01-31 17:40:38 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
54fec93193
fix(server): fix seeding with multiple shards ( #44 )
2023-01-31 16:01:15 +01:00
OlivierDehaene
03bdf18290
fix(server): fix seeding on gpu ( #42 )
2023-01-31 14:30:33 +01:00