OlivierDehaene | 6f0f1d70f6 | v0.5.0 (#168) | 2023-04-11 20:32:18 +02:00
OlivierDehaene | f26dfd0dc1 | feat(server): support OPT models (#55) | 2023-04-11 19:16:41 +02:00
    OPT models do not all have a `tokenizer.json` file on the hub at the moment. Can't merge for now.
OlivierDehaene | 299217c95c | feat(server): add flash attention llama (#144) | 2023-04-11 16:38:22 +02:00
OlivierDehaene | 9987960062 | feat(router): make router input validation optional (#164) | 2023-04-09 20:22:27 +02:00
OlivierDehaene | 1883d8ecde | feat(docker): improve flash_attention caching (#160) | 2023-04-09 19:59:16 +02:00
OlivierDehaene | 3f2542bb6a | fix(server): fix escape characters in stop sequence (#155) | 2023-04-05 19:37:41 +02:00
OlivierDehaene | c0aeb32583 | feat(server): flash santacoder (#153) | 2023-04-03 19:06:42 +02:00
OlivierDehaene | fef1a1c381 | v0.4.3 (#152) | 2023-03-30 17:28:14 +02:00
OlivierDehaene | 84722f3e33 | v0.4.2 (#151) | 2023-03-30 17:10:01 +02:00
OlivierDehaene | 08b7e4a282 | fix(server): fix flash neox rotary embeddings (#150) | 2023-03-30 16:12:23 +02:00
OlivierDehaene | 610bb1f978 | feat(benchmark): tui based benchmarking tool (#149) | 2023-03-30 15:26:27 +02:00
OlivierDehaene | c9bdaa8b73 | feat(server): reduce mlp and attn in one op for flash neox (#145) | 2023-03-28 16:51:41 +02:00
OlivierDehaene | f000068944 | feat(server): clear cache on error (#143) | 2023-03-28 11:29:35 +02:00
Nick Hill | 8e8dd984d8 | feat(server): Add mypy-protobuf (#141) | 2023-03-27 09:25:15 +02:00
    Generates .pyi files for protobuf stubs which provide strong typing information. Very helpful for IDE auto-completion, etc.
Nick Hill | 462530c2b0 | fix(server): Avoid using try/except to determine kind of AutoModel (#142) | 2023-03-27 09:23:22 +02:00
OlivierDehaene | ab5fd8cf93 | v0.4.1 (#140) | 2023-03-26 16:37:51 +02:00
OlivierDehaene | 678b2f3900 | feat(server): cleanup flash neox loading (#139) | 2023-03-26 16:37:21 +02:00
OlivierDehaene | d6a93fe992 | fix(server): fix flash-neox scores warping (#137) | 2023-03-24 18:21:41 +01:00
OlivierDehaene | 05e9a796cc | feat(server): flash neoX (#133) | 2023-03-24 14:02:14 +01:00
OlivierDehaene | b49dbf2d88 | fix(server): use server tokenizer as gt (#128) | 2023-03-16 12:12:26 +01:00
OlivierDehaene | 8ad60b752f | fix(server): add position ids to neox (#126) | 2023-03-15 13:12:49 +01:00
OlivierDehaene | cbd36aa4d1 | fix(server): revert gpt-neox optims (#123) | 2023-03-13 22:57:08 +01:00
OlivierDehaene | 411d6247f4 | v0.4.0 (#119) | 2023-03-09 16:07:01 +01:00
OlivierDehaene | c0795de2f2 | fix(server): do not warp prefill logits (#116) | 2023-03-09 13:00:10 +01:00
OlivierDehaene | 1a2d68250a | feat: support typical sampling (#114) | 2023-03-09 11:33:57 +01:00
    closes #112
OlivierDehaene | 941cd42e0c | fix(server): fix index out of range for watermarking (#110) | 2023-03-08 18:29:08 +01:00
OlivierDehaene | b1485e18c5 | fix(server): fix galactica batch (#106) | 2023-03-07 20:05:21 +01:00
    closes #105
OlivierDehaene | 3fef90d50f | feat(clients): Python client (#103) | 2023-03-07 18:52:22 +01:00
OlivierDehaene | cd5961b5da | feat: allow local models (#101) | 2023-03-06 14:39:36 +01:00
    closes #99
OlivierDehaene | 9b205d33cc | fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) | 2023-03-06 13:22:58 +01:00
OlivierDehaene | 1c19b0934e | v0.3.2 (#97) | 2023-03-03 18:42:20 +01:00
OlivierDehaene | 0b6807caa4 | feat(server): fix transformers commit (#96) | 2023-03-03 17:56:27 +01:00
OlivierDehaene | 2d39f199ae | feat(server): update to hf_transfer==0.1.2 (#93) | 2023-03-03 11:26:27 +01:00
OlivierDehaene | 9b8ea6a6c7 | feat(server): add logits watermark (#90) | 2023-03-02 12:30:41 +01:00
OlivierDehaene | 65e2f1624e | fix(server): fix token_is_special (#87) | 2023-02-24 17:20:00 +01:00
OlivierDehaene | 0ac184ce77 | feat(server): add special token bool (#85) | 2023-02-24 15:55:57 +01:00
OlivierDehaene | 4b1c9720c0 | v0.3.1 (#84) | 2023-02-24 13:27:41 +01:00
OlivierDehaene | 44ce098c10 | feat(server): pre-allocate max attention mask (#75) | 2023-02-24 12:49:21 +01:00
OlivierDehaene | 78063c0569 | fix(server): remove position_ids from galactica forward (#82) | 2023-02-20 19:28:57 +01:00
    closes #80
OlivierDehaene | 17bc841b1b | feat(server): enable hf-transfer (#76) | 2023-02-18 14:04:11 +01:00
OlivierDehaene | c720555adc | v0.3.0 (#72) | 2023-02-16 17:28:29 +01:00
OlivierDehaene | 439fcaf810 | feat(router): add prometheus metrics scrape endpoint (#71) | 2023-02-16 17:18:53 +01:00
OlivierDehaene | c5a4a1faf3 | feat(server): improve download logging (#66) | 2023-02-15 16:11:32 +01:00
OlivierDehaene | 0fbc691946 | feat: add safetensors conversion (#63) | 2023-02-14 13:02:16 +01:00
OlivierDehaene | 9af454142a | feat: add distributed tracing (#62) | 2023-02-13 13:02:45 +01:00
OlivierDehaene | 1ad3250b89 | fix(docker): increase shm size (#60) | 2023-02-08 17:53:33 +01:00
OlivierDehaene | c503a639b1 | feat(server): support t5 (#59) | 2023-02-07 18:25:17 +01:00
OlivierDehaene | 2fe5e1b30e | V0.2.1 (#58) | 2023-02-07 15:40:25 +01:00
OlivierDehaene | 4acc42a605 | fix(server): better handling of inference mode (#57) | 2023-02-07 15:38:22 +01:00
OlivierDehaene | 20c3c5940c | feat(router): refactor API and add openAPI schemas (#53) | 2023-02-03 12:43:37 +01:00