OlivierDehaene
afc5b999d0
fix(server): cleanup new flash past_key_values logic ( #217 )
2023-04-21 16:19:04 +02:00
OlivierDehaene
db4cb5e4ed
fix(server): fix past key values logic ( #216 )
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
Nick Hill
ac8c0f6fe4
feat(server): flash attention past key value optimizations ( #213 )
2023-04-21 14:57:18 +02:00
OlivierDehaene
274513e6a3
fix(ci): fix sha in docker image ( #212 )
2023-04-20 18:50:47 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ( #202 )
2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route ( #208 )
2023-04-19 21:36:59 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info ( #207 )
2023-04-19 20:06:06 +02:00
OlivierDehaene
6837b2eb77
fix(docker): remove unused dependencies ( #205 )
2023-04-19 19:39:31 +02:00
OlivierDehaene
5d27f5259b
fix(server): fix hf_transfer issue with private repos ( #203 )
2023-04-19 17:36:16 +02:00
OlivierDehaene
a88c54bb4c
feat(server): check cuda capability when importing flash models ( #201 )
close #198
2023-04-19 12:52:37 +02:00
OlivierDehaene
e14ae3b5e9
feat(server): support quantization for flash models ( #200 )
closes #197
2023-04-19 12:51:11 +02:00
OlivierDehaene
2475aede61
feat(router): add info route ( #196 )
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
b927244eb5
feat(python-client): get list of currently deployed tgi models using the inference API ( #191 )
2023-04-17 18:43:24 +02:00
OlivierDehaene
c13b9d87c9
fix(router): fix truncation ( #190 )
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies ( #187 )
2023-04-17 00:26:47 +02:00
OlivierDehaene
379c5c4da2
fix(docker): revert dockerfile changes ( #186 )
2023-04-14 19:30:30 +02:00
OlivierDehaene
f9047562d0
fix(docker): fix image ( #185 )
2023-04-14 18:58:38 +02:00
OlivierDehaene
1bb394631d
fix(docker): fix docker image ( #184 )
2023-04-14 17:31:13 +02:00
OlivierDehaene
01c0e368e5
fix(ci): fix cosign error ( #183 )
2023-04-14 12:35:26 +02:00
OlivierDehaene
53ee09c0b0
feat(dockerfile): better layer caching ( #159 )
2023-04-14 10:12:21 +02:00
OlivierDehaene
12e5633c4d
fix(ci): fix ci permissions ( #181 )
2023-04-13 16:32:37 +02:00
OlivierDehaene
c1e2ea3b78
feat(ci): faster scanning ( #180 )
2023-04-13 16:23:47 +02:00
OlivierDehaene
13f1cd024b
feat(ci): use large runners ( #179 )
2023-04-13 16:11:48 +02:00
OlivierDehaene
9683c37bd3
feat(ci): add Trivy and scan docker image ( #178 )
2023-04-13 15:43:17 +02:00
OlivierDehaene
643a39d556
feat(ci): add image signing with cosign ( #175 )
2023-04-13 15:26:34 +02:00
OlivierDehaene
64347b05ff
fix(ci): fix CVE in github-slug-action ( #174 )
2023-04-13 12:43:05 +02:00
OlivierDehaene
e3a63b6fbc
fix(launcher): revert change on shard errors ( #173 )
2023-04-13 11:07:11 +02:00
OlivierDehaene
880a76eed5
feat(server): support sharded santacoder ( #167 )
2023-04-12 17:18:08 +02:00
OlivierDehaene
5fa8ae041c
feat(server): optimize decode for sane tokenizers ( #170 )
2023-04-12 12:03:10 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ( #55 )
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama ( #144 )
2023-04-11 16:38:22 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ( #164 )
2023-04-09 20:22:27 +02:00
OlivierDehaene
7dec65a244
fix(router): use buckets for metrics histograms ( #163 )
2023-04-09 20:13:28 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
e63a21eb4d
feat(launcher): allow disabling hf_transfer ( #161 )
2023-04-09 20:00:05 +02:00
OlivierDehaene
1883d8ecde
feat(docker): improve flash_attention caching ( #160 )
2023-04-09 19:59:16 +02:00
OlivierDehaene
3f2542bb6a
fix(server): fix escape characters in stop sequence ( #155 )
2023-04-05 19:37:41 +02:00
Guspan Tanadi
9122e7bd9c
docs(readme): provide link Logits Warper README ( #154 )
2023-04-04 13:27:46 +02:00
OlivierDehaene
c0aeb32583
feat(server): flash santacoder ( #153 )
2023-04-03 19:06:42 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
08b7e4a282
fix(server): fix flash neox rotary embeddings ( #150 )
2023-03-30 16:12:23 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool ( #149 )
2023-03-30 15:26:27 +02:00
OlivierDehaene
55106ec476
fix(ci): fix sagemaker action ( #148 )
2023-03-29 22:27:01 +02:00
OlivierDehaene
d503e8f09d
feat: aws sagemaker compatible image ( #147 )
The only difference is that now it pushes to
registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
instead of
registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...
---------
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2023-03-29 21:38:30 +02:00
OlivierDehaene
c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox ( #145 )
2023-03-28 16:51:41 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
Nick Hill
8e8dd984d8
feat(server): Add mypy-protobuf ( #141 )
Generates .pyi files for protobuf stubs which provide strong typing
information. Very helpful for IDE auto-completion, etc.
2023-03-27 09:25:15 +02:00