Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches ( #278 )
2023-05-04 15:14:28 +02:00
OlivierDehaene
b67908e0cf
fix(launcher): pass weights cache override to the download process ( #274 )
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene
85aa7e2e7b
feat(server): support hf endpoint weight layout ( #266 )
2023-05-03 11:36:24 +02:00
OlivierDehaene
4096000e34
fix(server): fix typo in tokenizers decode ( #269 )
closes #268
2023-05-03 10:10:34 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates ( #264 )
2023-05-02 15:43:19 +02:00
Nicolas Patry
e86cca9723
Adding docs on how dynamic batching works. ( #258 )
This PR adds the minimal amount of explanation I could think of.
It tries to explain how dynamic batching occurs and how it interacts
with past key values, and it ignores the padding problem.
Maybe some drawings could help too, but I kept it to text for now.
2023-05-01 14:16:50 +02:00
OlivierDehaene
0e9d249b79
feat(benchmark): add support for private tokenizers ( #262 )
2023-04-29 12:17:30 +02:00
Nicolas Patry
b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README ( #257 )
2023-04-29 11:53:42 +02:00
OlivierDehaene
593a563414
feat(docker): add nvidia env vars ( #255 )
2023-04-27 19:18:33 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests ( #248 )
2023-04-27 19:16:35 +02:00
OlivierDehaene
b9ae7e5da1
chore(server): update transformers ( #250 )
2023-04-27 09:57:41 +02:00
Nick Hill
34bca0b8d3
fix(server): Small tidy of code from recent changes ( #251 )
remaining_decode_tokens was calculated twice in Seq2SeqLMBatch.filter()
2023-04-27 09:57:28 +02:00
Nick Hill
b4cf832c40
fix(server): fix reshaping of bloom past_key_values in concatenate() ( #252 )
Introduced in #214
Fixes #249
2023-04-27 09:51:27 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
c4fb09f2ae
feat(router): add tests to validation ( #237 )
2023-04-26 16:14:40 +02:00
Nicolas Patry
77758f603b
chore(launcher): refactor logic ( #242 )
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene
7de8a377b0
fix(benchmarking): fix benchmarking tool
2023-04-26 00:54:27 +02:00
Nicolas Patry
45344244cf
Starting some routing tests. ( #233 )
2023-04-25 14:13:14 +02:00
OlivierDehaene
323546df1d
fix(python-client): add auth headers to is supported requests ( #234 )
2023-04-25 13:55:26 +02:00
OlivierDehaene
37b64a5c10
chore(server): update safetensors version ( #235 )
2023-04-25 13:50:56 +02:00
OlivierDehaene
8b182eb986
feat(router): add endpoint info to /info route ( #228 )
2023-04-25 13:11:18 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
98a3e0d135
chore(server): update huggingface-hub ( #227 )
2023-04-24 15:57:13 +02:00
Nick Hill
4a7dd4085a
feat(server): reduce memory requirement ( #214 )
2023-04-24 14:15:42 +02:00
OlivierDehaene
6ded76a4ae
v0.6.0 ( #222 )
2023-04-21 21:00:57 +02:00
OlivierDehaene
97df0c7bc0
misc: update to rust 1.69 ( #221 )
2023-04-21 21:00:30 +02:00
OlivierDehaene
4b460e72fb
fix(server): fix flash batch filtering ( #220 )
2023-04-21 20:26:01 +02:00
OlivierDehaene
1ffea36ec2
fix(server): fix flash causal ( #219 )
2023-04-21 19:49:08 +02:00
OlivierDehaene
86bca365df
fix(server): fix flash causal ( #218 )
2023-04-21 19:42:16 +02:00
OlivierDehaene
afc5b999d0
fix(server): cleanup new flash past_key_values logic ( #217 )
2023-04-21 16:19:04 +02:00
OlivierDehaene
db4cb5e4ed
fix(server): fix past key values logic ( #216 )
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
Nick Hill
ac8c0f6fe4
feat(server): flash attention past key value optimizations ( #213 )
2023-04-21 14:57:18 +02:00
OlivierDehaene
274513e6a3
fix(ci): fix sha in docker image ( #212 )
2023-04-20 18:50:47 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ( #202 )
2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route ( #208 )
2023-04-19 21:36:59 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info ( #207 )
2023-04-19 20:06:06 +02:00
OlivierDehaene
6837b2eb77
fix(docker): remove unused dependencies ( #205 )
2023-04-19 19:39:31 +02:00
OlivierDehaene
5d27f5259b
fix(server): fix hf_transfer issue with private repos ( #203 )
2023-04-19 17:36:16 +02:00
OlivierDehaene
a88c54bb4c
feat(server): check cuda capability when importing flash models ( #201 )
close #198
2023-04-19 12:52:37 +02:00
OlivierDehaene
e14ae3b5e9
feat(server): support quantization for flash models ( #200 )
closes #197
2023-04-19 12:51:11 +02:00
OlivierDehaene
2475aede61
feat(router): add info route ( #196 )
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
b927244eb5
feat(python-client): get list of currently deployed tgi models using the inference API ( #191 )
2023-04-17 18:43:24 +02:00
OlivierDehaene
c13b9d87c9
fix(router): fix truncation ( #190 )
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies ( #187 )
2023-04-17 00:26:47 +02:00
OlivierDehaene
379c5c4da2
fix(docker): revert dockerfile changes ( #186 )
2023-04-14 19:30:30 +02:00
OlivierDehaene
f9047562d0
fix(docker): fix image ( #185 )
2023-04-14 18:58:38 +02:00
OlivierDehaene
1bb394631d
fix(docker): fix docker image ( #184 )
2023-04-14 17:31:13 +02:00
OlivierDehaene
01c0e368e5
fix(ci): fix cosign error ( #183 )
2023-04-14 12:35:26 +02:00
OlivierDehaene
53ee09c0b0
feat(dockerfile): better layer caching ( #159 )
2023-04-14 10:12:21 +02:00