OlivierDehaene
|
a963495315
|
add logic to queue
|
2023-04-26 13:40:20 +02:00 |
OlivierDehaene
|
4f460e5bfe
|
feat(server): improve max tokens calculation
|
2023-04-26 13:07:25 +02:00 |
OlivierDehaene
|
7de8a377b0
|
fix(benchmarking): fix benchmarking tool
|
2023-04-26 00:54:27 +02:00 |
Nicolas Patry
|
45344244cf
|
Starting some routing tests. (#233)
|
2023-04-25 14:13:14 +02:00 |
OlivierDehaene
|
323546df1d
|
fix(python-client): add auth headers to is supported requests (#234)
|
2023-04-25 13:55:26 +02:00 |
OlivierDehaene
|
37b64a5c10
|
chore(server): update safetensors version (#235)
|
2023-04-25 13:50:56 +02:00 |
OlivierDehaene
|
8b182eb986
|
feat(router): add endpoint info to /info route (#228)
|
2023-04-25 13:11:18 +02:00 |
OlivierDehaene
|
ebc74d5666
|
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2023-04-24 17:59:00 +02:00 |
OlivierDehaene
|
98a3e0d135
|
chore(server): update huggingface-hub (#227)
|
2023-04-24 15:57:13 +02:00 |
Nick Hill
|
4a7dd4085a
|
feat(server): reduce memory requirement (#214)
|
2023-04-24 14:15:42 +02:00 |
OlivierDehaene
|
6ded76a4ae
|
v0.6.0 (#222)
|
2023-04-21 21:00:57 +02:00 |
OlivierDehaene
|
97df0c7bc0
|
misc: update to rust 1.69 (#221)
|
2023-04-21 21:00:30 +02:00 |
OlivierDehaene
|
4b460e72fb
|
fix(server): fix flash batch filtering (#220)
|
2023-04-21 20:26:01 +02:00 |
OlivierDehaene
|
1ffea36ec2
|
fix(server): fix flash causal (#219)
|
2023-04-21 19:49:08 +02:00 |
OlivierDehaene
|
86bca365df
|
fix(server): fix flash causal (#218)
|
2023-04-21 19:42:16 +02:00 |
OlivierDehaene
|
afc5b999d0
|
fix(server): cleanup new flash past_key_values logic (#217)
|
2023-04-21 16:19:04 +02:00 |
OlivierDehaene
|
db4cb5e4ed
|
fix(server): fix past key values logic (#216)
@njhill fyi
|
2023-04-21 15:59:18 +02:00 |
OlivierDehaene
|
343437c7b5
|
feat(router): add device and dtype info (#215)
|
2023-04-21 15:36:29 +02:00 |
Nick Hill
|
ac8c0f6fe4
|
feat(server): flash attention past key value optimizations (#213)
|
2023-04-21 14:57:18 +02:00 |
OlivierDehaene
|
274513e6a3
|
fix(ci): fix sha in docker image (#212)
|
2023-04-20 18:50:47 +02:00 |
OlivierDehaene
|
709d8936f6
|
feat(router): drop requests when client closes the channel (#202)
|
2023-04-20 11:07:40 +02:00 |
OlivierDehaene
|
b6ee0ec7b0
|
feat(router): add git sha to info route (#208)
|
2023-04-19 21:36:59 +02:00 |
OlivierDehaene
|
252f42c1e6
|
fix(router): add auth token to get model info (#207)
|
2023-04-19 20:06:06 +02:00 |
OlivierDehaene
|
6837b2eb77
|
fix(docker): remove unused dependencies (#205)
|
2023-04-19 19:39:31 +02:00 |
OlivierDehaene
|
5d27f5259b
|
fix(server): fix hf_transfer issue with private repos (#203)
|
2023-04-19 17:36:16 +02:00 |
OlivierDehaene
|
a88c54bb4c
|
feat(server): check cuda capability when importing flash models (#201)
close #198
|
2023-04-19 12:52:37 +02:00 |
OlivierDehaene
|
e14ae3b5e9
|
feat(server): support quantization for flash models (#200)
closes #197
|
2023-04-19 12:51:11 +02:00 |
OlivierDehaene
|
2475aede61
|
feat(router): add info route (#196)
close #125
|
2023-04-18 16:16:06 +02:00 |
OlivierDehaene
|
b927244eb5
|
feat(python-client): get list of currently deployed tgi models using the inference API (#191)
|
2023-04-17 18:43:24 +02:00 |
OlivierDehaene
|
c13b9d87c9
|
fix(router): fix truncation (#190)
closes #189
|
2023-04-17 16:51:53 +02:00 |
OlivierDehaene
|
7a1ba58557
|
fix(docker): fix docker image dependencies (#187)
|
2023-04-17 00:26:47 +02:00 |
OlivierDehaene
|
379c5c4da2
|
fix(docker): revert dockerfile changes (#186)
|
2023-04-14 19:30:30 +02:00 |
OlivierDehaene
|
f9047562d0
|
fix(docker): fix image (#185)
|
2023-04-14 18:58:38 +02:00 |
OlivierDehaene
|
1bb394631d
|
fix(docker): fix docker image (#184)
|
2023-04-14 17:31:13 +02:00 |
OlivierDehaene
|
01c0e368e5
|
fix(ci): fix cosign error (#183)
|
2023-04-14 12:35:26 +02:00 |
OlivierDehaene
|
53ee09c0b0
|
fea(dockerfile): better layer caching (#159)
|
2023-04-14 10:12:21 +02:00 |
OlivierDehaene
|
12e5633c4d
|
fix(ci): fix ci permissions (#181)
|
2023-04-13 16:32:37 +02:00 |
OlivierDehaene
|
c1e2ea3b78
|
feat(ci): faster scanning (#180)
|
2023-04-13 16:23:47 +02:00 |
OlivierDehaene
|
13f1cd024b
|
feat(ci): use large runners (#179)
|
2023-04-13 16:11:48 +02:00 |
OlivierDehaene
|
9683c37bd3
|
feat(ci): add Trivy and scan docker image (#178)
|
2023-04-13 15:43:17 +02:00 |
OlivierDehaene
|
643a39d556
|
feat(ci): add image signing with cosign (#175)
|
2023-04-13 15:26:34 +02:00 |
OlivierDehaene
|
64347b05ff
|
fix(ci): fix CVE in github-slug-action (#174)
|
2023-04-13 12:43:05 +02:00 |
OlivierDehaene
|
e3a63b6fbc
|
fix(launcher): revert change on shard errors (#173)
|
2023-04-13 11:07:11 +02:00 |
OlivierDehaene
|
880a76eed5
|
feat(server): support sharded santacoder (#167)
|
2023-04-12 17:18:08 +02:00 |
OlivierDehaene
|
5fa8ae041c
|
feat(server): optimize decode for sane tokenizers (#170)
|
2023-04-12 12:03:10 +02:00 |
OlivierDehaene
|
6f0f1d70f6
|
v0.5.0 (#168)
|
2023-04-11 20:32:18 +02:00 |
OlivierDehaene
|
f26dfd0dc1
|
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
|
2023-04-11 19:16:41 +02:00 |
OlivierDehaene
|
299217c95c
|
feat(server): add flash attention llama (#144)
|
2023-04-11 16:38:22 +02:00 |
OlivierDehaene
|
9987960062
|
feat(router): make router input validation optional (#164)
|
2023-04-09 20:22:27 +02:00 |
OlivierDehaene
|
7dec65a244
|
fix(router): use buckets for metrics histograms (#163)
|
2023-04-09 20:13:28 +02:00 |