OlivierDehaene
73a4d65d26
feat: add cuda memory fraction ( #659 )
...
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene
1da642bd0e
feat(server): add local prom and health routes if running w/ ngrok
2023-07-21 16:56:30 +02:00
OlivierDehaene
b66b190403
feat(router): ngrok edge ( #642 )
2023-07-19 11:59:58 +02:00
OlivierDehaene
fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models ( #630 )
2023-07-19 09:31:25 +02:00
OlivierDehaene
982ce3227b
feat(router): explicit warning if revision is not set ( #608 )
2023-07-13 18:59:38 +02:00
OlivierDehaene
b7327205a6
feat(launcher): add arg validation and drop subprocess ( #595 )
2023-07-13 14:22:37 +02:00
OlivierDehaene
b4024edd45
feat: better errors for warmup and TP ( #575 )
...
Close #571
2023-07-10 14:47:15 +02:00
OlivierDehaene
6f42942772
feat(router): add argument for hostname in router ( #545 ) ( #550 )
...
# What does this PR do?
See title. Adds a `--hostname` argument to the router to support values
such as `--hostname ::`. Tested with:
```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health' # failed before this commit
```
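As an aside on the test command above, here is a minimal Python sketch of why the URL is written as `[::1]`. IPv6 literals must be wrapped in brackets in a URL authority so the port separator stays unambiguous, and `::` is the unspecified address that stands for "listen on all interfaces". The helper name `http_url` is hypothetical and is not part of the router.

```python
import ipaddress


def http_url(host: str, port: int, path: str = "/") -> str:
    """Build an http URL, bracketing IPv6 literals (hypothetical helper)."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal at all (for example a DNS name).
        is_v6 = False
    authority = f"[{host}]:{port}" if is_v6 else f"{host}:{port}"
    return f"http://{authority}{path}"


# "::" is the IPv6 unspecified address, "::1" is the IPv6 loopback.
assert ipaddress.ip_address("::").is_unspecified
assert ipaddress.ip_address("::1").is_loopback

print(http_url("::1", 8080, "/health"))  # http://[::1]:8080/health
```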
Trigger CI
---------
Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene
e28a809004
v0.9.0 ( #525 )
2023-07-01 19:25:41 +02:00
OlivierDehaene
3b0c979efc
feat(router): arg validation ( #519 )
2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models ( #516 )
...
Closes #478
2023-06-30 19:09:59 +02:00
Robert Kimball
70f485bf9f
feat(router): add header option to disable buffering for the generate_stream response ( #498 )
...
# This PR adds an HTTP header option to disable buffering for the
generate_stream endpoint response stream.
Problem: If a model runs behind a proxy server such as nginx with
buffering enabled, the response stream from generate_stream is
aggregated into a single response, which effectively disables streaming.
Instead of a chunked response in which each token arrives over time,
the client receives everything at once.
Solution: This change adds the `X-Accel-Buffering` HTTP header, which
disables buffering for the generate_stream response and allows the
response to stream properly.
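The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router's actual Rust code. The function name `stream_headers` and the `text/event-stream` content type are assumptions, and the header value `no` comes from nginx's documented handling of `X-Accel-Buffering` rather than from this commit message.

```python
def stream_headers(disable_proxy_buffering: bool = True) -> dict[str, str]:
    """Return response headers for a streamed response (hypothetical helper)."""
    headers = {
        # Assumed content type for a token-by-token event stream.
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
    }
    if disable_proxy_buffering:
        # nginx reads this response header and turns off proxy buffering
        # for this one response when the value is "no".
        headers["X-Accel-Buffering"] = "no"
    return headers


print(stream_headers()["X-Accel-Buffering"])  # no
```

Setting the header on the response keeps the fix local to the streaming endpoint. Other routes behind the same proxy keep whatever buffering policy the proxy is configured with.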
2023-06-28 11:50:12 +02:00
OlivierDehaene
bd3a9d8e85
fix(router): add timeout on flume sends ( #488 )
2023-06-23 14:58:28 +02:00
OlivierDehaene
f59fb8b630
feat(router): add ngrok integration ( #453 )
2023-06-16 16:25:11 +02:00
OlivierDehaene
19c41824cb
chore: update openapi schema
2023-06-05 18:16:08 +02:00
OlivierDehaene
895c5f1562
feat(server): only compute prefill logprobs when asked ( #406 )
...
Close #288
2023-06-02 17:12:30 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size ( #367 )
...
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
942005386a
feat(router): log input/output at debug level ( #364 )
...
@njhill FYI
2023-05-23 20:47:37 +02:00
OlivierDehaene
5a58226130
fix(server): fix decode token ( #334 )
...
Fixes #333
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene
68e9d6ab33
feat(server): shard token decode ( #303 )
2023-05-10 15:48:21 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image ( #298 )
2023-05-09 13:19:31 +02:00
Sai Vinay G
926fd9a010
feat(router): Adding response schema for compat_generate ( #292 )
2023-05-09 12:38:09 +02:00
Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches ( #278 )
2023-05-04 15:14:28 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates ( #264 )
2023-05-02 15:43:19 +02:00
Nicolas Patry
e86cca9723
Adding docs on how dynamic batching works. ( #258 )
...
This PR adds the smallest amount of explanation I could think of. It
tries to explain how dynamic batching occurs and how it interacts with
past key values, and it ignores the padding problem.
Some drawings might help too, but I kept it to text for now.
2023-05-01 14:16:50 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests ( #248 )
2023-04-27 19:16:35 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
c4fb09f2ae
feat(router): add tests to validation ( #237 )
2023-04-26 16:14:40 +02:00
Nicolas Patry
45344244cf
Starting some routing tests. ( #233 )
2023-04-25 14:13:14 +02:00
OlivierDehaene
8b182eb986
feat(router): add endpoint info to /info route ( #228 )
2023-04-25 13:11:18 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
6ded76a4ae
v0.6.0 ( #222 )
2023-04-21 21:00:57 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ( #202 )
2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route ( #208 )
2023-04-19 21:36:59 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info ( #207 )
2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route ( #196 )
...
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
c13b9d87c9
fix(router): fix truncation ( #190 )
...
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene
64347b05ff
fix(ci): fix CVE in github-slug-action ( #174 )
2023-04-13 12:43:05 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ( #164 )
2023-04-09 20:22:27 +02:00
OlivierDehaene
7dec65a244
fix(router): use buckets for metrics histograms ( #163 )
2023-04-09 20:13:28 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool ( #149 )
2023-03-30 15:26:27 +02:00
OlivierDehaene
d503e8f09d
feat: aws sagemaker compatible image ( #147 )
...
The only difference is that now it pushes to
registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
instead of
registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...
---------
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2023-03-29 21:38:30 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
OlivierDehaene
ab5fd8cf93
v0.4.1 ( #140 )
2023-03-26 16:37:51 +02:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt ( #128 )
2023-03-16 12:12:26 +01:00