Michael Feil
339ede9e90
Update Readme.md / documentation ( #15 )
...
* add documentation updates
* update readme
* Update README.md
2023-10-03 23:01:06 -07:00
Michael Feil
012c917b6f
Wrapping completions and chat/completions endpoint ( #2 )
...
* rebase and squash commits on latest main
* cargo fmt
* fix: 2038y problem
---------
Co-authored-by: michaelfeil <me@michaelfeil.eu>
2023-09-27 08:58:07 -07:00
Jason Sun
1e646fb41d
Compilation fix: Correct method argument types in generation.rs and validation.rs ( #10 )
...
* fix: Correct method argument types in generation and validation
In the `generation.rs` and `validation.rs` files, corrected the argument types
passed to the `decode` method. Replaced `Vec<u32>` with `&[u32]` using the
`as_ref()` method to match the expected argument types. This resolves the
mismatched types compilation error during the Rust build process.
Closes [#9](https://github.com/Preemo-Inc/text-generation-inference/issues/9)
* Update benchmark/src/generation.rs
Co-authored-by: Yang, Bo <pop.atry@gmail.com>
* Update router/src/validation.rs
Co-authored-by: Yang, Bo <pop.atry@gmail.com>
---------
Co-authored-by: Yang, Bo <pop.atry@gmail.com>
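The type mismatch described in this commit can be illustrated with a minimal sketch. The `decode` function here is a hypothetical stand-in that only shares the `&[u32]` parameter type mentioned above; it is not the tokenizer's real method.

```rust
// Hypothetical stand-in for a `decode` method that expects a slice of token ids.
fn decode(ids: &[u32]) -> String {
    ids.iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let ids: Vec<u32> = vec![1, 2, 3];
    // `decode(ids)` would fail with a mismatched-types error, because the
    // function takes `&[u32]` rather than an owned `Vec<u32>`.
    // `as_ref()` borrows the vector as a slice without copying it.
    let text = decode(ids.as_ref());
    println!("{text}");
}
```

Passing `&ids` would also compile through deref coercion; `as_ref()` simply makes the conversion explicit.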
2023-08-23 13:52:49 -07:00
OlivierDehaene
afd04dc71e
feat(server): update vllm version ( #723 )
2023-07-28 15:36:38 +02:00
OlivierDehaene
73a4d65d26
feat: add cuda memory fraction ( #659 )
...
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene
1da642bd0e
feat(server): add local prom and health routes if running w/ ngrok
2023-07-21 16:56:30 +02:00
OlivierDehaene
b66b190403
feat(router): ngrok edge ( #642 )
2023-07-19 11:59:58 +02:00
OlivierDehaene
fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models ( #630 )
2023-07-19 09:31:25 +02:00
OlivierDehaene
982ce3227b
feat(router): explicit warning if revision is not set ( #608 )
2023-07-13 18:59:38 +02:00
OlivierDehaene
b7327205a6
feat(launcher): add arg validation and drop subprocess ( #595 )
2023-07-13 14:22:37 +02:00
OlivierDehaene
b4024edd45
feat: better errors for warmup and TP ( #575 )
...
Close #571
2023-07-10 14:47:15 +02:00
OlivierDehaene
6f42942772
feat(router): add argument for hostname in router ( #545 ) ( #550 )
...
# What does this PR do?
As stated in the title: adds a `--hostname` argument to the router to support
values such as `--hostname ::`. Tested with:
```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health' # failed before this commit
```
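A minimal sketch of how a hostname string such as `::` can be turned into a bind address, using only the Rust standard library. The `bind_addr` helper and its fallback to `0.0.0.0` are assumptions made for illustration and are not taken from the router code.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Hypothetical helper: parse the hostname as an IP address and fall back to
// the IPv4 unspecified address (0.0.0.0) when parsing fails.
fn bind_addr(hostname: &str, port: u16) -> SocketAddr {
    let ip = hostname
        .parse::<IpAddr>()
        .unwrap_or(IpAddr::V4(Ipv4Addr::UNSPECIFIED));
    SocketAddr::new(ip, port)
}

fn main() {
    // `::` is the IPv6 unspecified address, so the socket address is IPv6.
    let addr = bind_addr("::", 8080);
    println!("{addr}");
}
```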
Trigger CI
---------
Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene
e28a809004
v0.9.0 ( #525 )
2023-07-01 19:25:41 +02:00
OlivierDehaene
3b0c979efc
feat(router): arg validation ( #519 )
2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models ( #516 )
...
Closes #478
2023-06-30 19:09:59 +02:00
Robert Kimball
70f485bf9f
feat(router): add header option to disable buffering for the generate_stream response ( #498 )
...
# This PR adds an HTTP header option to disable buffering for the generate_stream endpoint response stream.
Problem: If a model runs behind a proxy server such as nginx with
buffering enabled, the response stream from generate_stream is
aggregated into a single response, which effectively disables streaming.
Instead of a chunked response where each token arrives over time, the
client receives everything at once.
Solution: This change adds the `X-Accel-Buffering` HTTP header, which
disables buffering for the generate_stream response, allowing the
response to stream properly.
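As a rough sketch of the idea, the snippet below assembles the head of a streaming HTTP response that includes `X-Accel-Buffering: no`, the value nginx interprets as "do not buffer this response". The `streaming_response_head` function and the other headers are assumptions for illustration; the router builds its response through its web framework rather than by hand.

```rust
// Hypothetical helper: build the status line and headers of a streaming response.
fn streaming_response_head() -> String {
    let headers = [
        ("Content-Type", "text/event-stream"),
        ("Cache-Control", "no-cache"),
        // Tells nginx not to buffer this response, so chunks are forwarded
        // to the client as they are produced.
        ("X-Accel-Buffering", "no"),
    ];

    let mut head = String::from("HTTP/1.1 200 OK\r\n");
    for (name, value) in headers {
        head.push_str(&format!("{name}: {value}\r\n"));
    }
    // A blank line terminates the header section.
    head.push_str("\r\n");
    head
}

fn main() {
    print!("{}", streaming_response_head());
}
```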
2023-06-28 11:50:12 +02:00
OlivierDehaene
bd3a9d8e85
fix(router): add timeout on flume sends ( #488 )
2023-06-23 14:58:28 +02:00
OlivierDehaene
f59fb8b630
feat(router): add ngrok integration ( #453 )
2023-06-16 16:25:11 +02:00
OlivierDehaene
19c41824cb
chore: update openapi schema
2023-06-05 18:16:08 +02:00
OlivierDehaene
895c5f1562
feat(server): only compute prefill logprobs when asked ( #406 )
...
Close #288
2023-06-02 17:12:30 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size ( #367 )
...
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
942005386a
feat(router): log input/output at debug level ( #364 )
...
@njhill FYI
2023-05-23 20:47:37 +02:00
OlivierDehaene
5a58226130
fix(server): fix decode token ( #334 )
...
Fixes #333
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene
68e9d6ab33
feat(server): shard token decode ( #303 )
2023-05-10 15:48:21 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image ( #298 )
2023-05-09 13:19:31 +02:00
Sai Vinay G
926fd9a010
feat(router): Adding response schema for compat_generate ( #292 )
2023-05-09 12:38:09 +02:00
Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches ( #278 )
2023-05-04 15:14:28 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates ( #264 )
2023-05-02 15:43:19 +02:00
Nicolas Patry
e86cca9723
Adding docs on how dynamic batching works. ( #258 )
...
This PR starts with the minimal amount of explanation I could think
of. It tries to explain how dynamic batching occurs and how it interacts
with past key values, and it ignores the padding problem.
Some drawings might help too, but I kept it to text for now.
2023-05-01 14:16:50 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests ( #248 )
2023-04-27 19:16:35 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
c4fb09f2ae
feat(router): add tests to validation ( #237 )
2023-04-26 16:14:40 +02:00
Nicolas Patry
45344244cf
Starting some routing tests. ( #233 )
2023-04-25 14:13:14 +02:00
OlivierDehaene
8b182eb986
feat(router): add endpoint info to /info route ( #228 )
2023-04-25 13:11:18 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
6ded76a4ae
v0.6.0 ( #222 )
2023-04-21 21:00:57 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ( #202 )
2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route ( #208 )
2023-04-19 21:36:59 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info ( #207 )
2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route ( #196 )
...
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
c13b9d87c9
fix(router): fix truncation ( #190 )
...
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene
64347b05ff
fix(ci): fix CVE in github-slug-action ( #174 )
2023-04-13 12:43:05 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ( #164 )
2023-04-09 20:22:27 +02:00
OlivierDehaene
7dec65a244
fix(router): use buckets for metrics histograms ( #163 )
2023-04-09 20:13:28 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool ( #149 )
2023-03-30 15:26:27 +02:00