Commit Graph

98 Commits

Author SHA1 Message Date
OlivierDehaene 19c41824cb chore: update openapi schema 2023-06-05 18:16:08 +02:00
OlivierDehaene 895c5f1562
feat(server): only compute prefill logprobs when asked (#406)
Close #288
2023-06-02 17:12:30 +02:00
OlivierDehaene 218c9adaa5
feat: decrease IPC proto size (#367)
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene 942005386a
feat(router): log input/ouput at debug level (#364)
@njhill FYI
2023-05-23 20:47:37 +02:00
OlivierDehaene 5a58226130
fix(server): fix decode token (#334)
Fixes #333

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene 68e9d6ab33
feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Sai Vinay G 926fd9a010
feat(router): Adding response schema for compat_generate (#292) 2023-05-09 12:38:09 +02:00
Nicolas Patry b4fe248b17
fix(launcher): handle hub branches (#278) 2023-05-04 15:14:28 +02:00
Nicolas Patry 411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry e86cca9723
Adding docs on how dynamic batching works. (#258)
This PR starts the minimal possible amount of explanation I could think
of. It tries to explain how dynamic batching occurs, the interactions
with past key values and ignores the padding problem.

Maybe some drawings could help too but I kept it to text for now.
2023-05-01 14:16:50 +02:00
Ehsan M. Kermani f092ba9b22
feat(server): add watermarking tests (#248) 2023-04-27 19:16:35 +02:00
Nicolas Patry db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry c4fb09f2ae
feat(router): add tests to validation (#237) 2023-04-26 16:14:40 +02:00
Nicolas Patry 45344244cf
Starting some routing tests. (#233) 2023-04-25 14:13:14 +02:00
OlivierDehaene 8b182eb986
feat(router): add endpoint info to /info route (#228) 2023-04-25 13:11:18 +02:00
OlivierDehaene ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene 6ded76a4ae
v0.6.0 (#222) 2023-04-21 21:00:57 +02:00
OlivierDehaene 343437c7b5
feat(router): add device and dtype info (#215) 2023-04-21 15:36:29 +02:00
OlivierDehaene 709d8936f6
feat(router): drop requests when client closes the channel (#202) 2023-04-20 11:07:40 +02:00
OlivierDehaene b6ee0ec7b0
feat(router): add git sha to info route (#208) 2023-04-19 21:36:59 +02:00
OlivierDehaene 252f42c1e6
fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene 2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene c13b9d87c9
fix(router): fix truncation (#190)
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene 64347b05ff
fix(ci): fix CVE in github-slug-action (#174) 2023-04-13 12:43:05 +02:00
OlivierDehaene 6f0f1d70f6
v0.5.0 (#168) 2023-04-11 20:32:18 +02:00
OlivierDehaene 9987960062
feat(router): make router input validation optional (#164) 2023-04-09 20:22:27 +02:00
OlivierDehaene 7dec65a244
fix(router): use buckets for metrics histograms (#163) 2023-04-09 20:13:28 +02:00
OlivierDehaene 5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues (#162) 2023-04-09 20:07:02 +02:00
OlivierDehaene fef1a1c381
v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene 84722f3e33
v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene 610bb1f978
feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene d503e8f09d
feat: aws sagemaker compatible image (#147)
The only difference is that now it pushes to
registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
instead of
registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...

---------

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2023-03-29 21:38:30 +02:00
OlivierDehaene f000068944
feat(server): clear cache on error (#143) 2023-03-28 11:29:35 +02:00
OlivierDehaene ab5fd8cf93
v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene b49dbf2d88
fix(server): use server tokenizer as gt (#128) 2023-03-16 12:12:26 +01:00
OlivierDehaene cbd36aa4d1
fix(server): revert gpt-neox optims (#123) 2023-03-13 22:57:08 +01:00
OlivierDehaene 411d6247f4
v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene 55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene e8bfe199ba
feat(router): support left truncation (#115)
closes #111
2023-03-09 13:10:30 +01:00
OlivierDehaene 1a2d68250a
feat: support typical sampling (#114)
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene 3fef90d50f
feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
OlivierDehaene cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene 1c19b0934e
v0.3.2 (#97) 2023-03-03 18:42:20 +01:00
OlivierDehaene 9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene f874c47831
feat(router): add api-inference headers (#91) 2023-03-02 11:41:51 +01:00
OlivierDehaene 4e685d907e
feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89) 2023-02-28 10:19:32 +01:00
OlivierDehaene 21340f24ba
feat(router): add legacy route for api-inference support (#88) 2023-02-27 14:56:58 +01:00
OlivierDehaene 0ac184ce77
feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene 4b1c9720c0
v0.3.1 (#84) 2023-02-24 13:27:41 +01:00