Commit Graph

779 Commits

Author SHA1 Message Date
Nicolas Patry 76a48cd365
feat(server): GPTQ quantization (step1) (#277)
Changes only the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Use `Optional[str]` in Python (easier to manage than importing type
everywhere). Except for the cli to get proper validation
- Updated all models to handle gracefully new values. (Error out if
unknown value, or gptq since not implemented).

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-12 14:46:41 +02:00
OlivierDehaene 4f6d038c0b fix(server): fix multinomial implem in Sampling 2023-05-11 13:30:38 +02:00
OlivierDehaene a6c18c39bb
feat(server): use cuda graph in logits warping (#302) 2023-05-10 19:08:54 +02:00
OlivierDehaene 35ab6cfcf1 fix(docker): remove CUDA_VERSION 2023-05-10 16:16:06 +02:00
OlivierDehaene 745f596c88
feat(server): use float16 (#304) 2023-05-10 15:51:10 +02:00
OlivierDehaene 68e9d6ab33
feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene 1585404464
fix(docker): remove nvidia require cuda env (#310) 2023-05-10 15:29:21 +02:00
OlivierDehaene 49cffad1bc
fix(docker): fix nvidia env vars (#305) 2023-05-09 19:02:52 +02:00
OlivierDehaene ad66f6ef9a
feat(server): optim flash causal lm decode_token (#285) 2023-05-09 18:26:19 +02:00
OlivierDehaene bc5c07231e
fix(docker): fix docker build (#299) 2023-05-09 14:39:59 +02:00
OlivierDehaene e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Sai Vinay G 926fd9a010
feat(router): Adding response schema for compat_generate (#292) 2023-05-09 12:38:09 +02:00
OlivierDehaene e9b01b3433
fix(dockerfile): fix nvidia env vars (#297)
Fixes #291
2023-05-09 12:36:02 +02:00
Nicolas Patry b4aa87db58
fea(server): decrease convert RAM requirements (#286) 2023-05-05 17:57:02 +02:00
Nicolas Patry 3314a46d36
chore: add `flash-attention` to docker ignore (#287)
included when building docker locally.
(Where the local dirs might have the flash-attention folder.)


# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-05 17:52:09 +02:00
Nicolas Patry 690fc31757
fix(server): fix convert (#284) 2023-05-05 15:28:08 +02:00
Nicolas Patry e68509add7
feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
Nicolas Patry f08343d44d
fix(server): Removes the parallelism in file convertion (during download) (#275) 2023-05-04 15:22:54 +02:00
Nicolas Patry b4fe248b17
fix(launcher): handle hub branches (#278) 2023-05-04 15:14:28 +02:00
OlivierDehaene b67908e0cf
fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene 85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
OlivierDehaene 4096000e34
fix(server): fix typo in tokenizers decode (#269)
closes #268
2023-05-03 10:10:34 +02:00
Nicolas Patry 411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry e86cca9723
Adding docs on how dynamic batching works. (#258)
This PR starts the minimal possible amount of explanation I could think
of. It tries to explain how dynamic batching occurs, the interactions
with past key values and ignores the padding problem.

Maybe some drawings could help too but I kept it to text for now.
2023-05-01 14:16:50 +02:00
OlivierDehaene 0e9d249b79
feat(benchmark): add support for private tokenizers (#262) 2023-04-29 12:17:30 +02:00
Nicolas Patry b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
OlivierDehaene 593a563414
feat(docker): add nvidia env vars (#255) 2023-04-27 19:18:33 +02:00
Ehsan M. Kermani f092ba9b22
feat(server): add watermarking tests (#248) 2023-04-27 19:16:35 +02:00
OlivierDehaene b9ae7e5da1
chore(server): update transformers (#250) 2023-04-27 09:57:41 +02:00
Nick Hill 34bca0b8d3
fix(server): Small tidy of code from recent changes (#251)
remaining_decode_tokens was calculated twice in Seq2SeqLMBatch.filter()
2023-04-27 09:57:28 +02:00
Nick Hill b4cf832c40
fix(server): fix reshaping of bloom past_key_values in concatenate() (#252)
Introduced in #214 

Fixes #249
2023-04-27 09:51:27 +02:00
Nicolas Patry db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry c4fb09f2ae
feat(router): add tests to validation (#237) 2023-04-26 16:14:40 +02:00
Nicolas Patry 77758f603b
chore(launcher): refactor logic (#242)
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene 7de8a377b0 fix(benchmarking): fix benchmarking tool 2023-04-26 00:54:27 +02:00
Nicolas Patry 45344244cf
Starting some routing tests. (#233) 2023-04-25 14:13:14 +02:00
OlivierDehaene 323546df1d
fix(python-client): add auth headers to is supported requests (#234) 2023-04-25 13:55:26 +02:00
OlivierDehaene 37b64a5c10
chore(server): update safetensors version (#235) 2023-04-25 13:50:56 +02:00
OlivierDehaene 8b182eb986
feat(router): add endpoint info to /info route (#228) 2023-04-25 13:11:18 +02:00
OlivierDehaene ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene 98a3e0d135
chore(server): update huggingface-hub (#227) 2023-04-24 15:57:13 +02:00
Nick Hill 4a7dd4085a
feat(server): reduce memory requirement (#214) 2023-04-24 14:15:42 +02:00
OlivierDehaene 6ded76a4ae
v0.6.0 (#222) 2023-04-21 21:00:57 +02:00
OlivierDehaene 97df0c7bc0
misc: update to rust 1.69 (#221) 2023-04-21 21:00:30 +02:00
OlivierDehaene 4b460e72fb
fix(server): fix flash batch filtering (#220) 2023-04-21 20:26:01 +02:00
OlivierDehaene 1ffea36ec2
fix(server): fix flash causal (#219) 2023-04-21 19:49:08 +02:00
OlivierDehaene 86bca365df
fix(server): fix flash causal (#218) 2023-04-21 19:42:16 +02:00
OlivierDehaene afc5b999d0
fix(server): cleanup new flash past_key_values logic (#217) 2023-04-21 16:19:04 +02:00
OlivierDehaene db4cb5e4ed
fix(server): fix past key values logic (#216)
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene 343437c7b5
feat(router): add device and dtype info (#215) 2023-04-21 15:36:29 +02:00