Commit Graph

68 Commits

Author SHA1 Message Date
Michael Feil ff703cb867
Adding ctranslate2 quantization and inference: moving the contribution (#1)
* rebaseing the commit on preemo fork.

* reformatting and changes.

* update dockerfile

* update changes for dockerfile

* adapt path

* rebaseing the commit on preemo fork.

* reformatting and changes.

* update dockerfile

* update changes for dockerfile

* adapt path

---------

Co-authored-by: michaelfeil <me@michaelfeil.eu>
2023-10-02 11:12:49 -07:00
michaelfeil 44fa36b5bf restoring commit from dev branch, rebase on current master 2023-08-01 18:15:18 +02:00
OlivierDehaene 73a4d65d26
feat: add cuda memory fraction (#659)
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene b66b190403
feat(router): ngrok edge (#642) 2023-07-19 11:59:58 +02:00
OlivierDehaene fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models (#630) 2023-07-19 09:31:25 +02:00
OlivierDehaene 44acf72a73
fea(launcher): debug logs (#623) 2023-07-17 19:03:07 +02:00
Nicolas Patry bc2873246c
fix(launcher): Rename `b-float16` to `bfloat16` in the launcher arg (#621) 2023-07-17 18:38:16 +02:00
OlivierDehaene c58a0c185b
v0.9.2 (#616) 2023-07-14 16:31:48 +02:00
OlivierDehaene 982ce3227b
feat(router): explicit warning if revision is not set (#608) 2023-07-13 18:59:38 +02:00
OlivierDehaene b7327205a6
feat(launcher): add arg validation and drop subprocess (#595) 2023-07-13 14:22:37 +02:00
OlivierDehaene 6f42942772
feat(router): add argument for hostname in router (#545) (#550)
# What does this PR do?

In title. Adds argument `--hostname` in router to support something like
`--hostname ::`. Tested with

```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health'  # failed before this commit
```

Trigger CI

---------

Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene e28a809004
v0.9.0 (#525) 2023-07-01 19:25:41 +02:00
OlivierDehaene 2b53d71991
fix(launcher): fix issue where launcher does not properly report shard failures (#522) 2023-06-30 23:09:20 +02:00
Nicolas Patry ecf6dc3a5a
feat: Add the option to force another dtype than `f16`. (#513) 2023-06-30 20:30:09 +02:00
OlivierDehaene 3b0c979efc
feat(router): arg validation (#519) 2023-06-30 20:07:49 +02:00
OlivierDehaene e74bd41e0f
feat(server): add paged attention to flash models (#516)
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene f59fb8b630
feat(router): add ngrok integration (#453) 2023-06-16 16:25:11 +02:00
A.J d4eb60f48d
docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441)
# What does this PR do?
It solves a typo in the comment sections referencing the environment
variable `CUDA_VISIBLE_DEVICES`. No misspelling references to this
variable have been found in code logic leading to undefined behaviour or
bugs. This PR is not expected to perform any code logic modification.
2023-06-12 13:59:22 +02:00
OlivierDehaene 83b84486ad
feat(launcher): parse oom signal (#404) 2023-06-02 14:17:27 +02:00
OlivierDehaene 95d3546976
feat(server): load santacoder/starcoder models with safetensors (#393)
Fix #366
2023-06-01 12:10:35 +02:00
OlivierDehaene 49a6c8c1b2 fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 13:27:48 +02:00
OlivierDehaene 146e72c3be fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 12:52:18 +02:00
OlivierDehaene e3e487dc71
feat(server): support trust_remote_code (#363) 2023-05-23 20:40:39 +02:00
Nicolas Patry 76a48cd365
feat(server): GPTQ quantization (step1) (#277)
Changes only the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Use `Optional[str]` in Python (easier to manage than importing type
everywhere). Except for the cli to get proper validation
- Updated all models to handle gracefully new values. (Error out if
unknown value, or gptq since not implemented).

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-12 14:46:41 +02:00
Nicolas Patry e68509add7
feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
OlivierDehaene b67908e0cf
fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene 85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
Nicolas Patry 411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
Nicolas Patry db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry 77758f603b
chore(launcher): refactor logic (#242)
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene 252f42c1e6
fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene 2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene 7a1ba58557
fix(docker): fix docker image dependencies (#187) 2023-04-17 00:26:47 +02:00
OlivierDehaene e3a63b6fbc
fix(launcher): revert change on shard errors (#173) 2023-04-13 11:07:11 +02:00
OlivierDehaene f26dfd0dc1
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene e63a21eb4d
feat(launcher): allow disabling hf_transfer (#161) 2023-04-09 20:00:05 +02:00
OlivierDehaene 55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene 5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) 2023-03-08 13:53:41 +01:00
OlivierDehaene 0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) 2023-03-08 11:06:59 +01:00
OlivierDehaene cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene 240c4187fd
fix(launcher): add router parameters to launcher (#95) 2023-03-03 16:01:25 +01:00
OlivierDehaene 9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene 17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene 6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene 7b3d460d21
fix(launcher): copy current env vars to subprocesses (#70)
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene 68455353f5
feat(launcher): add disable_custom_kernels arg (#67) 2023-02-15 16:23:45 +01:00
OlivierDehaene c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene 0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00