Nicolas Patry
bc2873246c
fix(launcher): Rename `b-float16` to `bfloat16` in the launcher arg ( #621 )
2023-07-17 18:38:16 +02:00
OlivierDehaene
c58a0c185b
v0.9.2 ( #616 )
2023-07-14 16:31:48 +02:00
OlivierDehaene
982ce3227b
feat(router): explicit warning if revision is not set ( #608 )
2023-07-13 18:59:38 +02:00
OlivierDehaene
b7327205a6
feat(launcher): add arg validation and drop subprocess ( #595 )
2023-07-13 14:22:37 +02:00
OlivierDehaene
6f42942772
feat(router): add argument for hostname in router ( #545 ) ( #550 )
...
# What does this PR do?
In title. Adds argument `--hostname` in router to support something like
`--hostname ::`. Tested with
```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health' # failed before this commit
```
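As a rough illustration of why `--hostname ::` matters, the sketch below turns a hostname argument and a port into a socket address using only the standard library. It is not the router's actual code: the `bind_addr` helper is hypothetical, and it assumes the hostname is given as an IP literal.

```rust
use std::net::{AddrParseError, IpAddr, SocketAddr};

/// Hypothetical helper (not from the router): combine a hostname given as an
/// IP literal with a port to obtain the address a server would bind to.
fn bind_addr(hostname: &str, port: u16) -> Result<SocketAddr, AddrParseError> {
    // "::" parses to the IPv6 unspecified address, "0.0.0.0" to the IPv4 one.
    let ip: IpAddr = hostname.parse()?;
    Ok(SocketAddr::new(ip, port))
}

fn main() {
    let addr = bind_addr("::", 8080).expect("valid IP literal");
    println!("{}", addr); // prints "[::]:8080"
}
```

Binding to `[::]:8080` listens on the IPv6 wildcard address, which is what lets a request to `http://[::1]:8080/health` reach the server.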
Trigger CI
---------
Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene
e28a809004
v0.9.0 ( #525 )
2023-07-01 19:25:41 +02:00
OlivierDehaene
2b53d71991
fix(launcher): fix issue where launcher does not properly report shard failures ( #522 )
2023-06-30 23:09:20 +02:00
Nicolas Patry
ecf6dc3a5a
feat: Add the option to force another dtype than `f16`. ( #513 )
2023-06-30 20:30:09 +02:00
OlivierDehaene
3b0c979efc
feat(router): arg validation ( #519 )
2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models ( #516 )
...
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
f59fb8b630
feat(router): add ngrok integration ( #453 )
2023-06-16 16:25:11 +02:00
A.J
d4eb60f48d
docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment ( #441 )
...
# What does this PR do?
It fixes a typo in the comments that reference the environment
variable `CUDA_VISIBLE_DEVICES`. No misspelled references to this
variable were found in the code logic, so no undefined behaviour or
bugs are involved. This PR does not modify any code logic.
2023-06-12 13:59:22 +02:00
OlivierDehaene
83b84486ad
feat(launcher): parse oom signal ( #404 )
2023-06-02 14:17:27 +02:00
OlivierDehaene
95d3546976
feat(server): load santacoder/starcoder models with safetensors ( #393 )
...
Fix #366
2023-06-01 12:10:35 +02:00
OlivierDehaene
49a6c8c1b2
fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES
2023-05-30 13:27:48 +02:00
OlivierDehaene
146e72c3be
fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES
2023-05-30 12:52:18 +02:00
OlivierDehaene
e3e487dc71
feat(server): support trust_remote_code ( #363 )
2023-05-23 20:40:39 +02:00
OlivierDehaene
e71471bec9
feat: add snapshot testing ( #282 )
2023-05-15 23:36:30 +02:00
Nicolas Patry
76a48cd365
feat(server): GPTQ quantization (step1) ( #277 )
...
Changes the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Use `Optional[str]` in Python (easier to manage than importing the type
everywhere), except in the CLI, where the type is used to get proper validation.
- Updated all models to handle the new values gracefully (error out on an
unknown value, or on gptq since it is not implemented yet).
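A minimal sketch of the `bool` to `Option<Enum>` idea described above. The names `Quantization`, `parse_quantize` and `check_supported`, and the accepted strings, are assumptions made for illustration and are not taken from the PR.

```rust
use std::str::FromStr;

/// Hypothetical enum replacing a former `bool` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Quantization {
    Bitsandbytes,
    Gptq,
}

impl FromStr for Quantization {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bitsandbytes" => Ok(Quantization::Bitsandbytes),
            "gptq" => Ok(Quantization::Gptq),
            other => Err(format!("unknown quantization value: {}", other)),
        }
    }
}

/// `None` means "no quantization"; an unknown string is a validation error.
fn parse_quantize(arg: Option<&str>) -> Result<Option<Quantization>, String> {
    arg.map(|s| s.parse::<Quantization>()).transpose()
}

/// Models reject values they cannot handle yet (here: gptq).
fn check_supported(q: Option<Quantization>) -> Result<(), String> {
    match q {
        None | Some(Quantization::Bitsandbytes) => Ok(()),
        Some(Quantization::Gptq) => Err("gptq is not implemented yet".to_string()),
    }
}

fn main() {
    for arg in [None, Some("bitsandbytes"), Some("gptq"), Some("int3")] {
        let outcome = parse_quantize(arg).and_then(check_supported);
        println!("{:?} -> {:?}", arg, outcome);
    }
}
```

Compared with a `bool`, the optional enum leaves room for further variants without changing every signature again.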
2023-05-12 14:46:41 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image ( #298 )
2023-05-09 13:19:31 +02:00
Nicolas Patry
e68509add7
feat(launcher): Improve error message when download process fails. ( #276 )
2023-05-04 15:29:29 +02:00
OlivierDehaene
b67908e0cf
fix(launcher): pass weights cache override to the download process ( #274 )
...
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene
85aa7e2e7b
feat(server): support hf endpoint weight layout ( #266 )
2023-05-03 11:36:24 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates ( #264 )
2023-05-02 15:43:19 +02:00
Nicolas Patry
b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README ( #257 )
2023-04-29 11:53:42 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
77758f603b
chore(launcher): refactor logic ( #242 )
...
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
6ded76a4ae
v0.6.0 ( #222 )
2023-04-21 21:00:57 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info ( #207 )
2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route ( #196 )
...
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies ( #187 )
2023-04-17 00:26:47 +02:00
OlivierDehaene
e3a63b6fbc
fix(launcher): revert change on shard errors ( #173 )
2023-04-13 11:07:11 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ( #55 )
...
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama ( #144 )
2023-04-11 16:38:22 +02:00
OlivierDehaene
e63a21eb4d
feat(launcher): allow disabling hf_transfer ( #161 )
2023-04-09 20:00:05 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
ab5fd8cf93
v0.4.1 ( #140 )
2023-03-26 16:37:51 +02:00
OlivierDehaene
411d6247f4
v0.4.0 ( #119 )
2023-03-09 16:07:01 +01:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter ( #117 )
2023-03-09 15:30:54 +01:00
OlivierDehaene
5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible ( #108 )
2023-03-08 13:53:41 +01:00
OlivierDehaene
0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES ( #107 )
2023-03-08 11:06:59 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models ( #101 )
...
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly ( #100 )
2023-03-06 13:22:58 +01:00
OlivierDehaene
1c19b0934e
v0.3.2 ( #97 )
2023-03-03 18:42:20 +01:00
OlivierDehaene
240c4187fd
fix(launcher): add router parameters to launcher ( #95 )
2023-03-03 16:01:25 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark ( #90 )
2023-03-02 12:30:41 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool ( #85 )
2023-02-24 15:55:57 +01:00
OlivierDehaene
4b1c9720c0
v0.3.1 ( #84 )
2023-02-24 13:27:41 +01:00
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer ( #76 )
2023-02-18 14:04:11 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options ( #73 )
2023-02-17 18:22:00 +01:00
OlivierDehaene
c720555adc
v0.3.0 ( #72 )
2023-02-16 17:28:29 +01:00
OlivierDehaene
7b3d460d21
fix(launcher): copy current env vars to subprocesses ( #70 )
...
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene
68455353f5
feat(launcher): add disable_custom_kernels arg ( #67 )
2023-02-15 16:23:45 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging ( #66 )
2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion ( #63 )
2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size ( #60 )
2023-02-08 17:53:33 +01:00
OlivierDehaene
2fe5e1b30e
V0.2.1 ( #58 )
2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode ( #57 )
2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text ( #50 )
...
@njhill, @yk FYI
generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is even
detrimental to usability.
We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue ( #52 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache ( #49 )
2023-02-01 16:22:10 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox ( #39 )
2023-01-31 18:53:56 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
...
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
...
Add token streaming using ServerSideEvents (SSE).
The signature of the SSE events is:
```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}
struct ErrorResponse {
    error: String,
}
```
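For context, a Server-Sent Events stream delivers each event as one or more `data:` lines followed by a blank line. The sketch below shows only that framing. The `sse_frame` helper is hypothetical, and the JSON payload is an assumption: the description above gives the struct signatures but not how they are encoded.

```rust
/// Hypothetical helper: wrap an already-serialized payload in SSE framing.
/// Each payload line gets its own `data: ` prefix; a blank line ends the event.
fn sse_frame(payload: &str) -> String {
    let mut frame = String::new();
    for line in payload.lines() {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    frame.push('\n');
    frame
}

fn main() {
    // Assumed JSON rendering of an `ErrorResponse`, for illustration only.
    let payload = r#"{"error":"example failure"}"#;
    print!("{}", sse_frame(payload));
}
```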
2023-01-31 11:49:43 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder ( #26 )
2023-01-20 12:24:39 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros ( #24 )
...
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros`/`new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
fcc2c5fcbf
feat(launcher): Log server stdout ( #19 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
OlivierDehaene
611e21cb13
fix(server): Fix stop sequences ( #11 )
2022-12-16 16:03:39 +01:00
OlivierDehaene
3e2e6240b8
feat(launcher): Add integration tests ( #9 )
2022-12-16 11:29:36 +01:00
OlivierDehaene
4236e41b0d
feat(server): Improved doc
2022-11-07 12:53:56 +01:00
OlivierDehaene
cea6051eff
feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard
2022-11-04 18:31:08 +01:00
OlivierDehaene
b3b7ea0d74
feat: Use json formatter by default in docker image
2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00