OlivierDehaene
e3a63b6fbc
fix(launcher): revert change on shard errors ( #173 )
2023-04-13 11:07:11 +02:00
OlivierDehaene
880a76eed5
feat(server): support sharded santacoder ( #167 )
2023-04-12 17:18:08 +02:00
OlivierDehaene
5fa8ae041c
feat(server): optimize decode for sane tokenizers ( #170 )
2023-04-12 12:03:10 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ( #55 )
...
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama ( #144 )
2023-04-11 16:38:22 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ( #164 )
2023-04-09 20:22:27 +02:00
OlivierDehaene
7dec65a244
fix(router): use buckets for metrics histograms ( #163 )
2023-04-09 20:13:28 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
e63a21eb4d
feat(launcher): allow disabling hf_transfer ( #161 )
2023-04-09 20:00:05 +02:00
OlivierDehaene
1883d8ecde
feat(docker): improve flash_attention caching ( #160 )
2023-04-09 19:59:16 +02:00
OlivierDehaene
3f2542bb6a
fix(server): fix escape characters in stop sequence ( #155 )
2023-04-05 19:37:41 +02:00
Guspan Tanadi
9122e7bd9c
docs(readme): provide link Logits Warper README ( #154 )
2023-04-04 13:27:46 +02:00
OlivierDehaene
c0aeb32583
feat(server): flash santacoder ( #153 )
2023-04-03 19:06:42 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
08b7e4a282
fix(server): fix flash neox rotary embeddings ( #150 )
2023-03-30 16:12:23 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool ( #149 )
2023-03-30 15:26:27 +02:00
OlivierDehaene
55106ec476
fix(ci): fix sagemaker action ( #148 )
2023-03-29 22:27:01 +02:00
OlivierDehaene
d503e8f09d
feat: aws sagemaker compatible image ( #147 )
...
The only difference is that now it pushes to
registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
instead of
registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...
---------
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2023-03-29 21:38:30 +02:00
OlivierDehaene
c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox ( #145 )
2023-03-28 16:51:41 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
Nick Hill
8e8dd984d8
feat(server): Add mypy-protobuf ( #141 )
...
Generates .pyi files for protobuf stubs which provide strong typing
information. Very helpful for IDE auto-completion, etc.
2023-03-27 09:25:15 +02:00
Nick Hill
462530c2b0
fix(server): Avoid using try/except to determine kind of AutoModel ( #142 )
2023-03-27 09:23:22 +02:00
OlivierDehaene
ab5fd8cf93
v0.4.1 ( #140 )
2023-03-26 16:37:51 +02:00
OlivierDehaene
678b2f3900
feat(server): cleanup flash neox loading ( #139 )
2023-03-26 16:37:21 +02:00
OlivierDehaene
d6a93fe992
fix(server): fix flash-neox scores warping ( #137 )
2023-03-24 18:21:41 +01:00
OlivierDehaene
05e9a796cc
feat(server): flash neoX ( #133 )
2023-03-24 14:02:14 +01:00
OlivierDehaene
23e1028822
feat(python-client): add CI ( #136 )
2023-03-23 18:13:04 +01:00
OlivierDehaene
5d04525cb9
feat(python-client): release v0.4.0 ( #135 )
2023-03-23 18:07:20 +01:00
lewtun
5e5e9d4bbd
feat: Add note about NVIDIA drivers ( #64 )
...
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-03-23 18:03:45 +01:00
OlivierDehaene
603e20b5f7
feat(ci): add ci paths ( #134 )
2023-03-23 18:01:30 +01:00
dconathan
7850119055
feat(python-client): add cookies to Client constructors and requests ( #132 )
...
I have a use case where we need to pass cookies (for auth reasons) to an
internally hosted server.
Note: I couldn't get the client tests to pass - do you need to have an
HF token?
```python
FAILED tests/test_client.py::test_generate - text_generation.errors.BadRequestError: Authorization header is correct, but the token seems invalid
```
2023-03-23 18:01:01 +01:00
OlivierDehaene
a3b7db932f
fix(python-client): relax dependencies ( #129 )
2023-03-16 12:57:07 +01:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt ( #128 )
2023-03-16 12:12:26 +01:00
OlivierDehaene
8ad60b752f
fix(server): add position ids to neox ( #126 )
2023-03-15 13:12:49 +01:00
OlivierDehaene
cbd36aa4d1
fix(server): revert gpt-neox optims ( #123 )
2023-03-13 22:57:08 +01:00
OlivierDehaene
6860ce9c67
feat: add OpenAssistant/oasst-sft-1-pythia-12b to the list of supported models ( #122 )
2023-03-13 20:42:10 +01:00
OlivierDehaene
411d6247f4
v0.4.0 ( #119 )
2023-03-09 16:07:01 +01:00
OlivierDehaene
d8dc8f1b0c
feat(python-client): add new parameters ( #118 )
2023-03-09 16:05:33 +01:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter ( #117 )
2023-03-09 15:30:54 +01:00
OlivierDehaene
e8bfe199ba
feat(router): support left truncation ( #115 )
...
closes #111
2023-03-09 13:10:30 +01:00
OlivierDehaene
c0795de2f2
fix(server): do not warp prefill logits ( #116 )
2023-03-09 13:00:10 +01:00
OlivierDehaene
1a2d68250a
feat: support typical sampling ( #114 )
...
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene
941cd42e0c
fix(server): fix index out of range for watermarking ( #110 )
2023-03-08 18:29:08 +01:00
OlivierDehaene
2c5df5d2af
fix(python-client): stream not set on the sync client ( #109 )
2023-03-08 16:48:16 +01:00
OlivierDehaene
5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible ( #108 )
2023-03-08 13:53:41 +01:00
OlivierDehaene
0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES ( #107 )
2023-03-08 11:06:59 +01:00
OlivierDehaene
b1485e18c5
fix(server): fix galactica batch ( #106 )
...
closes #105
2023-03-07 20:05:21 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client ( #103 )
2023-03-07 18:52:22 +01:00