Nicolas Patry
1c81df15cd
docs: Update README.md (#639)
2023-07-19 13:38:52 +02:00
OlivierDehaene
cf83f9b66f
v0.9.3 (#634)
2023-07-18 18:11:20 +02:00
Victor Muštar
c8b077be79
docs: README: Add logo + baseline (#611)
![image](https://github.com/huggingface/text-generation-inference/assets/3841370/58177321-479f-4ad1-b3bc-cec027423984)
2023-07-13 21:45:20 +02:00
OlivierDehaene
e28a809004
v0.9.0 (#525)
2023-07-01 19:25:41 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models (#516)
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
081b926584
v0.8.0
2023-05-30 18:39:35 +02:00
OlivierDehaene
d31562f300
v0.7.0 (#353)
2023-05-23 21:20:49 +02:00
OlivierDehaene
e71471bec9
feat: add snapshot testing (#282)
2023-05-15 23:36:30 +02:00
Nicolas Patry
e86cca9723
Adding docs on how dynamic batching works. (#258)
This PR starts the minimal possible amount of explanation I could think
of. It tries to explain how dynamic batching occurs, the interactions
with past key values and ignores the padding problem.
Maybe some drawings could help too but I kept it to text for now.
2023-05-01 14:16:50 +02:00
Nicolas Patry
b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README (#257)
2023-04-29 11:53:42 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests (#248)
2023-04-27 19:16:35 +02:00
OlivierDehaene
b927244eb5
feat(python-client): get list of currently deployed tgi models using the inference API (#191)
2023-04-17 18:43:24 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama (#144)
2023-04-11 16:38:22 +02:00
Guspan Tanadi
9122e7bd9c
docs(readme): provide link Logits Warper README (#154)
2023-04-04 13:27:46 +02:00
lewtun
5e5e9d4bbd
feat: Add note about NVIDIA drivers (#64)
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-03-23 18:03:45 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client (#103)
2023-03-07 18:52:22 +01:00
OlivierDehaene
1c19b0934e
v0.3.2 (#97)
2023-03-03 18:42:20 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion (#63)
2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62)
2023-02-13 13:02:45 +01:00
Yannic Kilcher
e520d5b349
fixed SSE naming (#61)
https://en.wikipedia.org/wiki/Server-sent_events
2023-02-08 22:30:11 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size (#60)
2023-02-08 17:53:33 +01:00
OlivierDehaene
c503a639b1
feat(server): support t5 (#59)
2023-02-07 18:25:17 +01:00
lewtun
a0dca443dd
feat(docs): Clarify installation steps (#54)
Adds some bits for first-time users (like me 😄)
2023-02-03 13:07:55 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53)
2023-02-03 12:43:37 +01:00
OlivierDehaene
313194f6d7
feat(server): support repetition penalty (#47)
2023-02-01 15:58:42 +01:00
OlivierDehaene
2ad895a6cc
feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48)
2023-02-01 14:43:59 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox (#39)
2023-01-31 18:53:56 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder (#26)
2023-01-20 12:24:39 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs (#8)
2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences (#7)
2022-12-12 18:25:22 +01:00
OlivierDehaene
a2985036aa
feat(server): Add model tests (#6)
2022-12-08 18:49:33 +01:00
OlivierDehaene
daa1d81d5e
feat(server): Support Galactica (#4)
2022-12-01 19:31:54 +01:00
OlivierDehaene
feb7806ca4
fix(readme): Typo
2022-11-14 16:22:10 +01:00
OlivierDehaene
4236e41b0d
feat(server): Improved doc
2022-11-07 12:53:56 +01:00
OlivierDehaene
427d7cc444
feat(server): Support AutoModelForSeq2SeqLM
2022-11-04 18:03:04 +01:00
OlivierDehaene
c5665f5c8b
feat(server): Support generic AutoModelForCausalLM
2022-11-04 14:22:47 +01:00
OlivierDehaene
755fc0e403
fix(models): Revert buggy support for AutoModel
2022-11-03 16:07:54 +01:00
OlivierDehaene
b3b7ea0d74
feat: Use json formatter by default in docker image
2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008
feat: Add arguments to CLI
2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
bf99afe916
feat: Docker image
2022-10-14 15:56:21 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
295831a481
Init
2022-10-08 12:30:12 +02:00