Nick Hill
f7ac394935
fix(router): Obey max batch size ( #23 )
2023-01-17 09:11:21 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros ( #24 )
...
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zero/empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
fcc2c5fcbf
feat(launcher): Log server stdout ( #19 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
Nicolas Patry
b94f30215f
fix(server): Use cleanup_tokenization_spaces=False for lossless decoding ( #13 )
...
Fixes #12 in the easiest way I could think of.
2023-01-03 11:07:05 +01:00
Nick Hill
60472f9d2b
feat(router): Add const parameters to validation logic ( #15 )
...
I noticed some opportunity to collapse some of the logic, in case you
are interested.
2023-01-03 10:41:22 +01:00
Nick Hill
3efa5bbbfd
fix(router): Include special tokens when tokenizing ( #14 )
...
There's currently a discrepancy in the tokenization between the router
and python server code. The latter includes special tokens but former
does not.
This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.
This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.
As far as I can tell, it is better to include this token in the encoder
`input_ids`, so I guess it's best to just adjust on the router side.
2022-12-30 19:31:44 +01:00
Nick Hill
686cc66717
fix(server): Check for device type correctly when determining initial padding ( #16 )
...
AFAIK there is no torch device type called "gpu".
2022-12-30 19:30:42 +01:00
OlivierDehaene
611e21cb13
fix(server): Fix stop sequences ( #11 )
2022-12-16 16:03:39 +01:00
OlivierDehaene
3e2e6240b8
feat(launcher): Add integration tests ( #9 )
2022-12-16 11:29:36 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs ( #8 )
2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences ( #7 )
2022-12-12 18:25:22 +01:00
OlivierDehaene
042180d88f
fix(server): Only pad to multiple of 8 on GPUs
2022-12-08 19:37:37 +01:00
OlivierDehaene
a2985036aa
feat(server): Add model tests ( #6 )
2022-12-08 18:49:33 +01:00
Nick Hill
31d76e238d
fix(batching): Avoid theoretical hang in batcher loop ( #5 )
...
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute
Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene
daa1d81d5e
feat(server): Support Galactica ( #4 )
2022-12-01 19:31:54 +01:00
OlivierDehaene
d6d5b12e03
fix(router): Handle tokenizer errors
2022-11-14 17:15:19 +01:00
OlivierDehaene
feb7806ca4
fix(readme): Typo
2022-11-14 16:22:10 +01:00
OlivierDehaene
91f5f86280
fix(router): Fix HTTP status codes
2022-11-14 14:34:15 +01:00
OlivierDehaene
6c781025ae
feat(rust): Update to 1.65
2022-11-14 13:59:56 +01:00
OlivierDehaene
dccd5c2b1a
feat(server): Clarify CausalLMBatch concatenate method
2022-11-09 18:24:07 +01:00
OlivierDehaene
fa43fb71be
fix(server): Fix Transformers fork version
2022-11-08 17:42:38 +01:00
OlivierDehaene
4236e41b0d
feat(server): Improved doc
2022-11-07 12:53:56 +01:00
OlivierDehaene
cea6051eff
feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard
2022-11-04 18:31:08 +01:00
OlivierDehaene
427d7cc444
feat(server): Support AutoModelForSeq2SeqLM
2022-11-04 18:03:04 +01:00
OlivierDehaene
c5665f5c8b
feat(server): Support generic AutoModelForCausalLM
2022-11-04 14:22:47 +01:00
OlivierDehaene
755fc0e403
fix(models): Revert buggy support for AutoModel
2022-11-03 16:07:54 +01:00
OlivierDehaene
b3b7ea0d74
feat: Use json formatter by default in docker image
2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
OlivierDehaene
beb552127a
feat(client): Simplify sharded logic
2022-10-22 23:40:05 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
Thomas Wang
be8827fe41
Create LICENSE ( #2 )
2022-10-22 10:44:52 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00
OlivierDehaene
895a341d06
fix(validation): Fix error messages
2022-10-21 10:59:15 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008
feat: Add arguments to CLI
2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
00e6ce44b1
Update aml deployment
2022-10-17 10:39:59 +02:00
Olivier Dehaene
bcb53903b8
feat: Add AML deployment
2022-10-15 20:21:50 +02:00
Olivier Dehaene
bf99afe916
feat: Docker image
2022-10-14 15:56:21 +02:00
Olivier Dehaene
39df4d9975
Use axum
2022-10-11 18:14:39 +02:00
Olivier Dehaene
e86ecbac63
ValidationError was not correctly handled
2022-10-11 16:53:40 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
...
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
fa9a088467
Add load testing
2022-10-11 10:36:51 +02:00
Olivier Dehaene
1d986983d5
fix: cleanup
2022-10-08 12:34:25 +02:00
Olivier Dehaene
295831a481
Init
2022-10-08 12:30:12 +02:00