hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
OlivierDehaene	1539d3cbbe	feat(router): Remove second lock from batcher hot path (#27) @njhill	2023-01-26 16:29:13 +01:00
OlivierDehaene	5c01e2544c	fix(router): fix api-inference deployment (#31)	2023-01-23 17:42:14 +01:00
OlivierDehaene	f9d0ec376a	feat(docker): Make the image compatible with api-inference (#29)	2023-01-23 17:11:27 +01:00
OlivierDehaene	15511edc01	feat(server): Support SantaCoder (#26)	2023-01-20 12:24:39 +01:00
Nick Hill	f7ac394935	fix(router): Obey max batch size (#23)	2023-01-17 09:11:21 +01:00
Nick Hill	e6d3eb5d5d	fix(server): Minor refactorization using new_zeros (#24) - Fix some type hints, in particular base tokenizer class - Make use of `tensor.new_zero/empty` methods - Simplify env var string parsing in launcher	2023-01-17 09:10:22 +01:00
Nick Hill	60472f9d2b	feat(router): Add const parameters to validation logic (#15) I noticed some opportunity to collapse some of the logic, in case you are interested.	2023-01-03 10:41:22 +01:00
Nick Hill	3efa5bbbfd	fix(router): Include special tokens when tokenizing (#14) There's currently a discrepancy in the tokenization between the router and python server code. The latter includes special tokens but former does not. This results in a token count mismatch for seq2seq models such as mt0 where the tokenizer emits an EOS token at the end. This in turn results in some unexpected/incorrect output, in particular when batch concatenation is involved, because the python code uses the input length passed from the router for each row. As far as I can tell, it is better to include this token in the encoder `input_ids`, so I guess it's best to just adjust on the router side.	2022-12-30 19:31:44 +01:00
OlivierDehaene	32a253063d	feat: Return logprobs (#8)	2022-12-15 17:03:56 +01:00
OlivierDehaene	718096f695	feat: Support stop sequences (#7)	2022-12-12 18:25:22 +01:00
OlivierDehaene	a2985036aa	feat(server): Add model tests (#6)	2022-12-08 18:49:33 +01:00
Nick Hill	31d76e238d	fix(batching): Avoid theoretical hang in batcher loop (#5) - Avoid theoretical hang in batcher loop - Avoid a couple of clones in the router generate method - Keep attention mask tensors as integers - Remove num_heads attribute Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>	2022-12-05 10:10:59 +01:00
OlivierDehaene	d6d5b12e03	fix(router): Handle tokenizer errors	2022-11-14 17:15:19 +01:00
OlivierDehaene	91f5f86280	fix(router): Fix HTTP status codes	2022-11-14 14:34:15 +01:00
OlivierDehaene	427d7cc444	feat(server): Support AutoModelForSeq2SeqLM	2022-11-04 18:03:04 +01:00
OlivierDehaene	c5665f5c8b	feat(server): Support generic AutoModelForCausalLM	2022-11-04 14:22:47 +01:00
OlivierDehaene	b3b7ea0d74	feat: Use json formatter by default in docker image	2022-11-02 17:29:56 +01:00
OlivierDehaene	3cf6368c77	feat(server): Support all AutoModelForCausalLM on a best effort basis	2022-10-28 19:24:00 +02:00
OlivierDehaene	09674e6df9	feat(server): Support bitsandbytes	2022-10-27 14:25:29 +02:00
OlivierDehaene	beb552127a	feat(client): Simplify sharded logic	2022-10-22 23:40:05 +02:00
OlivierDehaene	c837893370	feat(router): Add max_waiting_tokens	2022-10-21 16:40:05 +02:00
OlivierDehaene	895a341d06	fix(validation): Fix error messages	2022-10-21 10:59:15 +02:00
Olivier Dehaene	f16f2f5ae1	v0.1.0	2022-10-20 19:14:44 +02:00
Olivier Dehaene	92c1ecd008	feat: Add arguments to CLI	2022-10-17 18:27:33 +02:00
Olivier Dehaene	5e5d8766a2	feat: Improve error handling	2022-10-17 14:59:00 +02:00
Olivier Dehaene	bcb53903b8	feat: Add AML deployment	2022-10-15 20:21:50 +02:00
Olivier Dehaene	bf99afe916	feat: Docker image	2022-10-14 15:56:21 +02:00
Olivier Dehaene	39df4d9975	Use axum	2022-10-11 18:14:39 +02:00
Olivier Dehaene	e86ecbac63	ValidationError was not correctly handled	2022-10-11 16:53:40 +02:00
Olivier Dehaene	4c693e6524	Refactored gRPC interface; Added validation logic	2022-10-11 16:50:54 +02:00
Olivier Dehaene	fa9a088467	Add load testing	2022-10-11 10:36:51 +02:00
Olivier Dehaene	295831a481	Init	2022-10-08 12:30:12 +02:00


232 Commits