hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
OlivierDehaene	15511edc01	feat(server): Support SantaCoder (#26)	2023-01-20 12:24:39 +01:00
Nick Hill	f7ac394935	fix(router): Obey max batch size (#23)	2023-01-17 09:11:21 +01:00
Nick Hill	e6d3eb5d5d	fix(server): Minor refactorization using new_zeros (#24) - Fix some type hints, in particular base tokenizer class - Make use of `tensor.new_zeros/new_empty` methods - Simplify env var string parsing in launcher	2023-01-17 09:10:22 +01:00
OlivierDehaene	fcc2c5fcbf	feat(launcher): Log server stdout (#19) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2023-01-05 12:01:23 +01:00
Nicolas Patry	b94f30215f	fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13) Fixes #12 in the easiest way I could think of.	2023-01-03 11:07:05 +01:00
Nick Hill	60472f9d2b	feat(router): Add const parameters to validation logic (#15) I noticed some opportunity to collapse some of the logic, in case you are interested.	2023-01-03 10:41:22 +01:00
Nick Hill	3efa5bbbfd	fix(router): Include special tokens when tokenizing (#14) There's currently a discrepancy in the tokenization between the router and python server code. The latter includes special tokens but the former does not. This results in a token count mismatch for seq2seq models such as mt0 where the tokenizer emits an EOS token at the end. This in turn results in some unexpected/incorrect output, in particular when batch concatenation is involved, because the python code uses the input length passed from the router for each row. As far as I can tell, it is better to include this token in the encoder `input_ids`, so I guess it's best to just adjust on the router side.	2022-12-30 19:31:44 +01:00
Nick Hill	686cc66717	fix(server): Check for device type correctly when determining initial padding (#16) AFAIK there is no torch device type called "gpu".	2022-12-30 19:30:42 +01:00
OlivierDehaene	611e21cb13	fix(server): Fix stop sequences (#11)	2022-12-16 16:03:39 +01:00
OlivierDehaene	3e2e6240b8	feat(launcher): Add integration tests (#9)	2022-12-16 11:29:36 +01:00
OlivierDehaene	32a253063d	feat: Return logprobs (#8)	2022-12-15 17:03:56 +01:00
OlivierDehaene	718096f695	feat: Support stop sequences (#7)	2022-12-12 18:25:22 +01:00
OlivierDehaene	042180d88f	fix(server): Only pad to multiple of 8 on GPUs	2022-12-08 19:37:37 +01:00
OlivierDehaene	a2985036aa	feat(server): Add model tests (#6)	2022-12-08 18:49:33 +01:00
Nick Hill	31d76e238d	fix(batching): Avoid theoretical hang in batcher loop (#5) - Avoid theoretical hang in batcher loop - Avoid a couple of clones in the router generate method - Keep attention mask tensors as integers - Remove num_heads attribute Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>	2022-12-05 10:10:59 +01:00
OlivierDehaene	daa1d81d5e	feat(server): Support Galactica (#4)	2022-12-01 19:31:54 +01:00
OlivierDehaene	d6d5b12e03	fix(router): Handle tokenizer errors	2022-11-14 17:15:19 +01:00
OlivierDehaene	feb7806ca4	fix(readme): Typo	2022-11-14 16:22:10 +01:00
OlivierDehaene	91f5f86280	fix(router): Fix HTTP status codes	2022-11-14 14:34:15 +01:00
OlivierDehaene	6c781025ae	feat(rust): Update to 1.65	2022-11-14 13:59:56 +01:00
OlivierDehaene	dccd5c2b1a	feat(server): Clarify CausalLMBatch concatenate method	2022-11-09 18:24:07 +01:00
OlivierDehaene	fa43fb71be	fix(server): Fix Transformers fork version	2022-11-08 17:42:38 +01:00
OlivierDehaene	4236e41b0d	feat(server): Improved doc	2022-11-07 12:53:56 +01:00
OlivierDehaene	cea6051eff	feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard	2022-11-04 18:31:08 +01:00
OlivierDehaene	427d7cc444	feat(server): Support AutoModelForSeq2SeqLM	2022-11-04 18:03:04 +01:00
OlivierDehaene	c5665f5c8b	feat(server): Support generic AutoModelForCausalLM	2022-11-04 14:22:47 +01:00
OlivierDehaene	755fc0e403	fix(models): Revert buggy support for AutoModel	2022-11-03 16:07:54 +01:00
OlivierDehaene	b3b7ea0d74	feat: Use json formatter by default in docker image	2022-11-02 17:29:56 +01:00
OlivierDehaene	3cf6368c77	feat(server): Support all AutoModelForCausalLM on a best effort basis	2022-10-28 19:24:00 +02:00
OlivierDehaene	09674e6df9	feat(server): Support bitsandbytes	2022-10-27 14:25:29 +02:00
OlivierDehaene	beb552127a	feat(client): Simplify sharded logic	2022-10-22 23:40:05 +02:00
Nicolas Patry	c8ce9b2515	feat(server): Use safetensors Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>	2022-10-22 20:00:15 +02:00
Thomas Wang	be8827fe41	Create LICENSE (#2)	2022-10-22 10:44:52 +02:00
OlivierDehaene	c837893370	feat(router): Add max_waiting_tokens	2022-10-21 16:40:05 +02:00
OlivierDehaene	895a341d06	fix(validation): Fix error messages	2022-10-21 10:59:15 +02:00
Olivier Dehaene	f16f2f5ae1	v0.1.0	2022-10-20 19:14:44 +02:00
Olivier Dehaene	92c1ecd008	feat: Add arguments to CLI	2022-10-17 18:27:33 +02:00
Olivier Dehaene	5e5d8766a2	feat: Improve error handling	2022-10-17 14:59:00 +02:00
Olivier Dehaene	00e6ce44b1	Update aml deployment	2022-10-17 10:39:59 +02:00
Olivier Dehaene	bcb53903b8	feat: Add AML deployment	2022-10-15 20:21:50 +02:00
Olivier Dehaene	bf99afe916	feat: Docker image	2022-10-14 15:56:21 +02:00
Olivier Dehaene	39df4d9975	Use axum	2022-10-11 18:14:39 +02:00
Olivier Dehaene	e86ecbac63	ValidationError was not correctly handled	2022-10-11 16:53:40 +02:00
Olivier Dehaene	4c693e6524	Refactored gRPC interface; Added validation logic	2022-10-11 16:50:54 +02:00
Olivier Dehaene	fa9a088467	Add load testing	2022-10-11 10:36:51 +02:00
Olivier Dehaene	1d986983d5	fix: cleanup	2022-10-08 12:34:25 +02:00
Olivier Dehaene	295831a481	Init	2022-10-08 12:30:12 +02:00

47 Commits