hf_text-generation-inference/server/text_generation_server/models
Jae-Won Chung b9633c46d0
Fix typing in `Model.generate_token` (#733)
## What does this PR do?

This PR fixes a minor type annotation issue in the signature of
`Model.generate_token`.

All existing overrides of `Model.generate_token` return
`Tuple[List[Generation], Optional[B]]`:

3ef5ffbc64/server/text_generation_server/models/causal_lm.py (L535-L537)

3ef5ffbc64/server/text_generation_server/models/flash_causal_lm.py (L802-L804)

3ef5ffbc64/server/text_generation_server/models/seq2seq_lm.py (L589-L591)

I suspect that back in 017a2a8c, when `GeneratedText` and `Generation`
were separated, the function signature was not updated.
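For reference, here is a minimal sketch of what the corrected abstract signature in `model.py` looks like; the class and type names match the repository, but the surrounding class body is trimmed and partially assumed for illustration:

```python
# Sketch of the corrected annotation; surrounding code elided/assumed.
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple, TypeVar

from text_generation_server.models.types import Batch, Generation

# Each Model subclass works with its own concrete Batch type.
B = TypeVar("B", bound=Batch)


class Model(ABC):
    @abstractmethod
    def generate_token(self, batch: B) -> Tuple[List[Generation], Optional[B]]:
        # Before this PR the annotation claimed only List[Generation],
        # even though every concrete override (CausalLM, FlashCausalLM,
        # Seq2SeqLM) returns a (generations, next_batch) tuple.
        raise NotImplementedError
```

With the tuple annotation in place, callers can rely on the second element (`None` when the batch is exhausted) without the type checker flagging it.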

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?

CC @OlivierDehaene
2023-07-31 14:35:14 +02:00
| Name | Last commit | Date |
| --- | --- | --- |
| custom_modeling | feat(server): support new falcon config (#712) | 2023-07-27 18:38:57 +02:00 |
| __init__.py | feat(server): support new falcon config (#712) | 2023-07-27 18:38:57 +02:00 |
| bloom.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| causal_lm.py | feat: Add the option to force another dtype than `f16`. (#513) | 2023-06-30 20:30:09 +02:00 |
| flash_causal_lm.py | feat: add cuda memory fraction (#659) | 2023-07-24 11:43:58 +02:00 |
| flash_llama.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| flash_neox.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| flash_rw.py | fix(server): fix quantization python requirements (#708) | 2023-07-27 12:28:10 +02:00 |
| flash_santacoder.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| galactica.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| gpt_neox.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| model.py | Fix typing in `Model.generate_token` (#733) | 2023-07-31 14:35:14 +02:00 |
| mpt.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| opt.py | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671) | 2023-07-25 13:00:27 +02:00 |
| rw.py | feat: Add the option to force another dtype than `f16`. (#513) | 2023-06-30 20:30:09 +02:00 |
| santacoder.py | Directly load GPTBigCode to specified device (#618) | 2023-07-21 11:27:31 +02:00 |
| seq2seq_lm.py | feat: Add the option to force another dtype than `f16`. (#513) | 2023-06-30 20:30:09 +02:00 |
| t5.py | fix(server): T5 weights names. (#582) | 2023-07-12 10:01:42 +02:00 |
| types.py | feat(server): support vectorized warpers in flash causal lm (#317) | 2023-05-26 12:30:27 +02:00 |