preemo_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
Michael Feil	972e9a7f7c	update causal batch for ct2 and fix nf4 (#17) * update causal batch for ct2 and fix nf4 * bump the ctranslate2 version --------- Co-authored-by: Michael Feil <michael.feil@michaelfeil.eu>	2024-02-09 11:07:14 -08:00
Michael Feil	339ede9e90	Update Readme.md / documentation (#15) * add documentation updates * update readme * Update README.md	2023-10-03 23:01:06 -07:00
Michael Feil	da9746586b	Update README.md	2023-08-03 23:23:02 +02:00
Yang, Bo	8af4a7a0b0	Merge branch 'main' into bnb_4bit	2023-08-02 12:47:17 -07:00
Yang, Bo	9048a80f8f	Add a new README (#3) * Rename README.md to README-HuggingFace.md * Add Preemo's README	2023-08-01 12:22:07 -07:00
michaelfeil	44fa36b5bf	restoring commit from dev branch, rebase on current master	2023-08-01 18:15:18 +02:00
regisss	f848decee6	docs: Add hardware section to TOC in README (#721)	2023-07-28 11:20:03 +02:00
regisss	5a1cccbb98	Add section about TGI on other AI hardware accelerators in README (#715) As per title.	2023-07-28 09:14:03 +02:00
OlivierDehaene	9f18f4c006	v0.9.4 (#713)	2023-07-27 19:25:15 +02:00
OlivierDehaene	e64a65891b	docs(README): update readme	2023-07-25 19:45:25 +02:00
Nicolas Patry	5a1512c025	docs: Update README.md (#643)	2023-07-19 13:39:12 +02:00
Nicolas Patry	1c81df15cd	docs: Update README.md (#639)	2023-07-19 13:38:52 +02:00
OlivierDehaene	cf83f9b66f	v0.9.3 (#634)	2023-07-18 18:11:20 +02:00
Victor Muštar	c8b077be79	docs: README: Add logo + baseline (#611) ![image](https://github.com/huggingface/text-generation-inference/assets/3841370/58177321-479f-4ad1-b3bc-cec027423984)	2023-07-13 21:45:20 +02:00
OlivierDehaene	e28a809004	v0.9.0 (#525)	2023-07-01 19:25:41 +02:00
OlivierDehaene	e74bd41e0f	feat(server): add paged attention to flash models (#516) Closes #478	2023-06-30 19:09:59 +02:00
OlivierDehaene	081b926584	v0.8.0	2023-05-30 18:39:35 +02:00
OlivierDehaene	d31562f300	v0.7.0 (#353)	2023-05-23 21:20:49 +02:00
OlivierDehaene	e71471bec9	feat: add snapshot testing (#282)	2023-05-15 23:36:30 +02:00
Nicolas Patry	e86cca9723	Adding docs on how dynamic batching works. (#258) This PR starts the minimal possible amount of explanation I could think of. It tries to explain how dynamic batching occurs, the interactions with past key values and ignores the padding problem. Maybe some drawings could help too but I kept it to text for now.	2023-05-01 14:16:50 +02:00
Nicolas Patry	b0b97fd9a7	doc(launcher): add more docs to the `launcher` itself and link in the README (#257)	2023-04-29 11:53:42 +02:00
Ehsan M. Kermani	f092ba9b22	feat(server): add watermarking tests (#248)	2023-04-27 19:16:35 +02:00
OlivierDehaene	b927244eb5	feat(python-client): get list of currently deployed tgi models using the inference API (#191)	2023-04-17 18:43:24 +02:00
OlivierDehaene	f26dfd0dc1	feat(server): support OPT models (#55) OPT models do not all have a `tokenizer.json` file on the hub at the moment. Can't merge for now.	2023-04-11 19:16:41 +02:00
OlivierDehaene	299217c95c	feat(server): add flash attention llama (#144)	2023-04-11 16:38:22 +02:00
Guspan Tanadi	9122e7bd9c	docs(readme): provide link Logits Warper README (#154)	2023-04-04 13:27:46 +02:00
lewtun	5e5e9d4bbd	feat: Add note about NVIDIA drivers (#64) Co-authored-by: OlivierDehaene <olivier@huggingface.co>	2023-03-23 18:03:45 +01:00
OlivierDehaene	3fef90d50f	feat(clients): Python client (#103)	2023-03-07 18:52:22 +01:00
OlivierDehaene	1c19b0934e	v0.3.2 (#97)	2023-03-03 18:42:20 +01:00
OlivierDehaene	0fbc691946	feat: add safetensors conversion (#63)	2023-02-14 13:02:16 +01:00
OlivierDehaene	9af454142a	feat: add distributed tracing (#62)	2023-02-13 13:02:45 +01:00
Yannic Kilcher	e520d5b349	fixed SSE naming (#61) https://en.wikipedia.org/wiki/Server-sent_events	2023-02-08 22:30:11 +01:00
OlivierDehaene	1ad3250b89	fix(docker): increase shm size (#60)	2023-02-08 17:53:33 +01:00
OlivierDehaene	c503a639b1	feat(server): support t5 (#59)	2023-02-07 18:25:17 +01:00
lewtun	a0dca443dd	feat(docs): Clarify installation steps (#54) Adds some bits for first-time users (like me 😄)	2023-02-03 13:07:55 +01:00
OlivierDehaene	20c3c5940c	feat(router): refactor API and add openAPI schemas (#53)	2023-02-03 12:43:37 +01:00
OlivierDehaene	313194f6d7	feat(server): support repetition penalty (#47)	2023-02-01 15:58:42 +01:00
OlivierDehaene	2ad895a6cc	feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48)	2023-02-01 14:43:59 +01:00
OlivierDehaene	f830706b21	feat(server): Support GPT-Neox (#39)	2023-01-31 18:53:56 +01:00
OlivierDehaene	15511edc01	feat(server): Support SantaCoder (#26)	2023-01-20 12:24:39 +01:00
OlivierDehaene	32a253063d	feat: Return logprobs (#8)	2022-12-15 17:03:56 +01:00
OlivierDehaene	718096f695	feat: Support stop sequences (#7)	2022-12-12 18:25:22 +01:00
OlivierDehaene	a2985036aa	feat(server): Add model tests (#6)	2022-12-08 18:49:33 +01:00
OlivierDehaene	daa1d81d5e	feat(server): Support Galactica (#4)	2022-12-01 19:31:54 +01:00
OlivierDehaene	feb7806ca4	fix(readme): Typo	2022-11-14 16:22:10 +01:00
OlivierDehaene	4236e41b0d	feat(server): Improved doc	2022-11-07 12:53:56 +01:00
OlivierDehaene	427d7cc444	feat(server): Support AutoModelForSeq2SeqLM	2022-11-04 18:03:04 +01:00
OlivierDehaene	c5665f5c8b	feat(server): Support generic AutoModelForCausalLM	2022-11-04 14:22:47 +01:00
OlivierDehaene	755fc0e403	fix(models): Revert buggy support for AutoModel	2022-11-03 16:07:54 +01:00
OlivierDehaene	b3b7ea0d74	feat: Use json formatter by default in docker image	2022-11-02 17:29:56 +01:00

59 Commits