hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
drbh	06c3d4b1ec	feat: accept list as prompt and use first string (#1702 ) This PR allows the `CompletionRequest.prompt` to be sent as a string or array of strings. When an array is sent the first value will be used if it's a string; otherwise the according error will be thrown Fixes: https://github.com/huggingface/text-generation-inference/issues/1690 Similar to: https://github.com/vllm-project/vllm/pull/323/files	2024-04-17 10:41:12 +02:00
drbh	e4d31a40db	fix: bump clients test base url to llama (#1751 ) This PR bumps the client tests from `google/flan-t5-xxl` to `meta-llama/Llama-2-7b-chat-hf` to resolve issues when calling the endpoint and `google/flan-t5-xxl` is not available run with ```bash make python-client-tests clients/python/tests/test_client.py .............. [ 43%] clients/python/tests/test_errors.py .......... [ 75%] clients/python/tests/test_inference_api.py ...... [ 93%] clients/python/tests/test_types.py .. [100%] ``` **note `google/flan-t5-xxl` function is currently unused but still included in the `conftest.py`	2024-04-16 16:56:47 -04:00
OlivierDehaene	08e9181418	feat: update client to 0.7 (#1667 ) Close #1652	2024-03-22 17:10:56 +01:00
drbh	de6cb15fa5	fix: improve tool type, bump pydantic and outlines (#1650 ) This PR resolves a couple - [X] adjusts the tool response to align with openai's tools response type - [X] bumps pydantic to `2.6.4` in all apps (resolves dependency issue when running tests) - [X] bump `outlines` version and fix import for new name	2024-03-21 12:45:56 -04:00
drbh	3dd7da2198	feat: accept legacy request format and response (#1527 ) This WIP PR (will) add support for legacy OpenAI `v1/completions` API. This should allow TGI to be a drop in replacement for OpenAI when using tools that rely on the completions api Should fix: https://github.com/huggingface/text-generation-inference/issues/1468	2024-02-29 10:44:20 -05:00
Hugo Abonizio	9ed4d2c780	Fix async client timeout (#1617 ) # What does this PR do? Fixes #1616 According to the [aiohttp.ClientTimeout docs](https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientTimeout), the arguments should be in seconds. This PR removes the multiplication by 60. @OlivierDehaene OR @Narsil	2024-02-29 15:41:49 +01:00
drbh	9b6db5f793	Support tools (#1587 ) This work in progress PR begins to add support for tools. Tools relies on grammar support and still has some unsolved challenges. Opening the PR for visibility and feedback	2024-02-28 11:10:27 +01:00
OlivierDehaene	9946165ee0	chore: add pre-commit (#1569 )	2024-02-16 11:58:58 +01:00
drbh	cef0553d59	Outlines guided generation (#1539 ) This WIP PR starts to add grammar support via outlines, currently this PR supports very simple regex grammars and does not optimize for precompiling or caching grammar fsm's. todo: - [X] add simple outlines guidance to `NextTokenChooser` - [X] update protos for grammar - [X] update generation params API - [X] constrain simple grammar - [ ] support parsing more complex grammar into fsm - [ ] support all outline support grammar types - [ ] explore optimizations to avoid recompiling grammars guided request ```bash curl -s 'http://localhost:3000/generate' \ --header 'Content-Type: application/json' \ --data-raw '{ "inputs": "make an email for david: \n", "parameters": { "max_new_tokens": 6, "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+" } }' | jq ``` response ```json { "generated_text": "david@example.com" } ``` unguided request ```bash curl -s 'http://localhost:3000/generate' \ --header 'Content-Type: application/json' \ --data '{ "inputs": "make an email for david: \n", "parameters": { "max_new_tokens": 6 } }' | jq ``` response ```json { "generated_text": " email = 'david" } ```	2024-02-15 10:28:10 +01:00
Nicolas Patry	1e03b61b5c	Revert "Modify default for max_new_tokens in python client (#1336 )" This reverts commit `2d56f106a6`. It causes a breaking in our integrations-tests.	2024-02-01 14:36:10 +00:00
freitng	2d56f106a6	Modify default for max_new_tokens in python client (#1336 ) # What does this PR do? Since ([#1097](https://github.com/huggingface/text-generation-inference/pull/1097)) the clients do not need to specify a max_length anymore. However, the python client in this repo had not yet been adapted to these changes. This PR makes it possible to use the python client and not provide max_new_tokens.	2024-01-29 11:02:57 -05:00
OlivierDehaene	3b56d7669b	feat: add mistral model (#1071 )	2023-09-28 09:55:47 +02:00
OlivierDehaene	47954b81e9	feat: format code (#1070 )	2023-09-27 12:22:09 +02:00
王佳欣	059bb5cf83	chore: sync text-generation version from 0.3.0 to 0.6.0 with pyproject.toml (#950 ) # What does this PR do? sync the version for text-generation.	2023-09-06 15:20:32 +02:00
Jelle Zijlstra	c8bbbd8129	chore(client): Support Pydantic 2 (#900 ) This should allow users to use either Pydantic 2 or Pydantic 1. I couldn't run all tests locally because I reran them too often and got rate limited, but I believe this is sufficient.	2023-09-06 14:12:08 +02:00
Nicolas Patry	211b54ac41	Rebased #617 (#868 ) Co-authored-by: Vincent Brouwers <vincent.brouwers@ing.com>	2023-08-28 11:43:47 +02:00
Nicolas Patry	b9e33c4953	Upgrading versions of python client. (#862 )	2023-08-17 09:15:35 +02:00
OlivierDehaene	895c5f1562	feat(server): only compute prefill logprobs when asked (#406 ) Close #288	2023-06-02 17:12:30 +02:00
OlivierDehaene	dbdc587ddd	feat(integration-tests): improve comparison and health checks (#336 )	2023-05-16 20:22:11 +02:00
Ehsan M. Kermani	f092ba9b22	feat(server): add watermarking tests (#248 )	2023-04-27 19:16:35 +02:00
OlivierDehaene	323546df1d	fix(python-client): add auth headers to is supported requests (#234 )	2023-04-25 13:55:26 +02:00
OlivierDehaene	b927244eb5	feat(python-client): get list of currently deployed tgi models using the inference API (#191 )	2023-04-17 18:43:24 +02:00
OlivierDehaene	53ee09c0b0	fea(dockerfile): better layer caching (#159 )	2023-04-14 10:12:21 +02:00
OlivierDehaene	d6a93fe992	fix(server): fix flash-neox scores warping (#137 )	2023-03-24 18:21:41 +01:00
OlivierDehaene	5d04525cb9	feat(python-client): release v0.4.0 (#135 )	2023-03-23 18:07:20 +01:00
dconathan	7850119055	feat(python-client): add cookies to Client constructors and requests (#132 ) I have a use case where we need to pass cookies (for auth reasons) to an internally hosted server. Note: I couldn't get the client tests to pass - do you need to have an HF token? ```python FAILED tests/test_client.py::test_generate - text_generation.errors.BadRequestError: Authorization header is correct, but the token seems invalid ```	2023-03-23 18:01:01 +01:00
OlivierDehaene	a3b7db932f	fix(python-client): relax dependencies (#129 )	2023-03-16 12:57:07 +01:00
OlivierDehaene	d8dc8f1b0c	feat(python-client): add new parameters (#118 )	2023-03-09 16:05:33 +01:00
OlivierDehaene	2c5df5d2af	fix(python-client): stream not set on the sync client (#109 )	2023-03-08 16:48:16 +01:00
OlivierDehaene	0ac38d336a	feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107 )	2023-03-08 11:06:59 +01:00
OlivierDehaene	3fef90d50f	feat(clients): Python client (#103 )	2023-03-07 18:52:22 +01:00

31 Commits
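
Note on commit cef0553d59 (Outlines guided generation): the regex grammar in that commit's guided request can be checked against the two example responses it quotes. The snippet below is only a sketch of that check using Python's standard `re` module; it is not how TGI or outlines enforce the grammar during decoding, and the names `EMAIL_GRAMMAR` and `matches_grammar` are hypothetical helpers introduced here for illustration.

```python
import re

# Regex grammar copied from the guided request in commit cef0553d59,
# with the JSON string escaping ("\\w" -> "\w") undone.
EMAIL_GRAMMAR = r"[\w-]+@([\w-]+\.)+[\w-]+"


def matches_grammar(text: str, grammar: str = EMAIL_GRAMMAR) -> bool:
    """Return True if the whole string conforms to the regex grammar.

    Hypothetical helper: it validates a finished string after the fact,
    whereas guided generation constrains tokens while they are sampled.
    """
    return re.fullmatch(grammar, text) is not None


if __name__ == "__main__":
    guided = "david@example.com"  # response quoted for the guided request
    unguided = " email = 'david"  # response quoted for the unguided request
    print(matches_grammar(guided))
    print(matches_grammar(unguided))
```

Under these assumptions the guided response satisfies the grammar in full while the unguided one does not, which is the contrast the commit message illustrates.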