hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
OlivierDehaene	fe9abad1a9	mirror docker	2024-06-18 15:58:59 +02:00
OlivierDehaene	e5c27364be	avoid join_all	2024-06-18 15:44:28 +02:00
OlivierDehaene	b21ed583ac	fix logic	2024-06-18 13:57:04 +02:00
OlivierDehaene	fe6a2756f1	Merge branch 'main' into feat/page_re_alloc	2024-06-18 13:13:49 +02:00
OlivierDehaene	7ed1044585	added padded blocks and logs everywhere	2024-06-18 12:18:05 +02:00
Daniël de Kok	11ea9ce002	CI: pass pre-commit hooks again (#2084 )	2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE	4f25c67d63	CI: Tailscale improvements (#2079 ) * test local tailscale * Update build.yaml * Update build.yaml * Update build.yaml * Update build.yaml * wait for ssh * network host * change step order	2024-06-18 09:13:04 +02:00
Daniël de Kok	c8c7ccd31e	Set maximum grpc message receive size to 2GiB (#2075 ) * Set maximum grpc message receive size to 2GiB The previous default was 4MiB, which doesn't really work well for multi-modal models. * Update to Rust 1.79.0 * Fixup formatting to make PR pass	2024-06-17 16:40:44 +02:00
Ziru Niu	0f7d38e774	fix build.rs watch files (#2072 )	2024-06-17 12:10:01 +02:00
Lysandre Debut	131838919e	Contributing guide & Code of Conduct (#2074 ) * Contributing guide & Code of Conduct * Redirect to GitHub's tutorial on PRs	2024-06-17 12:09:31 +02:00
Daniël de Kok	e903770897	Support different image sizes in prefill in VLMs (#2065 ) When a batch contained images of different sizes during prefill, the server would fail (see e.g. #2056). Images were processed separately and then concatenated. However, this can fail for images with different sizes. Fix this by preprocessing all images in the batch together, so that the image processor can ensure that all image tensors have compatible sizes.	2024-06-17 10:49:41 +02:00
Alvaro Moran	445f313504	Adding architecture document (#2044 ) * doc: adding architecture document * doc: add architecture to toctree * fix: avoid cargo lock changes * fix: avoid cargo lock tweak --------- Co-authored-by: drbh <david.richard.holtz@gmail.com>	2024-06-14 09:28:34 -04:00
Tiezhen WANG	96b7b40ca3	Update the link for qwen2 (#2068 ) * Update the link for qwen2 * Fix Qwen2 model URL in model table * Fix too eager staging --------- Co-authored-by: Daniël de Kok <me@danieldk.eu>	2024-06-14 11:59:33 +02:00
Daniël de Kok	093a27c528	Add support for GPTQ Marlin (#2052 ) Add support for GPTQ Marlin kernels GPTQ Marlin extends the Marlin kernels to support common GPTQ configurations: - bits: 4 or 8 - groupsize: -1, 32, 64, or 128 - desc_act: true/false Using the GPTQ Marlin kernels requires repacking the parameters in the Marlin quantizer format. The kernels were contributed by Neural Magic to VLLM. We vendor them here for convenience.	2024-06-14 09:45:42 +02:00
drbh	f433f1f770	implement Open Inference Protocol endpoints (#1942 ) * feat: add kserve feature and basic routes * feat: implement infer endpoint wrapper around generate * fix: refactor and improve types * fix: improve infer and simplify * fix: cleanup and improve api docs * fix: refactor and encapsulate kserve feat in file * fix: remove typos after rebase	2024-06-13 12:51:51 -04:00
drbh	42aa8ee1bb	PR #2049 CI run (#2054 ) * Use minijinja's pycompat mode for python methods * fix: cargo fmt lint for pre commit --------- Co-authored-by: Armin Ronacher <armin.ronacher@active-4.com>	2024-06-13 11:53:49 -04:00
OlivierDehaene	abe521204e	fix tests	2024-06-12 18:54:25 +02:00
OlivierDehaene	05eb4dcb17	allocate 16 by 16	2024-06-12 18:53:14 +02:00
OlivierDehaene	90184df79c	fix(layers): fix SuRotaryEmbedding (#2060 ) * fix(layers): fix SuRotaryEmbedding * change arange * remove logs	2024-06-12 18:24:47 +02:00
OlivierDehaene	521de6cacd	fix(server): fix OPT implementation (#2061 )	2024-06-12 18:22:20 +02:00
OlivierDehaene	9ac7b7bc52	remove slots from grpc	2024-06-12 11:50:31 +02:00
OlivierDehaene	c2fb459bc1	fix windowing	2024-06-11 18:40:38 +02:00
OlivierDehaene	37266e2dbb	fix rust and python unit-tests	2024-06-11 17:11:16 +02:00
drbh	376a0b7ada	Support chat response format (#2046 ) * feat: support response_format in chat * fix: adjust typos * fix: add trufflehog lint	2024-06-11 10:44:56 -04:00
fxmarty	a6e4d63c86	Update LLMM1 bound (#2050 ) update commit	2024-06-11 19:30:29 +08:00
OlivierDehaene	73c3903214	FlashCausalLM implem	2024-06-11 13:15:06 +02:00
OlivierDehaene	6983ec9537	small refactor	2024-06-11 13:15:06 +02:00
OlivierDehaene	713d70b443	re-working logic, wip	2024-06-11 13:15:06 +02:00
OlivierDehaene	298bf31e69	add terminated_generations	2024-06-11 13:15:06 +02:00
OlivierDehaene	3c596983ba	fix python tests	2024-06-11 13:15:06 +02:00
OlivierDehaene	51fa606875	fix	2024-06-11 13:15:05 +02:00
OlivierDehaene	35f27cbcc1	working example	2024-06-11 13:15:05 +02:00
OlivierDehaene	1cc86930a6	wip	2024-06-11 13:15:05 +02:00
OlivierDehaene	18e77a5cc7	wip	2024-06-11 13:15:05 +02:00
Luc Georges	dfca1dfc5e	fix(ci): remove unnecessary permissions (#2045 )	2024-06-10 12:16:53 -04:00
Luc Georges	4e74ec09a8	feat(ci): add trufflehog secrets detection (#2038 )	2024-06-10 11:54:13 -04:00
Daniël de Kok	85dfc39222	Add Phi-3 medium support (#2039 ) Add support for Phi-3-medium The main difference between the medium and mini models is that medium uses grouped query attention with a packed QKV matrix. This change adds support for GQA with packed matrixes to `Weights.get_weights_col_packed` and uses it for Phi-3. This also allows us to remove the custom implementation of GQA from dbrx attention loading.	2024-06-10 09:22:29 +02:00
fxmarty	9b3674d903	ROCm and sliding windows fixes (#2033 ) * update vllm commit & fix models using sliding window * update * update commit * fix bug where tunableop is bound to cuda graph even when cuda graph are disabled * enable tunableop by default * fix sliding window * address review * dead code * precise comment * is it flaky?	2024-06-10 15:09:50 +08:00
Daniël de Kok	bf3c813782	server: use chunked inputs The router will now send the input as chunks besides as a single string. This change modifies the server to process chunked input rather than strings. This also allows us to remove the image extraction code from the server.	2024-06-07 08:09:04 +02:00
Wang, Yi	4dabddb7ea	Xpu gqa (#2013 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-06-06 19:12:57 +02:00
Nicolas Patry	9765658212	Revert "Enabling CI for AMD with new runner.." This reverts commit `101ac9a760`.	2024-06-06 19:08:16 +02:00
Nicolas Patry	101ac9a760	Enabling CI for AMD with new runner..	2024-06-06 19:07:48 +02:00
Nicolas Patry	ed1cfde0d8	Internal runner ? (#2023 )	2024-06-06 18:51:42 +02:00
Daniël de Kok	51621439a4	marlin: improve build	2024-06-06 17:19:46 +02:00
Daniël de Kok	0d96468ebb	marlin: support tp>1 when group_size==-1	2024-06-06 17:19:28 +02:00
Daniël de Kok	4594e6faba	Add support for Marlin-quantized models This change adds support for Marlin-quantized models. Marlin is an FP16xINT4 matmul kernel, which provides good speedups decoding batches of 16-32 tokens. It supports quantized models with symmetric quantization, groupsize -1 or 128, and 4-bit. Tested with: - Llama 2 - Llama 3 - Phi 3	2024-06-06 13:16:52 +02:00
Nicolas Patry	cf0d459aaf	Revert "Less cache misses on cargo build." This reverts commit `5aec4154c2`.	2024-06-06 10:33:55 +02:00
Nicolas Patry	5aec4154c2	Less cache misses on cargo build.	2024-06-06 10:33:01 +02:00
Andrés Marafioti	2a48a10043	Update __version__ on __init__.py to 0.7.0 (#2017 ) There was a new release of the python client with version upped to 0.7.0 on pip and on the pyproject.toml, but it wasn't changed on the __init__.py so when one does: ```python import text_generation print(text_generation.__version__) ``` It still outputs "0.6.0"	2024-06-05 14:51:07 +02:00
Daniël de Kok	3f4bcf978c	Fix GPTQWeight import (#2020 ) Fix stray import.	2024-06-05 14:49:15 +02:00

796 Commits

All Branches