Commit Graph

144 Commits

Author SHA1 Message Date
OlivierDehaene 9c1cb81cd8
v1.4.2 (#1585) 2024-02-21 14:50:57 +01:00
OlivierDehaene fa8a8e05af
fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
OlivierDehaene 4139054b82
v1.4.1 (#1568) 2024-02-16 17:50:57 +01:00
OlivierDehaene 9946165ee0
chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
drbh cef0553d59
Outlines guided generation (#1539)
This WIP PR starts adding grammar support via Outlines. Currently, this
PR supports only very simple regex grammars and does not optimize for
precompiling or caching grammar FSMs.

todo:
- [X] add simple outlines guidance to `NextTokenChooser`
- [X] update protos for grammar
- [X] update generation params API
- [X] constrain simple grammar
- [ ] support parsing more complex grammar into fsm
- [ ] support all grammar types that Outlines supports
- [ ] explore optimizations to avoid recompiling grammars

guided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6,
        "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
    }
}' | jq
```
response
```json
{
  "generated_text": "david@example.com"
}
```

unguided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6
    }
}' | jq
```
response
```json
{
  "generated_text": "    email = 'david"
}
```
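The guided output is constrained by the regex passed as `grammar`, so a client can confirm that a response actually matches it. This is a minimal bash sketch, assuming GNU or BSD `grep`. `matches_email_grammar` is a hypothetical helper and is not part of TGI. `\w` is spelled `[[:alnum:]_]` because POSIX ERE has no `\w`.

```bash
# Hypothetical client-side check, not part of TGI: succeeds (exit status 0)
# only if the whole text matches the email regex sent as `grammar` above.
# `\w` is written as `[[:alnum:]_]` because POSIX ERE has no `\w`.
matches_email_grammar() {
  printf '%s\n' "$1" | grep -Eqx '[[:alnum:]_-]+@([[:alnum:]_-]+\.)+[[:alnum:]_-]+'
}

matches_email_grammar "david@example.com" && echo "guided: matches"
matches_email_grammar "    email = 'david" || echo "unguided: no match"
```

When applied to the two responses above, the guided output matches and the unguided one does not.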
2024-02-15 10:28:10 +01:00
OlivierDehaene 0d794af6a5
feat: experimental support for cuda graphs (#1428)
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-02-12 10:09:29 +01:00
OlivierDehaene 532146338b
feat(router): add max_batch_size (#1542)
Some hardware requires a maximum batch size.
2024-02-09 12:38:41 +01:00
drbh 0da00be52c
feat: add ie update to message docs (#1523)
Update the Messages API docs and add a Hugging Face Inference Endpoints
integration section with instructions.

---------

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2024-02-02 16:31:11 +01:00
Pedro Cuenca 3ab578b416
[docs] Fix link to Install CLI (#1526)
# What does this PR do?

Attempts to fix a link from Using TGI CLI to Installation.


## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
2024-02-02 14:05:30 +01:00
drbh 2ae36a97fd
fix: improve messages api docs content and formatting (#1506)
This PR updates the Messages API docs to address content changes
and make the formatting consistent.
2024-01-31 17:26:22 +01:00
OlivierDehaene c2d4a3b5c7
v1.4.0 (#1494) 2024-01-26 19:04:57 +01:00
drbh d9758851be
feat: add tokenizer-config-path to launcher args (#1495)
This PR adds the `tokenizer-config-path` to the launcher and passes it
to the router

Fixes:
https://github.com/huggingface/text-generation-inference/pull/1427
2024-01-26 18:01:33 +01:00
fxmarty 650fea1834
GPTQ support on ROCm (#1489)
Tested with
```
CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
```

All runs gave good and identical results on MI210.

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2024-01-26 16:27:44 +01:00
Nicolas Patry ebecc06161
Update the docs to include newer models. (#1492) 2024-01-26 16:07:31 +01:00
Nicolas Patry 16958fe312
fix: launcher doc typos (#1473)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

---------

Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
2024-01-26 10:41:58 +01:00
Nicolas Patry 17b7b75e65 Update the docs 2024-01-26 10:13:23 +01:00
Nicolas Patry 86c8335f1b
Add a new `/tokenize` route to get the tokenized input (#1471)
# What does this PR do?


Ideally this would be done client-side, but since it is a recurring
request, we implemented it.

- Runs only if the Rust tokenizer is present (it is important not to
encumber the main inference pipeline).
- Returns simple results: the ID, the text (extracted from the
original string using the offsets), and the offsets (so users can do
things like highlight text).
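Because the route returns offsets into the original string, a client can highlight token boundaries without re-tokenizing. This is a minimal bash sketch with several assumptions: each token's offsets are available as `start stop` pairs, the input is ASCII so character and byte offsets coincide, and `highlight_tokens` is a hypothetical helper that is not part of TGI.

```bash
# Hypothetical helper, not part of TGI: wrap each token span of INPUT in
# brackets to visualise token boundaries. OFFSETS holds one "start stop"
# pair per line; we assume these come from the /tokenize response.
# Assumes ASCII input, so character offsets and byte offsets coincide.
highlight_tokens() {
  local input=$1 offsets=$2
  local out="" start stop
  while read -r start stop; do
    [[ -n $start && -n $stop ]] || continue   # skip blank lines
    # In ${var:offset:length}, offset and length are arithmetic expressions.
    out+="[${input:start:stop-start}]"
  done <<< "$offsets"
  printf '%s\n' "$out"
}

highlight_tokens "Hello world" $'0 5\n5 11'   # prints [Hello][ world]
```

With a running server, the pairs might be extracted with something like `jq -r '.[] | "\(.start) \(.stop)"'`. This assumes the response is a JSON array of token objects with `start` and `stop` fields, which this commit message does not confirm.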

2024-01-25 14:19:03 +01:00
drbh 7872b8c55b
Add messages api compatibility docs (#1478)
This PR adds a new page to the docs that describes the Messages API and
how to use it.

Additionally, this page will contain cloud-provider-specific information
for enabling and using this feature. This PR includes a SageMaker
example and information.
2024-01-24 11:41:28 -05:00
OlivierDehaene 3f9b3f4539 docs: update required CUDA version to 12.2 2024-01-09 14:28:55 +01:00
OlivierDehaene 630800eed3 v1.3.4 2023-12-22 15:46:04 +01:00
regisss 987c959f73
docs: Change URL for Habana Gaudi support in doc (#1343) 2023-12-21 11:05:35 +01:00
OlivierDehaene f3aea78fb6 v1.3.3 2023-12-15 01:20:42 +01:00
OlivierDehaene 28821bfd5d fix: default max_new_tokens to 100 2023-12-13 09:19:19 +01:00
OlivierDehaene 88aae2595d v1.3.2 2023-12-12 18:10:22 +01:00
OlivierDehaene ec6d4592d5 v1.3.1 2023-12-11 16:46:44 +01:00
OlivierDehaene d0841cc8eb v1.3.0 2023-12-11 14:55:03 +01:00
Nicolas Patry 9ecfa16b12
Speculative (#1308) 2023-12-11 12:46:30 +01:00
fxmarty 25b5f81941
Fix AMD documentation (#1307)
As per title
2023-12-04 22:09:51 +09:00
OlivierDehaene ccd5725a0c v1.2.0 2023-11-30 15:18:15 +01:00
fxmarty b2b5df0e94
Add RoCm support (#1243)
This PR adds support for AMD Instinct MI210 & MI250 GPUs, with paged
attention and FAv2 support.

Remaining items to discuss, on top of possible others:
* Should we have a
`ghcr.io/huggingface/text-generation-inference:1.1.0+rocm` hosted image,
or is it too early?
* Should we set up a CI on MI210/MI250? I don't have access to the
runners of TGI though.
* Are we comfortable with those changes being directly in TGI, or do we
need a fork?

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: Your Name <you@example.com>
2023-11-27 14:08:12 +01:00
Omar Sanseviero a5def7c222
Fix link in quantization guide (#1246) 2023-11-08 17:34:38 +01:00
Aastha Varma 63fa534612
Fix link to quantization page in preparing_model.md (#1187) 2023-10-23 12:12:21 +02:00
Mishig 3af1a11401
Fix link in preparing_model.md (#1140)
Fixes a link in the docs.
2023-10-13 09:48:35 +02:00
Omar Sanseviero dd304cf14c
Remove some content from the README in favour of the documentation (#958) 2023-10-09 11:59:06 +02:00
Nicolas Patry 00b8f36fba
Prepare for v1.1.1 (#1100)
2023-10-05 16:09:49 +02:00
Nicolas Patry 6df43da0a4
Modify the default for `max_new_tokens`. (#1097)
# What does this PR do?

Clients that do not specify a maximum length now implicitly get
`max_new_tokens = max_total_tokens - input_length`.
This is a significant change, but it seems more in line with what users
expect from a standing server.
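Here is a worked example of this default. With `max_total_tokens = 2048` and a 500-token prompt, a request without a length limit may generate up to 2048 - 500 = 1548 tokens. The bash sketch below reproduces that arithmetic. `default_max_new_tokens` is hypothetical and the numbers are illustrative. Rejecting a prompt that leaves no room for generation is this sketch's choice, not confirmed TGI behavior. A later commit in this log (`fix: default max_new_tokens to 100`) changed the default again.

```bash
# Hypothetical helper, not part of TGI: the default described in #1097,
# max_new_tokens = max_total_tokens - input_length.
default_max_new_tokens() {
  local max_total_tokens=$1 input_length=$2
  # This sketch rejects prompts that leave no room for generation.
  if (( input_length >= max_total_tokens )); then
    echo "input_length ($input_length) must be < max_total_tokens ($max_total_tokens)" >&2
    return 1
  fi
  echo $(( max_total_tokens - input_length ))
}

default_max_new_tokens 2048 500   # prints 1548
```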


---------

Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-10-04 17:38:42 +02:00
Nicolas Patry 8ec1b87f16
Adding titles to CLI doc. (#1094)
2023-10-04 12:57:21 +02:00
Fluder-Paradyne b4f68c3cf4
fixed command line arguments in docs (#1092)
# What does this PR do?


Removed the `--` prefix from the arguments.
With `--`, `bitsandbytes` and `bitsandbytes-nf4` look like options,
which they are not (they are values passed to `--quantize`).

## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


2023-10-03 12:25:45 +02:00
Mishig 702d269729
[Doc page] Fix launcher page highlighting (#1080)
### Broken highlighting (current)

<img width="800" alt="Screenshot 2023-09-28 at 22 38 15"
src="https://github.com/huggingface/text-generation-inference/assets/11827707/1f07c356-2c3c-4ff0-8ca5-54a032b05d48">

### Fixed highlighting (this PR)

<img width="800" alt="image"
src="https://github.com/huggingface/text-generation-inference/assets/11827707/87fe750d-c26e-4801-95cc-86859a2df52d">
2023-10-03 11:11:10 +02:00
Mishig 724199aaf1
Update launcher.md to wrap code blocks (#1076)
Wrap code blocks in the `launcher` doc page using
https://github.com/huggingface/doc-builder/pull/420


https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_1076/en/basic_tutorials/launcher

<img width="800" alt="image"
src="https://github.com/huggingface/text-generation-inference/assets/11827707/cb240198-411f-4d22-9f6e-8f70f2c6dcab">
2023-09-28 17:30:36 +02:00
Mishig a7808ff853
Fix launcher.md (#1075)
Adds a newline to separate the heading from the code block. This is a
hotfix; I will work on a permanent solution in
https://github.com/huggingface/doc-builder
2023-09-28 15:37:50 +02:00
OlivierDehaene 7a6fad6aac update readme 2023-09-28 10:18:18 +02:00
OlivierDehaene 3b56d7669b
feat: add mistral model (#1071) 2023-09-28 09:55:47 +02:00
Merve Noyan 259a230028
Automatic docs for TGI (#1045)
I had to open this PR because I initially worked from my fork, and
triggering a new GitHub Action on a specific branch of my fork takes a
fair amount of work (I couldn't find a way, despite trying everything).

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-09-27 16:01:38 +02:00
Merve Noyan 36c2868853
Added note on weight-cache-override (#994)
Added note on serving supported models from a different folder without
re-downloading them.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-09-27 11:06:07 +02:00
Nicolas Patry a049864270
Preping 1.1.0 (#1066)
# What does this PR do?

Upgrade all relevant versions and dependencies.

2023-09-27 10:40:18 +02:00
Nicolas Patry c5de7cd886
Add AWQ quantization inference support (#1019) (#1054)
# Add AWQ quantization inference support

Fixes
https://github.com/huggingface/text-generation-inference/issues/781

This PR (partially) adds support for AWQ quantization for inference.
More information on AWQ [here](https://arxiv.org/abs/2306.00978). In
general, AWQ is faster and more accurate than GPTQ, which is currently
supported by TGI.

This PR installs the 4-bit GEMM custom CUDA kernels released by the AWQ
authors (a one-line change in `requirements.txt`).

A quick way to test this PR is to bring up TGI as follows:

```
text-generation-server download-weights abhinavkulkarni/codellama-CodeLlama-7b-Python-hf-w4-g128-awq

text-generation-launcher \
--huggingface-hub-cache ~/.cache/huggingface/hub/ \
--model-id abhinavkulkarni/codellama-CodeLlama-7b-Python-hf-w4-g128-awq \
--trust-remote-code --port 8080 \
--max-input-length 2048 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 \
--quantize awq
```

Please note:
* This PR was tested with FlashAttention v2 and vLLM.
* This PR adds support for AWQ inference, not for quantizing the models.
Quantization needs to be done outside of TGI; see the instructions
[here](f084f40bd9).
* This PR only adds support for `FlashLlama` models for now.
* Multi-GPU setup has not been tested.
* No integration tests have been added so far; I will add them later if
maintainers are interested in this change.
* This PR can be tested on any of the models released
[here](https://huggingface.co/abhinavkulkarni?sort_models=downloads#models).

Please refer to the linked issue for benchmarks for
[abhinavkulkarni/meta-llama-Llama-2-7b-chat-hf-w4-g128-awq](https://huggingface.co/abhinavkulkarni/meta-llama-Llama-2-7b-chat-hf-w4-g128-awq)
vs
[TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ).

Note that AWQ has released faster (and, in the case of Llama, fused)
kernels for 4-bit GEMM, currently at the top of the `main` branch at
https://github.com/mit-han-lab/llm-awq, but this PR uses an older commit
that has been tested to work. We can switch to the latest commit later on.

## Who can review?

@OlivierDehaene OR @Narsil


---------

Co-authored-by: Abhinav M Kulkarni <abhinavkulkarni@gmail.com>
Co-authored-by: Abhinav Kulkarni <abhinav@concentric.ai>
2023-09-25 15:31:27 +02:00
Merve Noyan c8a01d7591
Unsupported model serving docs (#906)
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-09-12 15:55:14 +02:00
Merve Noyan e9ae678699
Quantization docs (#911)
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-09-12 15:52:46 +02:00
Merve Noyan 1f69fb9ed4
Tensor Parallelism conceptual guide (#886)
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-09-12 12:11:20 +02:00
Merve Noyan 30a93a0dec
Paged Attention Conceptual Guide (#901) 2023-09-08 14:18:42 +02:00
Merve Noyan af1ed38f39
Safetensors conceptual guide (#905)
IDK what else to add to this guide. I looked for relevant code in the TGI
codebase and saw that safetensors is also used in quantization (maybe I
could add that?)
2023-09-07 16:22:06 +02:00
Omar Sanseviero a9fdfb2464
docs: Remove redundant content from stream guide (#884)
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-09-06 18:42:42 +02:00
Merve Noyan f260eb72f9
docs: Flash Attention Conceptual Guide (#892)
PR for a conceptual guide on Flash Attention. I will add more info
unless I'm told otherwise.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
2023-09-06 15:36:49 +02:00
Julien Bouquillon 3ed4c0f33f
docs: typo in streaming.js (#971)
Looks like an error
2023-09-06 14:57:59 +02:00
Omar Sanseviero 7d8e5fb284
Update version in docs (#957) 2023-08-31 20:00:12 +02:00
Merve Noyan 97444f9367
Added gradio example to docs (#867)
cc @osanseviero

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
2023-08-23 23:50:12 +02:00
Nicolas Patry 888c029114
Upgrade version number in docs. (#910)
2023-08-23 13:45:28 +02:00
Omar Sanseviero bfa070611d
Add streaming guide (#858)
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-08-18 13:27:08 +02:00
Omar Sanseviero d9bceb8e6b
Misc improvements for InferenceClient docs (#852)
List of changes

- No need to specify `model` in `text_generation` if it's already
specified in `InferenceClient`
- I separated the explanation of `stream=True` and `details=True`
- I found the details explanation a bit repetitive (it says two times
what it returns), so removed a sentence
- Add mention of async client
2023-08-16 14:29:54 +02:00
Omar Sanseviero d71237fc8b
Have snippets in Python/JavaScript in quicktour (#809)
![Screenshot from 2023-08-10 14-20-25](https://github.com/huggingface/text-generation-inference/assets/7246357/e16d0d41-be63-4d06-8093-30540df91419)

---------

Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-08-14 13:47:32 +02:00
Nicolas Patry 09eca64227
Version 1.0.1 (#836)
2023-08-14 11:23:11 +02:00
Merve Noyan a2a913eec5
Added streaming for InferenceClient (#821)
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
2023-08-11 18:05:19 +03:00
Merve Noyan d0e30771c2
Added ChatUI Screenshot to Docs (#823)
cc @osanseviero
2023-08-11 16:42:43 +02:00
Merve Noyan e58ad6dd66
Added CLI docs (#799)
Added docs for CLI
2023-08-10 15:00:30 +02:00
Omar Sanseviero 7dbaef3f5b
Minor docs style fixes (#806) 2023-08-10 15:32:51 +03:00
Omar Sanseviero 04f7c2d86b
Fix gated docs (#805) 2023-08-10 15:32:07 +03:00
Merve Noyan 647ae7a7d3
Setup for doc-builder and docs for TGI (#740)
I added ToC for docs v1 & started setting up for doc-builder. cc @Narsil
@osanseviero

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: osanseviero <osanseviero@gmail.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>
2023-08-10 10:24:52 +02:00
OlivierDehaene 3ef5ffbc64
v1.0.0 (#727) 2023-07-28 17:43:46 +02:00
OlivierDehaene 9f18f4c006
v0.9.4 (#713) 2023-07-27 19:25:15 +02:00
OlivierDehaene cf83f9b66f
v0.9.3 (#634) 2023-07-18 18:11:20 +02:00
OlivierDehaene c58a0c185b
v0.9.2 (#616) 2023-07-14 16:31:48 +02:00
OlivierDehaene 31b36cca21
v0.9.1 (#558) 2023-07-06 16:05:42 +02:00
OlivierDehaene e28a809004
v0.9.0 (#525) 2023-07-01 19:25:41 +02:00
OlivierDehaene 19c41824cb chore: update openapi schema 2023-06-05 18:16:08 +02:00
OlivierDehaene e7248fe90e v0.8.2 2023-06-01 19:49:13 +02:00
OlivierDehaene db2ebe3947 v0.8.1 2023-05-31 12:08:40 +02:00
OlivierDehaene 081b926584 v0.8.0 2023-05-30 18:39:35 +02:00
OlivierDehaene d31562f300
v0.7.0 (#353) 2023-05-23 21:20:49 +02:00
OlivierDehaene e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
OlivierDehaene 6ded76a4ae
v0.6.0 (#222) 2023-04-21 21:00:57 +02:00
OlivierDehaene 2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene 6f0f1d70f6
v0.5.0 (#168) 2023-04-11 20:32:18 +02:00
OlivierDehaene fef1a1c381
v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene 84722f3e33
v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene ab5fd8cf93
v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene 411d6247f4
v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene 55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene 1c19b0934e
v0.3.2 (#97) 2023-03-03 18:42:20 +01:00
OlivierDehaene 3b03c4ea18
fix(docs): fix openapi schema (#86) 2023-02-24 15:59:49 +01:00
OlivierDehaene c720555adc
v0.3.0 (#72) 2023-02-16 17:28:29 +01:00
Yannic Kilcher e520d5b349
fixed SSE naming (#61)
https://en.wikipedia.org/wiki/Server-sent_events
2023-02-08 22:30:11 +01:00
OlivierDehaene 2fe5e1b30e
V0.2.1 (#58) 2023-02-07 15:40:25 +01:00
OlivierDehaene 20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00