hf_text-generation-inference/server
Martin Iglesias Goyanes 9192de57cc
Fixing frequency penalty (#1811)
Thank you so much for the work you are doing; this is my little
contribution to this great thing you have built. I hope it is useful and
helpful. Please don't hesitate to discuss any matters that are not
clear!

I am basing my implementation of frequency penalty on OpenAI's
implementation:
https://platform.openai.com/docs/guides/text-generation/parameter-details

The problem I see with TGI's current implementation is that it does not
take into account the frequency of tokens which have already been
sampled in the current generation stream. Also, the scaling of the
adjusted token logits is done differently for positive and negative
logits, whereas in OpenAI's implementation the token frequency is taken
into account and the scaling is always done with a subtraction (if the
penalty is positive) or an addition (if the penalty is negative).
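
As a rough sketch of the OpenAI-style behaviour (the names below are
illustrative, not the exact TGI code): count how many times each
vocabulary token has appeared in the stream so far, then subtract
``penalty * count`` from the corresponding logits, so the direction of
the adjustment depends only on the sign of the penalty, not on the sign
of the logit.

```python
import torch


def apply_frequency_penalty(
    scores: torch.Tensor,  # [batch, vocab] next-token logits
    input_ids: torch.Tensor,  # [batch, seq] tokens sampled so far
    frequency_penalty: float,
) -> torch.Tensor:
    batch_size, vocab_size = scores.shape

    # Count how often each vocabulary token appears in the generation so far.
    token_freq = torch.zeros(batch_size, vocab_size, device=scores.device)
    token_freq.scatter_add_(
        1, input_ids, torch.ones_like(input_ids, dtype=token_freq.dtype)
    )

    # OpenAI-style update: always subtract penalty * count, so a positive
    # penalty lowers the logit and a negative penalty raises it,
    # independently of the logit's own sign.
    return scores - frequency_penalty * token_freq
```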

This leads to corrupt generations, as I mentioned in issue #1810.
Moreover, after my tests, other issues are also gone, like the one where
some requests with ``frequency_penalty = 1.0`` overrule other
requests (with ``frequency_penalty = 0.0``) in the same batch and
therefore corrupt all generations in the batch. Basically, padding
does not affect this implementation, so I believe the ``score *=
input_ids.ne(0)`` line is not needed anymore.
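
A similarly hedged sketch of the batched case (again illustrative, not
the actual TGI processor): giving each row its own penalty means a
request with ``frequency_penalty = 0.0`` subtracts nothing from its
logits, even when it shares a batch with penalized requests.

```python
import torch


def apply_frequency_penalty_batched(
    scores: torch.Tensor,  # [batch, vocab] next-token logits
    input_ids: torch.Tensor,  # [batch, seq] tokens sampled so far
    penalties: torch.Tensor,  # [batch] one frequency penalty per request
) -> torch.Tensor:
    batch_size, vocab_size = scores.shape

    # Per-row token counts, as in the single-penalty sketch above.
    token_freq = torch.zeros(batch_size, vocab_size, device=scores.device)
    token_freq.scatter_add_(
        1, input_ids, torch.ones_like(input_ids, dtype=token_freq.dtype)
    )

    # A row with penalties[i] == 0.0 is left untouched, so one request's
    # penalty can no longer leak into other requests in the same batch.
    return scores - token_freq * penalties.unsqueeze(1)
```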



| Frequency penalty | -1.0 | 0.0 | 1.0 |
| -- | -- | -- | -- |
| Before my change | https://paste.mozilla.org/JxqGJkWY | https://paste.mozilla.org/hrztJ56h | https://paste.mozilla.org/pBSEH2zw |
| After my change | https://paste.mozilla.org/7gXCi7zo | https://paste.mozilla.org/ZR9rJ92g | https://paste.mozilla.org/gHaD2YnC |

---------

Co-authored-by: martini <martin.iglesiasgoyanes@adyen.com>
2024-04-30 12:13:23 +02:00
| Name | Last commit | Last updated |
| -- | -- | -- |
| custom_kernels | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| exllama_kernels | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| exllamav2_kernels | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| tests | feat(server): add frequency penalty (#1541) | 2024-02-08 18:41:25 +01:00 |
| text_generation_server | Fixing frequency penalty (#1811) | 2024-04-30 12:13:23 +02:00 |
| .gitignore | Impl simple mamba model (#1480) | 2024-02-08 10:19:45 +01:00 |
| Makefile | fix: fix CohereForAI/c4ai-command-r-plus (#1707) | 2024-04-10 17:20:25 +02:00 |
| Makefile-awq | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| Makefile-eetq | Upgrade EETQ (Fixes the cuda graphs). (#1729) | 2024-04-12 08:15:28 +02:00 |
| Makefile-flash-att | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| Makefile-flash-att-v2 | fix: fix CohereForAI/c4ai-command-r-plus (#1707) | 2024-04-10 17:20:25 +02:00 |
| Makefile-selective-scan | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| Makefile-vllm | fix: fix CohereForAI/c4ai-command-r-plus (#1707) | 2024-04-10 17:20:25 +02:00 |
| README.md | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| poetry.lock | Upgrading all versions. (#1759) | 2024-04-18 17:17:40 +02:00 |
| pyproject.toml | v2.0.1 | 2024-04-18 17:20:36 +02:00 |
| requirements_cuda.txt | Upgrading all versions. (#1759) | 2024-04-18 17:17:40 +02:00 |
| requirements_rocm.txt | Upgrading all versions. (#1759) | 2024-04-18 17:17:40 +02:00 |

README.md

# Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

## Install

```shell
make install
```

## Run

```shell
make run-dev
```