Fix exllama wrongfully loading (#990)
# What does this PR do?

The [changes](https://github.com/huggingface/text-generation-inference/pull/986/files#diff-b72e45030214e50c8ff6e3be837057b3f3368b9779fd942ca680f949fe069eafR176) that disabled exllama on old compute capabilities had the unintended consequence of never setting `use_exllama` to `False` when `HAS_EXLLAMA` equals `False` **and** `CAN_EXLLAMA` equals `False`. This PR fixes that.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [X] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?

## Who can review?

@OlivierDehaene @Narsil

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
@OlivierDehaene OR @Narsil
-->
This commit is contained in:
parent a9fdfb2464
commit 935a77fb74
```diff
@@ -173,10 +173,11 @@ class Weights:
             from text_generation_server.utils.layers import HAS_EXLLAMA, CAN_EXLLAMA
 
             if use_exllama:
-                if not HAS_EXLLAMA and CAN_EXLLAMA:
-                    logger.warning(
-                        "Exllama GPTQ cuda kernels (which are faster) could have been used, but are not currently installed, try using BUILD_EXTENSIONS=True"
-                    )
+                if not HAS_EXLLAMA:
+                    if CAN_EXLLAMA:
+                        logger.warning(
+                            "Exllama GPTQ cuda kernels (which are faster) could have been used, but are not currently installed, try using BUILD_EXTENSIONS=True"
+                        )
                     use_exllama = False
                 else:
                     logger.info("Using exllama kernels")
```
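For intuition, here is a minimal sketch of the kernel-selection logic before and after this fix. The helper names (`resolve_use_exllama_old`, `resolve_use_exllama_fixed`) and the boolean parameters standing in for `HAS_EXLLAMA`/`CAN_EXLLAMA` are illustrative only, not part of the server code:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_use_exllama_old(use_exllama: bool, has_exllama: bool, can_exllama: bool) -> bool:
    # Pre-#990 logic: exllama is disabled only when the kernels *could* be
    # installed but are not. If the GPU cannot run exllama at all
    # (has_exllama=False, can_exllama=False), use_exllama stays True -- the bug.
    if use_exllama:
        if not has_exllama and can_exllama:
            logger.warning("Exllama kernels could have been used but are not installed")
            return False
        logger.info("Using exllama kernels")
    return use_exllama


def resolve_use_exllama_fixed(use_exllama: bool, has_exllama: bool, can_exllama: bool) -> bool:
    # Post-#990 logic: whenever the kernels are missing, disable exllama;
    # only emit the "install them" hint when the hardware could support them.
    if use_exllama:
        if not has_exllama:
            if can_exllama:
                logger.warning("Exllama kernels could have been used but are not installed")
            return False
        logger.info("Using exllama kernels")
    return use_exllama


# The case this PR fixes: kernels neither installed nor installable.
assert resolve_use_exllama_old(True, has_exllama=False, can_exllama=False) is True  # wrong
assert resolve_use_exllama_fixed(True, has_exllama=False, can_exllama=False) is False  # correct
```

With both flags `False`, the old condition `not HAS_EXLLAMA and CAN_EXLLAMA` evaluates to `False`, so the code fell through to the `else` branch and kept `use_exllama` set to `True`; the fixed version disables exllama whenever the kernels are missing and only warns when they could have been installed.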