* feat: support phi3.5 moe model loading
* fix: prefer llama base model and improve rotary logic
* feat: return reasonable generation and add integration test
* fix: run lint and update docs
* fix: rerun lint for openapi docs
* fix: prefer do_sample false unless temp is set by user, and update chat tests
* fix: small typo adjustments
* fix: consolidate long rope paths
* fix: revert greedy by default and test changes
* Vendor configuration so that we don't have to `trust_remote_code`
* Use SparseMoELayer
* Add support for dense MoE
* Some type annotations
* Add the usual model tests
* Ruff.
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
This PR makes tool calling aware of the name of the function selected.
Fixes:
https://github.com/huggingface/text-generation-inference/issues/1657
Thank you @puppetm4st3r for the helpful snippets, large parts of this PR
are simply refactors of the code shared 🙏
**opening draft PR because small tweaks are needed before merging
This PR resolves a couple
- [X] adjusts the tool response to align with openai's tools response
type
- [X] bumps pydantic to `2.6.4` in all apps (resolves dependency issue
when running tests)
- [X] bump `outlines` version and fix import for new name
This work in progress PR begins to add support for tools. Tools relies
on grammar support and still has some unsolved challenges. Opening the
PR for visibility and feedback