Commit Graph

1026 Commits

Author SHA1 Message Date
Nicolas Patry 5838f2139f
Tied embeddings in MLP speculator. 2024-08-29 12:30:26 +02:00
Nicolas Patry 5e2932552c
Revert the Cohere tokenizer change (for now using a revision instead). 2024-08-29 11:35:18 +02:00
Nicolas Patry fc7ea202c2
Fix disabling prefix caching - Fix windowing checks. 2024-08-29 11:34:50 +02:00
Nicolas Patry bef2f6bdaa
Fixing the free algorithm to handle times where the common prefix is
smaller.
2024-08-29 09:17:00 +02:00
Nicolas Patry 9c839ca5df
Adding error message when assert is violated. 2024-08-28 21:22:36 +02:00
Nicolas Patry e7e036389e
Revert the integration tests change (seems linked to head_size
modification).
2024-08-28 19:38:51 +02:00
Nicolas Patry 8a4df6e181
Only n_heads / process_group.size() are necessary. 2024-08-28 16:34:58 +02:00
Nicolas Patry 8d01848370
Update server tests
- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room
2024-08-28 15:42:05 +02:00
Nicolas Patry 12325564dc
Put back default pure shell. 2024-08-28 14:54:05 +02:00
Nicolas Patry f886747949
Oops this doesn't belong here. 2024-08-28 14:49:00 +02:00
Nicolas Patry e6ee67f301
Truncating left for radix purposes. 2024-08-28 10:53:22 +02:00
Nicolas Patry 0a60973166
Fixing the batching tokenization in flash causal lm. 2024-08-28 10:34:10 +02:00
Nicolas Patry c6f1a61267
Update the chat test. 2024-08-27 23:02:12 +02:00
Nicolas Patry 8ac1ffa087
Removing encoder_decoder (seq2seq). 2024-08-27 21:11:49 +02:00
Nicolas Patry ccaf1d0030
Fixing the test. 2024-08-27 20:06:12 +02:00
Nicolas Patry 2cf1f5c00e
Fixing the issue with `add_special_tokens` not being passed around. 2024-08-27 20:06:12 +02:00
Nicolas Patry e0069a3a26
Fixing seqlen with the new vlms. 2024-08-27 20:06:12 +02:00
Nicolas Patry 9dacac3b15
add_special_tokens is internal only 2024-08-27 20:06:12 +02:00
Nicolas Patry 55d984d730
Fixed flashinfer version. 2024-08-27 20:06:12 +02:00
Nicolas Patry bb9769ed42
Update all models. 2024-08-27 20:06:11 +02:00
Nicolas Patry 65b94a69bd
Fixing prefix caching for flashdecoding. 2024-08-27 20:06:11 +02:00
Nicolas Patry 7f1816a4e1
Change `add_special_tokens` in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)
2024-08-27 20:06:11 +02:00
Nicolas Patry f1c0735453
Don't enable prefix caching on VLM just yet. 2024-08-27 20:06:11 +02:00
Nicolas Patry e30fb25444
Fixing the default for vlm. 2024-08-27 20:06:11 +02:00
Nicolas Patry 27b566baa8
Downgrade some logs. 2024-08-27 20:06:11 +02:00
Nicolas Patry 26e5037de4
This seems to be working. 2024-08-27 20:06:10 +02:00
Nicolas Patry f5182c188c
Is this enough to make it work ? 2024-08-27 20:06:10 +02:00
Nicolas Patry 1568e82548
Override the env in server tests. 2024-08-27 20:06:10 +02:00
Nicolas Patry 682db34b6a
Handling debugger. 2024-08-27 20:06:10 +02:00
Nicolas Patry c53968dc45
Remove lambda for cleaner function. 2024-08-27 20:06:10 +02:00
Nicolas Patry 32f6416358
Upgrade resolution system for fewer errors in resolution. 2024-08-27 20:06:10 +02:00
Nicolas Patry 5eb6ea0063
Tmp 2024-08-27 20:06:09 +02:00
Nicolas Patry 0bf4eb9683
Updated flake lock 2024-08-27 20:06:09 +02:00
Nicolas Patry b80593bfa3
Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-08-27 20:06:09 +02:00
Nicolas Patry 8d0220a695
Forgot last default place. 2024-08-27 20:06:09 +02:00
Nicolas Patry 860b550cdf
Everywhere 1.80 2024-08-27 20:06:09 +02:00
Nicolas Patry 344fee0d44
Upgrade to 1.80 because of bitstream... 2024-08-27 20:06:09 +02:00
Nicolas Patry 17c8a5e574
Update cargo lock ? 2024-08-27 20:06:06 +02:00
Nicolas Patry ba1ce20ce8
Updating integration tests with new values with FI/FD.
Remove paged as a default too, and using FD everywhere.
2024-08-27 20:05:29 +02:00
Nicolas Patry ffb6841121
Update lock 2024-08-27 20:05:29 +02:00
Nicolas Patry f0b35f94b8
More specific codes. 2024-08-27 20:05:29 +02:00
Nicolas Patry a6cd5fef23
Disable prefix caching for lora. 2024-08-27 20:05:29 +02:00
Nicolas Patry cba59aca03
Disabling flashinfer/prefix caching on odd head_dim 2024-08-27 20:05:29 +02:00
Nicolas Patry f55278de2d
Allowing window_left_size (dummy version). 2024-08-27 20:05:29 +02:00
Nicolas Patry f2bdc65098
Using prebuilt. 2024-08-27 20:05:28 +02:00
Nicolas Patry 9d4c5d39fe
Include flashinfer in the docker. 2024-08-27 20:05:28 +02:00
Nicolas Patry 60719babf6
Making prefix/flashinfer the default and testing the full release tests. 2024-08-27 20:05:28 +02:00
drbh 21187c27c9
fix: bump minijinja version and add test for llama 3.1 tools (#2463)
* fix: support tojson and avoid message indexing issue in template

* fix: prefer minijinja native methods and prefer workspace level dependency

* fix: adjust comment typo
2024-08-27 13:31:08 -04:00
Nicolas Patry 2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
drbh cfa73b5c99
Pr 2451 ci branch (#2454)
* fix[router]: Fix tools not passed in chat template

Signed-off-by: GitHub <noreply@github.com>

* feat: improve default tool serialization and lints

* feat: refactor tool logic to include notify_error in prompt and adjust typing

* fix: adjust non tool template apply

* fix: simplify tool grammar logic and improve schema

* feat: avoid skip tool test and avoid empty tool prompts

* fix: increase test client timeout for grammar compilation tests

---------

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
2024-08-26 20:19:38 -04:00