Nicolas Patry
|
5838f2139f
|
Tied embeddings in MLP speculator.
|
2024-08-29 12:30:26 +02:00 |
Nicolas Patry
|
5e2932552c
|
Revert the Cohere tokenizer change (for now using a revision instead).
|
2024-08-29 11:35:18 +02:00 |
Nicolas Patry
|
fc7ea202c2
|
Fix disabling prefix caching - Fix windowing checks.
|
2024-08-29 11:34:50 +02:00 |
Nicolas Patry
|
bef2f6bdaa
|
Fixing the free algorithm to handle times where the common prefix is
smaller.
|
2024-08-29 09:17:00 +02:00 |
Nicolas Patry
|
9c839ca5df
|
Adding error message when assert is violated.
|
2024-08-28 21:22:36 +02:00 |
Nicolas Patry
|
e7e036389e
|
Revert the integration tests change (seems linked to head_size
modification).
|
2024-08-28 19:38:51 +02:00 |
Nicolas Patry
|
8a4df6e181
|
Only n_heads / process_group.size() are necessary.
|
2024-08-28 16:34:58 +02:00 |
Nicolas Patry
|
8d01848370
|
Update server tests
- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room
|
2024-08-28 15:42:05 +02:00 |
Nicolas Patry
|
12325564dc
|
Put back default pure shell.
|
2024-08-28 14:54:05 +02:00 |
Nicolas Patry
|
f886747949
|
Oops this doesn't belong here.
|
2024-08-28 14:49:00 +02:00 |
Nicolas Patry
|
e6ee67f301
|
Truncating left for radix purposes.
|
2024-08-28 10:53:22 +02:00 |
Nicolas Patry
|
0a60973166
|
Fixing the batching tokenization in flash causal lm.
|
2024-08-28 10:34:10 +02:00 |
Nicolas Patry
|
c6f1a61267
|
Update the chat test.
|
2024-08-27 23:02:12 +02:00 |
Nicolas Patry
|
8ac1ffa087
|
Removing encoder_decoder (seq2seq).
|
2024-08-27 21:11:49 +02:00 |
Nicolas Patry
|
ccaf1d0030
|
Fixing the test.
|
2024-08-27 20:06:12 +02:00 |
Nicolas Patry
|
2cf1f5c00e
|
Fixing the issue with `add_special_tokens` not being passed around.
|
2024-08-27 20:06:12 +02:00 |
Nicolas Patry
|
e0069a3a26
|
Fixing seqlen with the new vlms.
|
2024-08-27 20:06:12 +02:00 |
Nicolas Patry
|
9dacac3b15
|
add_special_tokens is internal only
|
2024-08-27 20:06:12 +02:00 |
Nicolas Patry
|
55d984d730
|
Fixed flashinfer version.
|
2024-08-27 20:06:12 +02:00 |
Nicolas Patry
|
bb9769ed42
|
Update all models.
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
65b94a69bd
|
Fixing prefix caching for flashdecoding.
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
7f1816a4e1
|
Change `add_special_tokens` in order to have the correct tokens for chat
and non-chat input (since it's super important with the prefixing now)
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
f1c0735453
|
Don't enable prefix caching on VLM just yet.
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
e30fb25444
|
Fixing the default for vlm.
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
27b566baa8
|
Downgrade some logs.
|
2024-08-27 20:06:11 +02:00 |
Nicolas Patry
|
26e5037de4
|
This seems to be working.
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
f5182c188c
|
Is this enough to make it work?
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
1568e82548
|
Override the env in server tests.
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
682db34b6a
|
Handling debugger.
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
c53968dc45
|
Remove lambda for cleaner function.
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
32f6416358
|
Upgrade resolution system for fewer errors in resolution.
|
2024-08-27 20:06:10 +02:00 |
Nicolas Patry
|
5eb6ea0063
|
Tmp
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
0bf4eb9683
|
Updated flake lock
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
b80593bfa3
|
Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
8d0220a695
|
Forgot last default place.
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
860b550cdf
|
Everywhere 1.80
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
344fee0d44
|
Upgrade to 1.80 because of bitstream...
|
2024-08-27 20:06:09 +02:00 |
Nicolas Patry
|
17c8a5e574
|
Update cargo lock?
|
2024-08-27 20:06:06 +02:00 |
Nicolas Patry
|
ba1ce20ce8
|
Updating integration tests with new values with FI/FD.
Remove paged as a default too, and using FD everywhere.
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
ffb6841121
|
Update lock
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
f0b35f94b8
|
More specific codes.
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
a6cd5fef23
|
Disable prefix caching for lora.
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
cba59aca03
|
Disabling flashinfer/prefix caching on odd head_dim
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
f55278de2d
|
Allowing window_left_size (dummy version).
|
2024-08-27 20:05:29 +02:00 |
Nicolas Patry
|
f2bdc65098
|
Using prebuilt.
|
2024-08-27 20:05:28 +02:00 |
Nicolas Patry
|
9d4c5d39fe
|
Include flashinfer in the docker.
|
2024-08-27 20:05:28 +02:00 |
Nicolas Patry
|
60719babf6
|
Making prefix/flashinfer the default and testing the full release tests.
|
2024-08-27 20:05:28 +02:00 |
drbh
|
21187c27c9
|
fix: bump minijinja version and add test for llama 3.1 tools (#2463)
* fix: support tojson and avoid message indexing issue in template
* fix: prefer minijinja native methods and prefer workspace level dependency
* fix: adjust comment typo
|
2024-08-27 13:31:08 -04:00 |
Nicolas Patry
|
2788d41a76
|
Fixing CI. (#2462)
|
2024-08-27 15:33:02 +02:00 |
drbh
|
cfa73b5c99
|
Pr 2451 ci branch (#2454)
* fix[router]: Fix tools not passed in chat template
Signed-off-by: GitHub <noreply@github.com>
* feat: improve default tool serialization and lints
* feat: refactor tool logic to include notify_error in prompt and adjust typing
* fix: adjust non tool template apply
* fix: simplify tool grammar logic and improve schema
* feat: avoid skip tool test and avoid empty tool prompts
* fix: increase test client timeout for grammar compilation tests
---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
|
2024-08-26 20:19:38 -04:00 |