Nicolas Patry
682db34b6a
Handling debugger.
2024-08-27 20:06:10 +02:00
Nicolas Patry
32f6416358
Upgrade resolution system for less errors in resolution.
2024-08-27 20:06:10 +02:00
Nicolas Patry
0bf4eb9683
Updated flake lock
2024-08-27 20:06:09 +02:00
Nicolas Patry
ffb6841121
Update lock
2024-08-27 20:05:29 +02:00
Nicolas Patry
cba59aca03
Disabling flashinfer/prefix caching on odd head_dim
2024-08-27 20:05:29 +02:00
Daniël de Kok
358ceb67dd
nix: add awq-inference-engine as server dependency (#2442)
2024-08-21 22:20:03 +02:00
Nicolas Patry
310778e02a
Adding eetq to flake. (#2438)
2024-08-21 09:06:33 +02:00
Daniël de Kok
f5f11b797e
nix: add pure server to flake, add both pure and impure devshells (#2430)
* nix: pure server and support both pure and impure devShells
* nix: remove unused poetry2nix input
It is not wired up and we now have a pure server.
* nix: add ipdb to impure devshell
2024-08-20 22:07:33 +02:00
Nicolas Patry
b70ae0969f
Prefix caching (#2402)
* Prefix caching WIP
* Fixing prefix attention.
* Fixing flashinfer import.
* Fixing black.
* Fixing medusa (still wrong outputs, but functional).
* Just medusa values now.
* Fixing medusa without prefix caching.
* Fixing prefix caching.
* Medusa requires reshaping.
* Removing the logs.
* Remove router.nix
* Fixup:
- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.
* Update flake.lock
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-08-20 11:15:30 +02:00
Daniël de Kok
38773453ae
nix: update to CUDA 12.4 (#2429)
* Update to CUDA 12.4
* poetry2nix: follow tgi-nix nixpkgs
2024-08-19 09:28:38 +02:00
Daniël de Kok
1411bfb989
nix: try to reduce the number of Rust rebuilds (#2424)
Try to reduce the number of router/launcher rebuilds by filtering
sources. In this way, recompiles should only be triggered by changes
in Cargo or Rust files.
2024-08-16 10:01:01 +02:00
Daniël de Kok
9aaa12e7ac
nix: build router incrementally (#2422)
2024-08-15 10:21:51 +02:00
Daniël de Kok
c5fff92b48
nix: partial incremental build of the router (#2416)
This is less incremental than crate2nix, but does build all dependencies
separately, so avoids full rebuilds.
2024-08-14 11:06:28 +02:00
Nicolas Patry
cd9b15d17f
Adding more kernels to flake. (#2411)
2024-08-13 10:49:18 +02:00
Daniël de Kok
6f4bb4f26f
nix: incremental build of the launcher (#2410)
2024-08-13 10:44:15 +02:00
Nicolas Patry
19ea85f8dc
Updating the flake. (#2404)
2024-08-12 18:09:16 +02:00
Daniël de Kok
8dcc7d3f6b
Update flake for 9.0a capability in Torch (#2394)
2024-08-09 22:36:51 +02:00
Daniël de Kok
6e127dcc96
flake: use rust-overlay (#2390)
2024-08-09 15:24:21 +02:00
Daniël de Kok
c6d5039cd7
Add experimental flake (#2384)
Add flake.nix
2024-08-09 12:32:37 +02:00