Commit Graph

19 Commits

Author SHA1 Message Date
Nicolas Patry 682db34b6a
Handling debugger. 2024-08-27 20:06:10 +02:00
Nicolas Patry 32f6416358
Upgrade resolution system for less errors in resolution. 2024-08-27 20:06:10 +02:00
Nicolas Patry 0bf4eb9683
Updated flake lock 2024-08-27 20:06:09 +02:00
Nicolas Patry ffb6841121
Update lock 2024-08-27 20:05:29 +02:00
Nicolas Patry cba59aca03
Disabling flashinfer/prefix caching on odd head_dim 2024-08-27 20:05:29 +02:00
Daniël de Kok 358ceb67dd
nix: add awq-inference-engine as server dependency (#2442) 2024-08-21 22:20:03 +02:00
Nicolas Patry 310778e02a
Adding eetq to flake. (#2438) 2024-08-21 09:06:33 +02:00
Daniël de Kok f5f11b797e
nix: add pure server to flake, add both pure and impure devshells (#2430)
* nix: pure server and support both pure and impure devShells

* nix: remove unused poetry2nix input

It is not wired up and we now have a pure server.

* nix: add ipdb to impure devshell
2024-08-20 22:07:33 +02:00
Nicolas Patry b70ae0969f
Prefix caching (#2402)
* Prefix caching WIP

* Fixing prefix attention.

* Fixing flashinfer import.

* Fixing black.

* Fixing medusa (still wrong outputs, but functional).

* Just medusa values now.

* Fixing medusa without prefix caching.

* Fixing prefix caching.

* Medusa requires reshaping.

* Removing the logs.

* Remove router.nix

* Fixup:

- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.

* Update flake.lock

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-08-20 11:15:30 +02:00
Daniël de Kok 38773453ae
nix: update to CUDA 12.4 (#2429)
* Update to CUDA 12.4

* poetry2nix: follow tgi-nix nixpkgs
2024-08-19 09:28:38 +02:00
Daniël de Kok 1411bfb989
nix: try to reduce the number of Rust rebuilds (#2424)
Try to reduce the number of router/launcher rebuilds by filtering
sources. In this way, recompiles should only be triggered by changes
in Cargo or Rust files.
2024-08-16 10:01:01 +02:00
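The source filtering described in the commit above (#2424) can be illustrated outside Nix. The following is a minimal Python sketch, not the flake's actual implementation: `is_rust_source`, `source_fingerprint`, and the example file tree are hypothetical names introduced here. It assumes the filter keeps only `Cargo.toml`, `Cargo.lock`, and `*.rs` files, and shows that a build input derived from the filtered set changes only when one of those files changes.

```python
import hashlib
from pathlib import PurePosixPath


def is_rust_source(path: str) -> bool:
    """Hypothetical filter: keep Cargo manifests/lockfiles and Rust files."""
    p = PurePosixPath(path)
    return p.name in {"Cargo.toml", "Cargo.lock"} or p.suffix == ".rs"


def source_fingerprint(files: dict[str, bytes]) -> str:
    """Hash only the files that pass the filter, in a stable order.

    This stands in for the build input: if the fingerprint is unchanged,
    a rebuild would not be triggered.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        if is_rust_source(path):
            h.update(path.encode())
            h.update(b"\0")
            h.update(files[path])
            h.update(b"\0")
    return h.hexdigest()


# Illustrative tree (made-up paths and contents).
tree = {
    "router/Cargo.toml": b'[package]\nname = "router"\n',
    "router/src/main.rs": b"fn main() {}\n",
    "server/model.py": b"print('v1')\n",
}

before = source_fingerprint(tree)

# Editing a Python file leaves the filtered input untouched.
tree["server/model.py"] = b"print('v2')\n"
after_python_edit = source_fingerprint(tree)

# Editing a Rust file changes the filtered input.
tree["router/src/main.rs"] = b'fn main() { println!("hi"); }\n'
after_rust_edit = source_fingerprint(tree)

print(before == after_python_edit)  # unchanged input, no rebuild
print(before == after_rust_edit)    # changed input, rebuild
```

In this sketch the first comparison holds and the second does not, which is the behavior the commit message aims for: recompiles are triggered only by changes in Cargo or Rust files.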
Daniël de Kok 9aaa12e7ac
nix: build router incrementally (#2422) 2024-08-15 10:21:51 +02:00
Daniël de Kok c5fff92b48
nix: partial incremental build of the router (#2416)
This is less incremental than crate2nix, but does build all dependencies
separately, so avoids full rebuilds.
2024-08-14 11:06:28 +02:00
Nicolas Patry cd9b15d17f
Adding more kernels to flake. (#2411) 2024-08-13 10:49:18 +02:00
Daniël de Kok 6f4bb4f26f
nix: incremental build of the launcher (#2410) 2024-08-13 10:44:15 +02:00
Nicolas Patry 19ea85f8dc
Updating the flake. (#2404) 2024-08-12 18:09:16 +02:00
Daniël de Kok 8dcc7d3f6b
Update flake for 9.0a capability in Torch (#2394) 2024-08-09 22:36:51 +02:00
Daniël de Kok 6e127dcc96
flake: use rust-overlay (#2390) 2024-08-09 15:24:21 +02:00
Daniël de Kok c6d5039cd7
Add experimental flake (#2384)
Add flake.nix
2024-08-09 12:32:37 +02:00