Daniël de Kok
e4ab855480
nix: improve impure devshell ( #2478 )
- Add some test dependencies.
- Install server in venv.
- Install Python client in venv.
2024-09-02 09:27:10 +02:00
Daniël de Kok
4e821c003a
nix: build Torch against MKL and various other improvements ( #2469 )
Updates tgi-nix input:
- Move Torch closer to upstream by building against MKL.
- Remove compute capability 8.7 from Torch (Jetson).
- Sync nixpkgs compute capabilities with Torch (avoids
compiling too many capabilities for MAGMA).
- Use nixpkgs configuration passed through by `tgi-nix`.
2024-08-29 16:25:25 +02:00
Daniël de Kok
f3c5d7d92f
nix: add default package ( #2453 )
The default package wraps the launcher and puts the server/router in the
path.
As a result, TGI can be started using something like:
```
nix run .# -- \
--model-id hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
--port 8080
```
2024-08-23 22:06:22 +02:00
Daniël de Kok
9474415095
nix: add `text-generation-benchmark` to pure devshell ( #2431 )
2024-08-21 07:48:13 +02:00
Daniël de Kok
f5f11b797e
nix: add pure server to flake, add both pure and impure devshells ( #2430 )
* nix: pure server and support both pure and impure devShells
* nix: remove unused poetry2nix input
It is not wired up and we now have a pure server.
* nix: add ipdb to impure devshell
2024-08-20 22:07:33 +02:00
Nicolas Patry
b70ae0969f
Prefix caching ( #2402 )
* Prefix caching WIP
* Fixing prefix attention.
* Fixing flashinfer import.
* Fixing black.
* Fixing medusa (still wrong outputs, but functional).
* Just medusa values now.
* Fixing medusa without prefix caching.
* Fixing prefix caching.
* Medusa requires reshaping.
* Removing the logs.
* Remove router.nix
* Fixup:
- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.
* Update flake.lock
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-08-20 11:15:30 +02:00
Daniël de Kok
38773453ae
nix: update to CUDA 12.4 ( #2429 )
* Update to CUDA 12.4
* poetry2nix: follow tgi-nix nixpkgs
2024-08-19 09:28:38 +02:00
Daniël de Kok
1411bfb989
nix: try to reduce the number of Rust rebuilds ( #2424 )
Try to reduce the number of router/launcher rebuilds by filtering
sources. In this way, recompiles should only be triggered by changes
in Cargo or Rust files.
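A minimal sketch of this kind of source filtering, assuming nixpkgs' `lib.cleanSourceWith` and assuming that only `.rs` files plus `Cargo.toml` and `Cargo.lock` are relevant; the filter actually used in the flake may differ:

```nix
# Hypothetical helper: returns a filtered copy of the source tree.
# Only the files kept by the filter contribute to the derivation hash,
# so edits to other files (docs, Python code, ...) do not trigger a rebuild.
{ lib }:

lib.cleanSourceWith {
  # Assumption: the Rust workspace lives at the repository root.
  src = ./.;
  filter = path: type:
    # Keep directories so that nested crates are still traversed.
    type == "directory"
    # Keep Rust sources.
    || lib.hasSuffix ".rs" path
    # Keep Cargo manifests and the lock file.
    || builtins.elem (baseNameOf path) [ "Cargo.toml" "Cargo.lock" ];
}
```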
2024-08-16 10:01:01 +02:00
Daniël de Kok
9aaa12e7ac
nix: build router incrementally ( #2422 )
2024-08-15 10:21:51 +02:00
Nicolas Patry
f3b5c69441
Upgrading exl2. ( #2415 )
* Upgrading exl2.
* Fixing the other pathways.
* Fix idefics.
2024-08-14 11:58:08 +02:00
Daniël de Kok
c5fff92b48
nix: partial incremental build of the router ( #2416 )
This is less incremental than crate2nix, but does build all dependencies
separately, so avoids full rebuilds.
2024-08-14 11:06:28 +02:00
Nicolas Patry
cd9b15d17f
Adding more kernels to flake. ( #2411 )
2024-08-13 10:49:18 +02:00
Daniël de Kok
6f4bb4f26f
nix: incremental build of the launcher ( #2410 )
2024-08-13 10:44:15 +02:00
Nicolas Patry
19ea85f8dc
Updating the flake. ( #2404 )
2024-08-12 18:09:16 +02:00
Nicolas Patry
730fa00e20
Adding launcher to build. ( #2397 )
2024-08-12 14:08:46 +02:00
Daniël de Kok
01a515dea2
nix: add router to the devshell ( #2396 )
2024-08-12 09:28:38 +02:00
Daniël de Kok
6e127dcc96
flake: use rust-overlay ( #2390 )
2024-08-09 15:24:21 +02:00
Daniël de Kok
977534bcb8
flake: add fmt and clippy ( #2389 )
2024-08-09 14:56:20 +02:00
Daniël de Kok
c6d5039cd7
Add experimental flake ( #2384 )
Add flake.nix
2024-08-09 12:32:37 +02:00