Commit Graph

143 Commits

Author SHA1 Message Date
Morgan Funtowicz e34289703b misc(backend): let's try 2024-12-20 11:59:12 +01:00
Morgan Funtowicz a8bfe6f21b misc(backend): let's try 2024-12-20 11:56:02 +01:00
Morgan Funtowicz 407f522714 misc(backend): forward env 2024-12-20 09:57:12 +01:00
Morgan Funtowicz a3399fcb98 misc(backend): forward env 2024-12-20 09:42:33 +01:00
Morgan Funtowicz dd8dada772 misc(backend): add some tags 2024-12-19 11:34:52 +01:00
Morgan Funtowicz 59118548a0 misc(backend): let's add some more tooling 2024-12-19 11:28:23 +01:00
Morgan Funtowicz a5e3e6ac24 misc(backend): let's build for ci-runtime 2024-12-19 09:42:43 +01:00
Morgan Funtowicz 6497964342 misc(backend): lets try... 2024-12-18 15:18:27 +01:00
Morgan Funtowicz 49261d6045 misc(backend): increase to 2h 2024-12-18 10:03:29 +01:00
Morgan Funtowicz bf07b71080 misc(backend): lets try 1h30 2024-12-17 21:22:55 +01:00
Morgan Funtowicz 952cbdc2c1 misc(backend): lets try 1h30 2024-12-17 21:20:37 +01:00
Morgan Funtowicz de13c8346d misc(backend): add more info 2024-12-17 21:15:28 +01:00
Morgan Funtowicz f5d577e4ad misc(backend): use session token 2024-12-17 14:33:42 +01:00
Morgan Funtowicz 3e82af5953 misc(backend): kthxbye retry s3 2024-12-17 12:42:49 +01:00
Morgan Funtowicz 0938b7d3fd misc(backend): WWWWWWWWWWWWWAAAAAAAA 2024-12-17 12:29:33 +01:00
Morgan Funtowicz 2b65669581 misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf 2024-12-17 10:53:42 +01:00
Morgan Funtowicz e5cc47a42e misc(backend): missing env directive 2024-12-17 10:41:36 +01:00
Morgan Funtowicz b303013227 misc(backend): let's try with GHA 2024-12-17 10:40:51 +01:00
Morgan Funtowicz 7dcee83a63 misc(backend): once more? 2024-12-16 16:03:07 +01:00
Morgan Funtowicz 5ded9cbd22 misc(backend): test with TGI S3 conf 2024-12-16 15:51:03 +01:00
Morgan Funtowicz 79f1b953dc misc(backend): test with TGI S3 conf 2024-12-16 15:47:51 +01:00
Morgan Funtowicz 83e919f617 misc(ci): WAT 2024-12-12 16:42:38 +01:00
Morgan Funtowicz e312a68469 misc(ci): WAT 2024-12-12 16:14:48 +01:00
Morgan Funtowicz 4aa060f99a misc(ci): WAT 2024-12-12 16:08:01 +01:00
Morgan Funtowicz 6c62ded864 misc(ci): do not build with ssl enabled 2024-12-12 15:57:11 +01:00
Morgan Funtowicz b31477cf63 misc(ci): lets actually use sccache ... 2024-12-12 15:38:44 +01:00
Morgan Funtowicz 68f5466c86 misc(ci): add debug profile 2024-12-12 14:52:12 +01:00
Morgan Funtowicz f99049aafe misc(ci): again 2024-12-12 14:24:33 +01:00
Morgan Funtowicz e6abfdcb1f misc(ci): let's try this way 2024-12-12 13:00:13 +01:00
Morgan Funtowicz 5f1b16f300 misc(ci): export aws creds as output of step 2024-12-12 12:48:58 +01:00
Morgan Funtowicz 5910dabb4e misc(ci): provide mechanism to cache inside container 2024-12-12 12:45:14 +01:00
Morgan Funtowicz e703c84578 misc(ci): let's try to build the Dockerfile for trtllm 2024-12-12 11:50:39 +01:00
Morgan Funtowicz 48a1a602e7 misc(ci): update Rust action toolchain 2024-12-12 09:40:07 +01:00
Morgan Funtowicz de36c8e6dd misc(ci): enable building tensorrt-llm 2024-12-12 09:29:14 +01:00
Hugo Larcher d5bc6a20bd
feat: Add automatic nightly benchmarks (#2591)
* feat: Add automatic nightly benchmarks

* fix: Update runners group

* fix: add created_at field to results

* fix: Add variable results file location
2024-11-21 17:11:42 +00:00
Daniël de Kok 07bed530f7
nix: build and cache impure devshells (#2765)
* nix: build and cache all devshells

* nix: add poetry to the impure shell

This shouldn't be used to manage dependencies in a Nix devshell, but can
be handy to update `poetry.lock`.

* Fix Nix build, disable pure shell (covered by Nix tests)
2024-11-20 20:56:11 +01:00
Nicolas Patry 8a8794a672
Avoiding timeout for bloom tests. (#2693)
* Avoiding timeout for bloom tests.

* Skip the test; let's see if it's always the first test that fails.

* Fail early.

* Pulling ?

* No early exit.
2024-10-26 05:35:28 +02:00
Nicolas Patry 3dbdf63ec5
Intel ci (#2630)
* Intel CI ?

* Let's try non sharded gemma.

* Snapshot rename

* Apparently container can be gone already.
2024-10-10 16:51:57 +02:00
Nicolas Patry 43f39f6894
AMD CI (#2589)
* Only run 1 valid test.

* Trying the tailscale action quickly.

* ?

* bash spaces.

* Remove tailscale.

* More quotes.

* mnt2 ?

* Othername to avoid recursive directories.

* Good old tmate.

* Remove tmate.

* Trying a few things.

* Remove some stuff.

* Sleep ?

* Tmp

* busybox

* Launcher tgi

* Starting hello

* Busybox in python

* No device.

* Removing all variables ?

* At some point.

* Tmp

* Tmp2

* Device request, no container name

* No device requests

* Without pytest.

* No pytest.

* from env

* Start with devices

* Attempt #1

* Remove stdin messing

* Only 1 test, no container name

* Raw tgi

* Sending args.

* Show pip freeze.

* Start downloading with token

* Giving HIP devices.

* Mount volume + port forward

* Without pytest.

* No token

* Repeated arguments

* Wrong kwarg.

* On 2 GPUs

* Fallback to single shard CI test.

* Testing

* yaml

* Common cache ?

* Trailing slash ?

* Docker volume split.

* Fix docker volume

* Fixing ?

* ?

* Try no devices ?

* Flash llama on intel CPU ?

* Fix nvidia ?

* Temp deactivate intel, activate nvidia ?
2024-10-09 17:50:49 +02:00
Alvaro Bartolome 0aa66d693a
Fix build with `--features google` (#2566)
* Fix `cargo build --features google`

* Add `cargo test --features google`
2024-09-26 11:41:38 +02:00
Nicolas Patry f512021e77
Stream options. (#2533)
* Stream options.

* Fetch stuff from nix integration test for easier testing.

* Adding the assert.

* Only send the usage when asked for.

* Update the docs.

* Impure test because we need network.

* develop.

* Optional usage.

* Fixes.

* Workflow
2024-09-19 20:50:37 +02:00
Daniël de Kok ce85efa968
Move to moe-kernels package and switch to common MoE layer (#2511)
* Move to moe-kernels package and switch to common MoE layer

This change introduces the new `moe-kernels` package:

- Add `moe-kernels` as a dependency.
- Introduce a `SparseMoELayer` module that can be used by MoE
  models.
- Port over Mixtral and Deepseek.

* Make `cargo check` pass

* Update runner
2024-09-17 18:08:58 +02:00
Daniël de Kok 71e4268600
nix: pure Rust check/fmt/clippy/test (#2525)
Runs the tests in a Nix build sandbox.
2024-09-17 12:14:30 +02:00
Nicolas Patry d95c670ada
Add nix test. (#2513)
* Add nix test.

* Modifying yourself means you need to rerun.

* Fixing the test + adding click (needed for pre-commit hooks).

* Try this.

* Our runner + pure test (not written)

* Remove server.

* Root user.

* Different user ?

* Add the actual test target.

* Forgot this modification.

* Add a formatter.

* Add the secrets.

* Fixed the auth token ?

* Adding the other tests.

* Missing pre-commit.

* Test requires cargo for cargo fmt.

* Update it a bit.

* Up.

* Attempting to use a cache location for the models.

* Ignore the cache for now.
2024-09-12 14:54:56 +02:00
Nicolas Patry dae3bf1d87
Fix tokenization yi (#2507)
* Fixing odd tokenization self modifications on the Rust side (load and
resave in Python).

* Fixing the builds ?

* Fix the gh action?

* Fixing the location ?

* Validation is odd.

* Try a faster runner

* Upgrade python version.

* Remove sccache

* No sccache.

* Getting libpython maybe ?

* List stuff.

* Monkey it up.

* have no idea at this point

* Tmp.

* Shot in the dark.

* Tmate the hell out of this.

* Desperation.

* WTF.

* -y.

* Apparently 3.10 is not available anymore.

* Updating the dockerfile to make libpython discoverable at runtime too.

* Put back rust tests.

* Why do we want mkl on AMD ?

* Forcing 3.11 ?
2024-09-11 22:41:56 +02:00
Nicolas Patry e415b690a6
Lots of improvements (Still 2 allocators) (#2449)
* Making prefix/flashinfer the default and testing the full release tests.

* Include flashinfer in the docker.

* Using prebuilt.

* Allowing window_left_size (dummy version).

* Disabling flashinfer/prefix caching on odd head_dim

* Disable prefix caching for lora.

* More specific codes.

* Update lock

* Updating integration tests with new values with FI/FD.

Remove paged as a default too, and using FD everywhere.

* Update cargo lock ?

* Upgrade to 1.80 because of bitstream...

* Everywhere 1.80

* Forgot last default place.

* Apply suggestions from code review

Co-authored-by: drbh <david.richard.holtz@gmail.com>

* Updated flake lock

* Tmp

* Upgrade resolution system for fewer errors in resolution.

* Remove lambda for cleaner function.

* Handling debugger.

* Override the env in server tests.

* Is this enough to make it work ?

* This seems to be working.

* Downgrade some logs.

* Fixing the default for vlm.

* Don't enable prefix caching on VLM just yet.

* Change `add_special_tokens` in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)

* Fixing prefix caching for flashdecoding.

* Update all models.

* Fixed flashinfer version.

* add_special_tokens is internal only

* Fixing seqlen with the new vlms.

* Fixing the issue with `add_special_tokens` not being passed around.

* Fixing the test.

* Removing encoder_decoder (seq2seq).

* Update the chat test.

* Fixing the batching tokenization in flash causal lm.

* Truncating left for radix purposes.

* Oops this doesn't belong here.

* Put back default pure shell.

* Update server tests

- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room

* Only n_heads / process_group.size() are necessary.

* Revert the integration tests change (seems linked to head_size
modification).

* Adding error message when assert is violated.

* Fixing the free algorithm to handle times where the common prefix is
smaller.

* Apply suggestions from code review

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Update server/text_generation_server/layers/attention/common.py

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Fix disabling prefix caching - Fix windowing checks.

* Revert the Cohere tokenizer change (for now using a revision instead).

* Fmt.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2024-08-29 16:29:01 +02:00
Nicolas Patry 2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
Nicolas Patry e4201f44cf
All integration tests back everywhere (too many failed CI). (#2428)
* All integration tests back everywhere (too many failed CI).

* Upgrade integration tests after 12.4

* Attempt to remove the specified compute cap.

* Common arch list.

* Punica uses raw ASM which is not valid on 9.0 apparently.
2024-08-16 21:19:46 +02:00
Hugo Larcher 53729b74ac
doc: Add metrics documentation and add a 'Reference' section (#2230)
* doc: Add metrics documentation and add a 'Reference' section

* doc: Add API reference

* doc: Refactor API reference

* fix: Message API link

* Bad rebase

* Moving the docs.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-08-16 19:43:30 +02:00
Wang, Yi b6bb1d5160
Cpu dockerimage (#2367)
add intel-cpu docker image

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-08-12 14:10:30 +02:00