hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
Morgan Funtowicz	e34289703b	misc(backend): let's try	2024-12-20 11:59:12 +01:00
Morgan Funtowicz	a8bfe6f21b	misc(backend): let's try	2024-12-20 11:56:02 +01:00
Morgan Funtowicz	407f522714	misc(backend): forward env	2024-12-20 09:57:12 +01:00
Morgan Funtowicz	a3399fcb98	misc(backend): forward env	2024-12-20 09:42:33 +01:00
Morgan Funtowicz	dd8dada772	misc(backend): add some tags	2024-12-19 11:34:52 +01:00
Morgan Funtowicz	59118548a0	misc(backend): let's add some more tooling	2024-12-19 11:28:23 +01:00
Morgan Funtowicz	a5e3e6ac24	misc(backend): let's build for ci-runtime	2024-12-19 09:42:43 +01:00
Morgan Funtowicz	6497964342	misc(backend): lets try...	2024-12-18 15:18:27 +01:00
Morgan Funtowicz	49261d6045	misc(backend): increase to 2h	2024-12-18 10:03:29 +01:00
Morgan Funtowicz	bf07b71080	misc(backend): lets try 1h30	2024-12-17 21:22:55 +01:00
Morgan Funtowicz	952cbdc2c1	misc(backend): lets try 1h30	2024-12-17 21:20:37 +01:00
Morgan Funtowicz	de13c8346d	misc(backend): add more info	2024-12-17 21:15:28 +01:00
Morgan Funtowicz	f5d577e4ad	misc(backend): use session token	2024-12-17 14:33:42 +01:00
Morgan Funtowicz	3e82af5953	misc(backend): kthxbye retry s3	2024-12-17 12:42:49 +01:00
Morgan Funtowicz	0938b7d3fd	misc(backend): WWWWWWWWWWWWWAAAAAAAA	2024-12-17 12:29:33 +01:00
Morgan Funtowicz	2b65669581	misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf	2024-12-17 10:53:42 +01:00
Morgan Funtowicz	e5cc47a42e	misc(backend): missing env directive	2024-12-17 10:41:36 +01:00
Morgan Funtowicz	b303013227	misc(backend): let's try with GHA	2024-12-17 10:40:51 +01:00
Morgan Funtowicz	7dcee83a63	misc(backend): once more?	2024-12-16 16:03:07 +01:00
Morgan Funtowicz	5ded9cbd22	misc(backend): test with TGI S3 conf	2024-12-16 15:51:03 +01:00
Morgan Funtowicz	79f1b953dc	misc(backend): test with TGI S3 conf	2024-12-16 15:47:51 +01:00
Morgan Funtowicz	83e919f617	misc(ci): WAT	2024-12-12 16:42:38 +01:00
Morgan Funtowicz	e312a68469	misc(ci): WAT	2024-12-12 16:14:48 +01:00
Morgan Funtowicz	4aa060f99a	misc(ci): WAT	2024-12-12 16:08:01 +01:00
Morgan Funtowicz	6c62ded864	misc(ci): do not build with ssl enabled	2024-12-12 15:57:11 +01:00
Morgan Funtowicz	b31477cf63	misc(ci): lets actually use sccache ...	2024-12-12 15:38:44 +01:00
Morgan Funtowicz	68f5466c86	misc(ci): add debug profile	2024-12-12 14:52:12 +01:00
Morgan Funtowicz	f99049aafe	misc(ci): again	2024-12-12 14:24:33 +01:00
Morgan Funtowicz	e6abfdcb1f	misc(ci): let's try this way	2024-12-12 13:00:13 +01:00
Morgan Funtowicz	5f1b16f300	misc(ci): export aws creds as output of step	2024-12-12 12:48:58 +01:00
Morgan Funtowicz	5910dabb4e	misc(ci): provide mecanism to cache inside container	2024-12-12 12:45:14 +01:00
Morgan Funtowicz	e703c84578	misc(ci): let's try to build the Dockerfile for trtllm	2024-12-12 11:50:39 +01:00
Morgan Funtowicz	48a1a602e7	misc(ci): update Rust action toolchain	2024-12-12 09:40:07 +01:00
Morgan Funtowicz	de36c8e6dd	misc(ci): enabe building tensorrt-llm	2024-12-12 09:29:14 +01:00
Hugo Larcher	d5bc6a20bd	feat: Add automatic nightly benchmarks (#2591 ) * feat: Add automatic nightly benchmarks * fix: Update runners group * fix: add created_at field to results * fix: Add variable results file location	2024-11-21 17:11:42 +00:00
Daniël de Kok	07bed530f7	nix: build and cache impure devshells (#2765 ) * nix: build and cache all devshells * nix: add poetry to the impure shell This shouldn't be used to manage dependencies in a Nix devshell, but can be handy to update `poetry.lock`. * Fix Nix build, disable pure shell (covered by Nix tests)	2024-11-20 20:56:11 +01:00
Nicolas Patry	8a8794a672	Avoiding timeout for bloom tests. (#2693 ) * Avoiding timeout for bloom tests. * Skip the test let's see if it's always the first tests that fails. * Fail early. * Pulling ? * No early exit.	2024-10-26 05:35:28 +02:00
Nicolas Patry	3dbdf63ec5	Intel ci (#2630 ) * Intel CI ? * Let's try non sharded gemma. * Snapshot rename * Apparently container can be gone already.	2024-10-10 16:51:57 +02:00
Nicolas Patry	43f39f6894	AMD CI (#2589 ) * Only run 1 valid test. * TRying the tailscale action quickly. * ? * bash spaces. * Remove tailscale. * More quotes. * mnt2 ? * Othername to avoid recursive directories. * Good old tmate. * Remove tmate. * Trying a few things. * Remove some stuff. * Sleep ? * Tmp * busybox * Launcher tgi * Starting hello * Busybox in python * No device. * Removing all variables ? * A un moment donné. * Tmp * Tmp2 * DEvice request, no container name * No device requests * Without pytest. * No pytest. * from env * Start with devices * Attemp #1 * Remove stdin messing * Only 1 test, no container name * Raw tgi * Sending args. * Show pip freeze. * Start downloading with token * Giving HIP devices. * Mount volume + port forward * Without pytest. * No token * Repeated arguments * Wrong kwarg. * On 2 GPUs * Fallback to single shard CI test. * Testing * yaml * Common cache ? * Trailing slash ? * Docker volume split. * Fix docker volume * Fixing ? * ? * Try no devices ? * Flash llama on intel CPU ? * Fix nvidia ? * Temp deactivate intel, activate nvidia ?	2024-10-09 17:50:49 +02:00
Alvaro Bartolome	0aa66d693a	Fix build with `--features google` (#2566 ) * Fix `cargo build --features google` * Add `cargo test --features google`	2024-09-26 11:41:38 +02:00
Nicolas Patry	f512021e77	Stream options. (#2533 ) * Stream options. * Fetch stuff from nix integration test for easier testing. * Adding the assert. * Only send the usage when asked for. * Update the docs. * Impure test because we need network. * develop. * Optional usage. * Fixes. * Workflow	2024-09-19 20:50:37 +02:00
Daniël de Kok	ce85efa968	Move to moe-kernels package and switch to common MoE layer (#2511 ) * Move to moe-kernels package and switch to common MoE layer This change introduces the new `moe-kernels` package: - Add `moe-kernels` as a dependency. - Introduce a `SparseMoELayer` module that can be used by MoE models. - Port over Mixtral and Deepseek. * Make `cargo check` pass * Update runner	2024-09-17 18:08:58 +02:00
Daniël de Kok	71e4268600	nix: pure Rust check/fmt/clippy/test (#2525 ) Runs the tests in a Nix build sandbox.	2024-09-17 12:14:30 +02:00
Nicolas Patry	d95c670ada	Add nix test. (#2513 ) * Add nix test. * Modifying yourself means you need to rerun. * Fixing the test + adding click (needed for pre-commit hooks). * Try thuis. * Our runner + pure test (not written) * Reemove server. * Root user. * Different user ? * Add the actual test target. * Forgot this modification. * Add a formatter. * Add the secrets. * Fixed the auth token ? * Adding the other tests. * Missing pre-commit. * Test requires cargo for cargo fmt. * Update it a bit. * Up. * Attempting to use a cache location for the models. * Ignore the cache for now.	2024-09-12 14:54:56 +02:00
Nicolas Patry	dae3bf1d87	Fix tokenization yi (#2507 ) * Fixing odd tokenization self modifications on the Rust side (load and resave in Python). * Fixing the builds ? * Fix the gh action? * Fixing the location ? * Validation is odd. * Try a faster runner * Upgrade python version. * Remove sccache * No sccache. * Getting libpython maybe ? * List stuff. * Monkey it up. * have no idea at this point * Tmp. * Shot in the dark. * Tmate the hell out of this. * Desperation. * WTF. * -y. * Apparently 3.10 is not available anymore. * Updating the dockerfile to make libpython discoverable at runtime too. * Put back rust tests. * Why do we want mkl on AMD ? * Forcing 3.11 ?	2024-09-11 22:41:56 +02:00
Nicolas Patry	e415b690a6	Lots of improvements (Still 2 allocators) (#2449 ) * Making prefix/flashinfer the default and testing the full release tests. * Include flashinfer in the docker. * Using prebuilt. * Allowing window_left_size (dummy version). * Disabling flashinfer/prefix caching on odd head_dim * Disable prefix caching for lora. * More specific codes. * Update lock * Updating integration tests with new values with FI/FD. Remove paged as a default too, and using FD everywhere. * Update cargo lock ? * Upgrade to 1.80 because of bitstream... * Everywhere 1.80 * Forgot last default place. * Apply suggestions from code review Co-authored-by: drbh <david.richard.holtz@gmail.com> * Updated flake lock * Tmp * Upgrade resolution system for less errors in resolution. * Remove lambda for cleaner function. * Handling debugger. * OVerride the env in server tests. * Is this enough to make it work ? * This seems to be working. * Downgrade some logs. * Fixing the default for vlm. * Don't enable prefix caching on VLM just yet. * Change `add_special_tokens` in order to have the correct tokens for chat input and not (since it's super important with the prefixing now) * Fixing prefix caching for flashdecoding. * Update all models. * Fixed flashinfer version. * add_special_tokens is internal only * Fixing seqlen with the new vlms. * Fixing the issue with `add_special_tokens` not being passed around. * Fixing the test. * Removing encoder_decoder (seq2seq). * Update the chat test. * Fixing the batching tokenization in flash causal lm. * Truncating left for radix purposes. * Oops this doesn't belong here. * Put back default pure shell. * Update server tests - Default to throughput test in k6 - Use TGI_WIGGLE_ROOM to adjust wiggle room * Only n_heads / process_group.size() are necessary. * Revert the integrationt tests change (seem linked to head_size modification). * Adding error message when assert is violated. 
* Fixing the free algorithm to handle times where the common prefix is smaller. * Apply suggestions from code review Co-authored-by: OlivierDehaene <olivier@huggingface.co> * Update server/text_generation_server/layers/attention/common.py Co-authored-by: OlivierDehaene <olivier@huggingface.co> * Fix disabling prefix caching - Fix windowing checks. * Revert the Cohere tokenizer change (for now using a revision instead). * Fmt. --------- Co-authored-by: drbh <david.richard.holtz@gmail.com> Co-authored-by: OlivierDehaene <olivier@huggingface.co>	2024-08-29 16:29:01 +02:00
Nicolas Patry	2788d41a76	Fixing CI. (#2462 )	2024-08-27 15:33:02 +02:00
Nicolas Patry	e4201f44cf	All integration tests back everywhere (too many failed CI). (#2428 ) * All integration tests back everywhere (too many failed CI). * Upgrade integration tests after 12.4 * Attempt to remove the specifed compute cap. * Common arch list. * Punica uses raw ASM which is not valid on 9.0 apparently.	2024-08-16 21:19:46 +02:00
Hugo Larcher	53729b74ac	doc: Add metrics documentation and add a 'Reference' section (#2230 ) * doc: Add metrics documentation and add a 'Reference' section * doc: Add API reference * doc: Refactor API reference * fix: Message API link * Bad rebase * Moving the docs. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2024-08-16 19:43:30 +02:00
Wang, Yi	b6bb1d5160	Cpu dockerimage (#2367 ) add intel-cpu docker image Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-08-12 14:10:30 +02:00

143 Commits