Jeff
5b2155b0f8
corrected Pydantic warning. ( #2095 )
...
* corrected Pydantic warning.
* Update clients/python/text_generation/types.py
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
2024-06-25 10:10:32 +02:00
KevinDuffy94
1869ee2f57
Add OTLP Service Name Environment Variable ( #2076 )
...
* Adding Service Name Environment variable for https://github.com/huggingface/text-generation-inference/issues/2069
* Update Docs
* Update README.md
* Update Launcher Docs
* Update Launcher Docs
Removing Option
2024-06-25 09:33:01 +02:00
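The service-name entry above can be sketched as a small lookup; the variable name and default below are assumptions for illustration, not the exact values the launcher documents:

```python
import os

# Hedged sketch (env var name and default are assumptions): read the OTLP
# service name from the environment so exported traces are tagged per
# deployment instead of using a hard-coded resource name.
def otlp_service_name(default: str = "text-generation-inference") -> str:
    return os.environ.get("OTLP_SERVICE_NAME", default)
```

If the variable is unset, the default is used, so existing deployments keep their old trace labels.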
Lucain
3447c722fd
Support `HF_TOKEN` environment variable ( #2066 )
...
* Support HF_TOKEN environment variable
* Load test.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-06-25 09:23:12 +02:00
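The `HF_TOKEN` support above amounts to a fallback chain; a minimal sketch, mirroring how `huggingface_hub` prefers the newer variable over the legacy one:

```python
import os

# Hedged sketch: prefer the newer HF_TOKEN variable, fall back to the older
# HUGGING_FACE_HUB_TOKEN name so existing setups keep working.
def get_hf_token():
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
```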
Felix Marty
09a41f2c43
do not skip workflow on cuda, fix no space left on device
2024-06-24 18:51:59 +02:00
Felix Marty
f16f0ad92b
do not login to internal registry
2024-06-24 18:51:58 +02:00
Felix Marty
13bbf6cc5c
does ci pass without tailscale?
2024-06-24 18:51:33 +02:00
Felix Marty
ee62872d66
test tailscale independently
2024-06-24 18:51:33 +02:00
Felix Marty
1bb1a344d7
retry
2024-06-24 18:51:33 +02:00
Felix Marty
bc2b9b20e2
trigger ci
2024-06-24 18:51:32 +02:00
Felix Marty
3464d60d4b
The handshake operation timed out & hanging
2024-06-24 18:51:32 +02:00
Felix Marty
284894303a
remove require_backend decorators on handles; for some reason it fails in GitHub Actions
2024-06-24 18:51:32 +02:00
Felix Marty
7e0f4f25c7
renamed file
2024-06-24 18:51:32 +02:00
Felix Marty
393234de9b
hopefully fix ci
2024-06-24 18:51:32 +02:00
Felix Marty
67999773f3
fix workflow
2024-06-24 18:51:32 +02:00
Felix Marty
5fb8c275c3
fix style & typo
2024-06-24 18:51:30 +02:00
Felix Marty
e62ac4d63a
trigger
2024-06-24 18:51:09 +02:00
fxmarty
df7bb11793
dial tcp: lookup registry-1.docker.io: i/o timeout
2024-06-24 18:51:08 +02:00
fxmarty
40b342a12e
fix space
2024-06-24 18:51:08 +02:00
fxmarty
3de8f3647b
fix decorators
2024-06-24 18:51:08 +02:00
fxmarty
4616c62914
style
2024-06-24 18:51:08 +02:00
Felix Marty
5b6b257756
fix gpt2 tests - some weights were not contiguous
2024-06-24 18:51:08 +02:00
Felix Marty
9e50c117bc
fix idefics2 tests
2024-06-24 18:51:06 +02:00
fxmarty
1846c1c210
fix tests
2024-06-24 18:50:18 +02:00
fxmarty
1e10597d0c
update
2024-06-24 18:50:17 +02:00
fxmarty
406885638b
skip exl2 tests on rocm
2024-06-24 18:49:45 +02:00
fxmarty
5a4b798f98
fix gptq tests, LLMM1 matrix bound
2024-06-24 18:49:45 +02:00
fxmarty
49db30a137
disable marlin tests on rocm/xpu
2024-06-24 18:49:37 +02:00
ur4t
405765b18c
Fix cargo-chef prepare ( #2101 )
...
* Fix cargo-chef prepare
In prepare stage, cargo-chef reads Cargo.lock and transforms it accordingly.
If Cargo.lock is not present, cargo-chef will generate a new one first, which
might vary a lot and invalidate docker build caches.
* Fix Dockerfile_amd and Dockerfile_intel
2024-06-24 18:16:36 +02:00
Nicolas Patry
480d3b3304
New runner. Manual squash. ( #2110 )
...
* New runner. Manual squash.
* Network host.
* Put back trufflehog with proper extension.
* No network host ?
* Moving buildx install after tailscale ?
* 1.79
2024-06-24 18:08:34 +02:00
drbh
811a9381b1
feat: sort cuda graphs in descending order ( #2104 )
2024-06-21 14:28:26 -04:00
Daniël de Kok
197c47a302
Fix `text-generation-server quantize` ( #2103 )
...
The subcommand did not work due to some broken imports.
2024-06-21 15:28:51 +02:00
Daniël de Kok
bcb3faa1c2
Factor out sharding of packed tensors ( #2059 )
...
For Phi-3-Small I need to shard a packed QKV bias tensor, for which
I implemented the `Weights.get_packed_sharded` method. However, this
method can also replace the `Weights._get_qweight` method and the
custom sharding code from `Weights.get_weights_col_packed`.
2024-06-20 09:56:04 +02:00
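The idea behind `Weights.get_packed_sharded` can be sketched with plain lists instead of real tensors (a simplified illustration, not the actual implementation): a packed QKV tensor concatenates Q, K and V along the output dimension, so each packed block must be sharded separately and the per-rank pieces re-concatenated.

```python
# Hedged sketch: shard a tensor that packs `num_blocks` equal sub-tensors
# (e.g. Q, K, V) along one dimension. Each block is split across ranks and
# the rank-local slices are concatenated back together.
def shard_packed(packed, num_blocks, world_size, rank):
    block_size = len(packed) // num_blocks
    shard_size = block_size // world_size
    shard = []
    for b in range(num_blocks):
        start = b * block_size + rank * shard_size
        shard.extend(packed[start:start + shard_size])
    return shard
```

Naively slicing the packed tensor as one contiguous chunk would give one rank all of Q and none of V, which is why the per-block split is needed.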
Daniël de Kok
f5a9837592
Support exl2-quantized Qwen2 models ( #2085 )
...
Fixes #2081 .
2024-06-20 07:56:16 +02:00
drbh
cdbf802860
feat: rotate tests ci token ( #2091 )
2024-06-19 17:02:58 -04:00
Daniël de Kok
11ea9ce002
CI: pass pre-commit hooks again ( #2084 )
2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE
4f25c67d63
CI: Tailscale improvements ( #2079 )
...
* test local tailscale
* Update build.yaml
* Update build.yaml
* Update build.yaml
* Update build.yaml
* wait for ssh
* network host
* change step order
2024-06-18 09:13:04 +02:00
Daniël de Kok
c8c7ccd31e
Set maximum grpc message receive size to 2GiB ( #2075 )
...
* Set maximum grpc message receive size to 2GiB
The previous default was 4MiB, which doesn't really work well for
multi-modal models.
* Update to Rust 1.79.0
* Fixup formatting to make PR pass
2024-06-17 16:40:44 +02:00
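Raising the gRPC limit as described above is done through channel/server options; a minimal sketch (the option keys are standard gRPC channel arguments, but the exact value TGI uses is an assumption here):

```python
# Hedged sketch: gRPC's default receive cap is 4 MiB, which is too small for
# multi-modal payloads. 2**31 - 1 is the int32 ceiling, i.e. just under 2 GiB.
MAX_RECEIVE_MESSAGE_LENGTH = 2**31 - 1

GRPC_OPTIONS = [
    ("grpc.max_receive_message_length", MAX_RECEIVE_MESSAGE_LENGTH),
    ("grpc.max_send_message_length", MAX_RECEIVE_MESSAGE_LENGTH),
]
# e.g. grpc.aio.insecure_channel(target, options=GRPC_OPTIONS)
```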
Ziru Niu
0f7d38e774
fix build.rs watch files ( #2072 )
2024-06-17 12:10:01 +02:00
Lysandre Debut
131838919e
Contributing guide & Code of Conduct ( #2074 )
...
* Contributing guide & Code of Conduct
* Redirect to GitHub's tutorial on PRs
2024-06-17 12:09:31 +02:00
Daniël de Kok
e903770897
Support different image sizes in prefill in VLMs ( #2065 )
...
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056 ). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.
Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
Alvaro Moran
445f313504
Adding architecture document ( #2044 )
...
* doc: adding architecture document
* doc: add architecture to toctree
* fix: avoid cargo lock changes
* fix: avoid cargo lock tweak
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-06-14 09:28:34 -04:00
Tiezhen WANG
96b7b40ca3
Update the link for qwen2 ( #2068 )
...
* Update the link for qwen2
* Fix Qwen2 model URL in model table
* Fix too eager staging
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-06-14 11:59:33 +02:00
Daniël de Kok
093a27c528
Add support for GPTQ Marlin ( #2052 )
...
Add support for GPTQ Marlin kernels
GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:
- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false
Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.
The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
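The supported configurations listed above imply a compatibility check along these lines (a sketch of the rule stated in the commit message, not the repository's actual code):

```python
# Hedged sketch: only these GPTQ configurations can be repacked into the
# Marlin quantizer format, per the supported-configuration list above.
SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUP_SIZES = {-1, 32, 64, 128}

def gptq_marlin_compatible(bits, groupsize, desc_act):
    # desc_act may be either True or False; both are supported.
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUP_SIZES
```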
drbh
f433f1f770
implement Open Inference Protocol endpoints ( #1942 )
...
* feat: add kserve feature and basic routes
* feat: implement infer endpoint wrapper around generate
* fix: refactor and improve types
* fix: improve infer and simplify
* fix: cleanup and improve api docs
* fix: refactor and encapsulate kserve feat in file
* fix: remove typos after rebase
2024-06-13 12:51:51 -04:00
drbh
42aa8ee1bb
PR #2049 CI run ( #2054 )
...
* Use minijinja's pycompat mode for python methods
* fix: cargo fmt lint for pre commit
---------
Co-authored-by: Armin Ronacher <armin.ronacher@active-4.com>
2024-06-13 11:53:49 -04:00
OlivierDehaene
90184df79c
fix(layers): fix SuRotaryEmbedding ( #2060 )
...
* fix(layers): fix SuRotaryEmbedding
* change arange
* remove logs
2024-06-12 18:24:47 +02:00
OlivierDehaene
521de6cacd
fix(server): fix OPT implementation ( #2061 )
2024-06-12 18:22:20 +02:00
drbh
376a0b7ada
Support chat response format ( #2046 )
...
* feat: support response_format in chat
* fix: adjust typos
* fix: add trufflehog lint
2024-06-11 10:44:56 -04:00
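A request using the new `response_format` field in chat might look like the following; the exact schema TGI accepts is an assumption here, modeled on the OpenAI-style chat API:

```python
# Hedged example request body (field shape is an assumption, not TGI's
# documented schema): ask the chat endpoint to constrain output to JSON.
payload = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "List three fruits as JSON."}],
    "response_format": {"type": "json_object"},
}
```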
fxmarty
a6e4d63c86
Update LLMM1 bound ( #2050 )
...
update commit
2024-06-11 19:30:29 +08:00
Luc Georges
dfca1dfc5e
fix(ci): remove unnecessary permissions ( #2045 )
2024-06-10 12:16:53 -04:00