hf_text-generation-inference

Commit Graph

Author	SHA1	Message	Date
fxmarty	291453fe88	Merge branch 'main' into ci_amd3	2024-07-16 15:15:17 +02:00
Nicolas Patry	4c976fb406	Updating the self check (#2209 ) * Updating the self check * Fix. * Revert the CLI . * cli. * Space. * Revert cargo update.	2024-07-09 17:23:48 +02:00
Nicolas Patry	fe710af25f	Adding sanity check to openapi docs.	2024-07-09 11:13:48 +02:00
Guillaume LEGENDRE	5e2a305880	Fix buildx cache + change runner type (#2176 ) * Update build.yaml * Update build.yaml * change to S3 cache * change to CPU Runners * remove comments	2024-07-08 18:13:32 +02:00
fxmarty	d7c6061387	missing lib	2024-07-08 14:28:50 +02:00
fxmarty	4e3f687427	use base docker image	2024-07-08 13:10:09 +02:00
fxmarty	8c590be463	Merge branch 'main' into ci_amd3	2024-07-08 13:06:39 +02:00
Daniël de Kok	05c094fcfa	Consistently take `prefix` in model constructors (#2191 ) * Consistently take `prefix` in model constructors * Release test check fix * Misc refactor-related fixes	2024-07-05 16:07:48 +02:00
Daniël de Kok	67ef0649cf	GPTQ CI improvements (#2151 ) * Add more representative Llama GPTQ test The Llama GPTQ test is updated to use a model with the commonly-used quantizer config format and activation sorting. The old test is kept around (but renamed) since it tests the format produced by `text-generation-server quantize`. * Add support for manually triggering a release build	2024-07-05 14:12:16 +02:00
drbh	571530dd9a	feat: improve update_docs for openapi schema (#2169 ) * feat: add pre commit step to force schema update when router changes * fix: prefer improved update_doc and start server and compare * fix: adjust typo * fix: adjust revert typo * fix: update workflow to use update_doc md command * feat: improve workflow to check openapi schema too * fix: adjust timeout for CI * fix: adjust raise condition and install server in ci * fix: install protoc before server * feat: improve update doc and add command to print router schema * fix: adjust autodoc workflow * fix: explicitly install protoc and python * fix: alllow trailing space in openapi schema diff	2024-07-03 09:53:35 +02:00
fxmarty	29a416078c	Merge branch 'main' into ci_amd3	2024-07-02 15:32:53 +02:00
Felix Marty	add4d42cb3	do not use tunableop for non flash-causal-lm modezls	2024-07-02 12:52:55 +00:00
Guillaume LEGENDRE	963b6c6f0f	Ci test (#2124 ) * first test with registry mirror * change push registry * remove comments * Move cache to push registry * fix registry url * Update .github/workflows/ci_build.yaml --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2024-07-02 12:45:38 +02:00
Nicolas Patry	d0225b1015	GH router. (#2153 )	2024-07-01 15:42:26 +02:00
fxmarty	59849777de	Merge branch 'main' into ci_amd3	2024-07-01 14:14:46 +02:00
Felix Marty	05d1011b4f	fix xpu build	2024-06-28 16:08:27 +00:00
Felix Marty	3d50ff71b7	bump torch to more recent version	2024-06-28 13:10:43 +00:00
Felix Marty	87db820627	fix rm	2024-06-28 09:49:20 +00:00
Nicolas Patry	fb98ab273f	Fixing the CI to also run in release when it's a tag ? (#2138 )	2024-06-28 09:31:09 +02:00
Felix Marty	eaa6890b3c	remove hidden	2024-06-27 15:24:14 +00:00
Felix Marty	0a5485d8a0	avoid permissions issues	2024-06-27 14:51:11 +00:00
Felix Marty	60a96a9ae3	do not use private registry in cleanup cache step	2024-06-26 13:57:05 +00:00
Felix Marty	4067fc8211	login to registry	2024-06-26 10:58:52 +00:00
Felix Marty	2330052aa2	debug	2024-06-26 10:43:57 +00:00
fxmarty	227f78f3fe	Merge branch 'main' into ci_amd3	2024-06-26 12:08:42 +02:00
Daniël de Kok	fc9c3153e5	Add pytest release marker (#2114 ) * Add pytest release marker Annotate a test with `@pytest.mark.release` and it only gets run with `pytest integration-tests --release`. * Mark many models as `release` to speed up CI	2024-06-25 16:53:20 +02:00
Nicolas Patry	9e2fdf57c0	Removing IPEX_AVAIL. (#2115 ) * Removing IPEX_AVAIL. Chose to unify CPU and XPU under `ipex`. Most code is exactly similar except for a very few spots. The biggest number of spots is the kv-cache layout and the flash_xxx.py files. Since those files should be removed soon and factored away, we should not need them. * Forgot a few places. * Unrelated change. * Fixing HF_TOKEN. * HF_TOKEN	2024-06-25 13:20:57 +02:00
Felix Marty	04298e5799	add back credentials	2024-06-25 09:22:49 +00:00
fxmarty	dc53846456	Merge branch 'main' into ci_amd3	2024-06-25 11:20:00 +02:00
Lucain	3447c722fd	Support `HF_TOKEN` environment variable (#2066 ) * Support HF_TOKEN environement variable * Load test. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2024-06-25 09:23:12 +02:00
Felix Marty	09a41f2c43	do not skip workflow on cuda, fix no space left no device	2024-06-24 18:51:59 +02:00
Felix Marty	f16f0ad92b	do not login to internal registry	2024-06-24 18:51:58 +02:00
Felix Marty	13bbf6cc5c	does ci pass without tailscale?	2024-06-24 18:51:33 +02:00
Felix Marty	ee62872d66	test tailscale independently	2024-06-24 18:51:33 +02:00
Felix Marty	284894303a	remove require_backend decorators on handles, for some reasons fails in github actions	2024-06-24 18:51:32 +02:00
Felix Marty	393234de9b	hopefully fix ci	2024-06-24 18:51:32 +02:00
Felix Marty	67999773f3	fix workflow	2024-06-24 18:51:32 +02:00
Felix Marty	5fb8c275c3	fix style & typo	2024-06-24 18:51:30 +02:00
fxmarty	40b342a12e	fix space	2024-06-24 18:51:08 +02:00
fxmarty	1e10597d0c	update	2024-06-24 18:50:17 +02:00
Nicolas Patry	480d3b3304	New runner. Manual squash. (#2110 ) * New runner. Manual squash. * Network host. * Put back trufflehog with proper extension. * No network host ? * Moving buildx install after tailscale ? * 1.79	2024-06-24 18:08:34 +02:00
drbh	cdbf802860	feat: rotate tests ci token (#2091 )	2024-06-19 17:02:58 -04:00
Daniël de Kok	11ea9ce002	CI: pass pre-commit hooks again (#2084 )	2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE	4f25c67d63	CI: Tailscale improvements (#2079 ) * test local tailscale * Update build.yaml * Update build.yaml * Update build.yaml * Update build.yaml * wait for ssh * network host * change step order	2024-06-18 09:13:04 +02:00
Daniël de Kok	c8c7ccd31e	Set maximum grpc message receive size to 2GiB (#2075 ) * Set maximum grpc message receive size to 2GiB The previous default was 4MiB, which doesn't really work well for multi-modal models. * Update to Rust 1.79.0 * Fixup formatting to make PR pass	2024-06-17 16:40:44 +02:00
drbh	376a0b7ada	Support chat response format (#2046 ) * feat: support response_format in chat * fix: adjust typos * fix: add trufflehog lint	2024-06-11 10:44:56 -04:00
Luc Georges	dfca1dfc5e	fix(ci): remove unnecessary permissions (#2045 )	2024-06-10 12:16:53 -04:00
Luc Georges	4e74ec09a8	feat(ci): add trufflehog secrets detection (#2038 )	2024-06-10 11:54:13 -04:00
Daniël de Kok	bf3c813782	server: use chunked inputs The router will now send the input as chunks besides as a single string. This change modifies the server to process chunked input rather than strings. This also allows us to remove the image extraction code from the server.	2024-06-07 08:09:04 +02:00
Nicolas Patry	9765658212	Revert "Enabling CI for AMD with new runner.." This reverts commit `101ac9a760`.	2024-06-06 19:08:16 +02:00


112 Commits