Commit Graph

112 Commits

Author SHA1 Message Date
fxmarty 291453fe88 Merge branch 'main' into ci_amd3 2024-07-16 15:15:17 +02:00
Nicolas Patry 4c976fb406
Updating the self check (#2209)
* Updating the self check

* Fix.

* Revert the CLI.

* cli.

* Space.

* Revert cargo update.
2024-07-09 17:23:48 +02:00
Nicolas Patry fe710af25f
Adding sanity check to openapi docs. 2024-07-09 11:13:48 +02:00
Guillaume LEGENDRE 5e2a305880
Fix buildx cache + change runner type (#2176)
* Update build.yaml

* Update build.yaml

* change to S3 cache

* change to CPU Runners

* remove comments
2024-07-08 18:13:32 +02:00
fxmarty d7c6061387 missing lib 2024-07-08 14:28:50 +02:00
fxmarty 4e3f687427 use base docker image 2024-07-08 13:10:09 +02:00
fxmarty 8c590be463 Merge branch 'main' into ci_amd3 2024-07-08 13:06:39 +02:00
Daniël de Kok 05c094fcfa
Consistently take `prefix` in model constructors (#2191)
* Consistently take `prefix` in model constructors

* Release test check fix

* Misc refactor-related fixes
2024-07-05 16:07:48 +02:00
Daniël de Kok 67ef0649cf
GPTQ CI improvements (#2151)
* Add more representative Llama GPTQ test

The Llama GPTQ test is updated to use a model with the commonly-used
quantizer config format and activation sorting. The old test is
kept around (but renamed) since it tests the format produced by
`text-generation-server quantize`.

* Add support for manually triggering a release build
2024-07-05 14:12:16 +02:00
drbh 571530dd9a
feat: improve update_docs for openapi schema (#2169)
* feat: add pre commit step to force schema update when router changes

* fix: prefer improved update_doc and start server and compare

* fix: adjust typo

* fix: adjust revert typo

* fix: update workflow to use update_doc md command

* feat: improve workflow to check openapi schema too

* fix: adjust timeout for CI

* fix: adjust raise condition and install server in ci

* fix: install protoc before server

* feat: improve update doc and add command to print router schema

* fix: adjust autodoc workflow

* fix: explicitly install protoc and python

* fix: allow trailing space in openapi schema diff
2024-07-03 09:53:35 +02:00
fxmarty 29a416078c Merge branch 'main' into ci_amd3 2024-07-02 15:32:53 +02:00
Felix Marty add4d42cb3 do not use tunableop for non flash-causal-lm models 2024-07-02 12:52:55 +00:00
Guillaume LEGENDRE 963b6c6f0f
Ci test (#2124)
* first test with registry mirror

* change push registry

* remove comments

* Move cache to push registry

* fix registry url

* Update .github/workflows/ci_build.yaml

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-07-02 12:45:38 +02:00
Nicolas Patry d0225b1015
GH router. (#2153) 2024-07-01 15:42:26 +02:00
fxmarty 59849777de Merge branch 'main' into ci_amd3 2024-07-01 14:14:46 +02:00
Felix Marty 05d1011b4f fix xpu build 2024-06-28 16:08:27 +00:00
Felix Marty 3d50ff71b7 bump torch to more recent version 2024-06-28 13:10:43 +00:00
Felix Marty 87db820627 fix rm 2024-06-28 09:49:20 +00:00
Nicolas Patry fb98ab273f
Fixing the CI to also run in release when it's a tag ? (#2138) 2024-06-28 09:31:09 +02:00
Felix Marty eaa6890b3c remove hidden 2024-06-27 15:24:14 +00:00
Felix Marty 0a5485d8a0 avoid permissions issues 2024-06-27 14:51:11 +00:00
Felix Marty 60a96a9ae3 do not use private registry in cleanup cache step 2024-06-26 13:57:05 +00:00
Felix Marty 4067fc8211 login to registry 2024-06-26 10:58:52 +00:00
Felix Marty 2330052aa2 debug 2024-06-26 10:43:57 +00:00
fxmarty 227f78f3fe Merge branch 'main' into ci_amd3 2024-06-26 12:08:42 +02:00
Daniël de Kok fc9c3153e5
Add pytest release marker (#2114)
* Add pytest release marker

Annotate a test with `@pytest.mark.release` and it only gets run
with `pytest integration-tests --release`.

* Mark many models as `release` to speed up CI
2024-06-25 16:53:20 +02:00
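The release marker described in commit fc9c3153e5 can be illustrated with the pattern commonly used in pytest for opt-in test groups. This is a minimal sketch of a `conftest.py`, not the code from the commit itself: the help text, the skip reason, and the layout of the hooks are assumptions. Only the marker name `release` and the `--release` flag come from the commit message.

```python
# conftest.py (illustrative sketch; not taken from the actual commit)
import pytest


def pytest_addoption(parser):
    # Register the opt-in command-line flag: `pytest integration-tests --release`.
    parser.addoption(
        "--release",
        action="store_true",
        default=False,
        help="also run tests marked with @pytest.mark.release",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "release: test that only runs with --release")


def pytest_collection_modifyitems(config, items):
    # With --release every collected test runs; nothing to change.
    if config.getoption("--release"):
        return
    # Without it, tests carrying the `release` marker are skipped.
    skip_release = pytest.mark.skip(reason="needs --release to run")
    for item in items:
        if "release" in item.keywords:
            item.add_marker(skip_release)
```

With this in place, a test decorated with `@pytest.mark.release` is reported as skipped in a default run and executes only when `--release` is passed.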
Nicolas Patry 9e2fdf57c0
Removing IPEX_AVAIL. (#2115)
* Removing IPEX_AVAIL.

Chose to unify CPU and XPU under `ipex`. Most of the code is identical
except for a few spots.

Most of the differences are in the kv-cache layout and the flash_xxx.py
files.
Since those files should be removed soon and factored away, we should
not need them.

* Forgot a few places.

* Unrelated change.

* Fixing HF_TOKEN.

* HF_TOKEN
2024-06-25 13:20:57 +02:00
Felix Marty 04298e5799 add back credentials 2024-06-25 09:22:49 +00:00
fxmarty dc53846456 Merge branch 'main' into ci_amd3 2024-06-25 11:20:00 +02:00
Lucain 3447c722fd
Support `HF_TOKEN` environment variable (#2066)
* Support HF_TOKEN environment variable

* Load test.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-06-25 09:23:12 +02:00
Felix Marty 09a41f2c43
do not skip workflow on cuda, fix no space left on device 2024-06-24 18:51:59 +02:00
Felix Marty f16f0ad92b
do not login to internal registry 2024-06-24 18:51:58 +02:00
Felix Marty 13bbf6cc5c
does ci pass without tailscale? 2024-06-24 18:51:33 +02:00
Felix Marty ee62872d66
test tailscale independently 2024-06-24 18:51:33 +02:00
Felix Marty 284894303a
remove require_backend decorators on handles, for some reason it fails in github actions 2024-06-24 18:51:32 +02:00
Felix Marty 393234de9b
hopefully fix ci 2024-06-24 18:51:32 +02:00
Felix Marty 67999773f3
fix workflow 2024-06-24 18:51:32 +02:00
Felix Marty 5fb8c275c3
fix style & typo 2024-06-24 18:51:30 +02:00
fxmarty 40b342a12e
fix space 2024-06-24 18:51:08 +02:00
fxmarty 1e10597d0c
update 2024-06-24 18:50:17 +02:00
Nicolas Patry 480d3b3304
New runner. Manual squash. (#2110)
* New runner. Manual squash.

* Network host.

* Put back trufflehog with proper extension.

* No network host ?

* Moving buildx install after tailscale ?

* 1.79
2024-06-24 18:08:34 +02:00
drbh cdbf802860
feat: rotate tests ci token (#2091) 2024-06-19 17:02:58 -04:00
Daniël de Kok 11ea9ce002
CI: pass pre-commit hooks again (#2084) 2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE 4f25c67d63
CI: Tailscale improvements (#2079)
* test local tailscale

* Update build.yaml

* Update build.yaml

* Update build.yaml

* Update build.yaml

* wait for ssh

* network host

* change step order
2024-06-18 09:13:04 +02:00
Daniël de Kok c8c7ccd31e
Set maximum grpc message receive size to 2GiB (#2075)
* Set maximum grpc message receive size to 2GiB

The previous default was 4MiB, which doesn't really work well for
multi-modal models.

* Update to Rust 1.79.0

* Fixup formatting to make PR pass
2024-06-17 16:40:44 +02:00
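The size limit raised in commit c8c7ccd31e can be sketched as a gRPC server option. The snippet below is an illustration under stated assumptions, not the code from the commit: it assumes the limit is set through the `grpc.max_receive_message_length` channel argument and that the value must fit in a signed 32-bit integer, which is why it uses 2 GiB minus one byte. The constant names are hypothetical.

```python
# Illustrative sketch; the constant names are hypothetical.
OLD_DEFAULT_RECEIVE_BYTES = 4 * 1024 * 1024  # 4 MiB, the previous default

# Assumption: the option value is a signed 32-bit integer, so the largest
# usable limit is one byte short of 2 GiB.
MAX_RECEIVE_BYTES = (1 << 31) - 1

# Option list in the (name, value) form accepted by the Python gRPC API,
# for example: grpc.aio.server(options=SERVER_OPTIONS)
SERVER_OPTIONS = [
    ("grpc.max_receive_message_length", MAX_RECEIVE_BYTES),
]

# How many times larger the new limit is than the old one (integer part).
GROWTH_FACTOR = MAX_RECEIVE_BYTES // OLD_DEFAULT_RECEIVE_BYTES
```

A single request carrying several images can exceed 4 MiB, which is the multi-modal case the commit message refers to.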
drbh 376a0b7ada
Support chat response format (#2046)
* feat: support response_format in chat

* fix: adjust typos

* fix: add trufflehog lint
2024-06-11 10:44:56 -04:00
Luc Georges dfca1dfc5e
fix(ci): remove unnecessary permissions (#2045) 2024-06-10 12:16:53 -04:00
Luc Georges 4e74ec09a8
feat(ci): add trufflehog secrets detection (#2038) 2024-06-10 11:54:13 -04:00
Daniël de Kok bf3c813782 server: use chunked inputs
The router will now send the input as chunks in addition to a single
string. This change modifies the server to process chunked input
rather than strings. This also allows us to remove the image
extraction code from the server.
2024-06-07 08:09:04 +02:00
Nicolas Patry 9765658212 Revert "Enabling CI for AMD with new runner.."
This reverts commit 101ac9a760.
2024-06-06 19:08:16 +02:00