Commit Graph

813 Commits

Author SHA1 Message Date
drbh a7556ba800 fix: refactors and helpful comments 2024-06-24 13:39:56 +00:00
drbh a07b612989 fix: revert skips and prefer updated ci token for tests 2024-06-19 17:31:13 +00:00
drbh c9e4526b9d fix: skip llama test CI (temp) 2 2024-06-19 17:19:40 +00:00
drbh ce70fce925 fix: skip llama test due to CI issue (temp) 2024-06-19 17:03:13 +00:00
drbh 4f1543d3c7 fix: refactors and adjust flash llama lora logic 2024-06-19 16:13:42 +00:00
drbh 224455f389 Merge branch 'main' into lora-internal 2024-06-18 09:50:41 -04:00
Daniël de Kok 11ea9ce002
CI: pass pre-commit hooks again (#2084) 2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE 4f25c67d63
CI: Tailscale improvements (#2079)
* test local tailscale

* Update build.yaml

* Update build.yaml

* Update build.yaml

* Update build.yaml

* wait for ssh

* network host

* change step order
2024-06-18 09:13:04 +02:00
Daniël de Kok c8c7ccd31e
Set maximum grpc message receive size to 2GiB (#2075)
* Set maximum grpc message receive size to 2GiB

The previous default was 4MiB, which is too small for multi-modal
models, whose image payloads can easily exceed it.

* Update to Rust 1.79.0

* Fixup formatting to make PR pass
2024-06-17 16:40:44 +02:00
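The receive-size change above amounts to passing a channel/server option. A minimal sketch of what such a limit looks like in Python gRPC terms (the option name `grpc.max_receive_message_length` is the standard gRPC channel argument; note that gRPC message lengths are int32, so "2GiB" in practice means `2**31 - 1` bytes):

```python
# Sketch: raising the gRPC receive limit so large multi-modal payloads
# (e.g. image tensors) fit in a single message. gRPC length options are
# int32, so the effective maximum is 2 GiB minus one byte.
MAX_MESSAGE_SIZE = 2 * 1024 * 1024 * 1024 - 1  # 2**31 - 1

# Options as they would be passed to grpc.aio.server(options=...) or a channel.
GRPC_OPTIONS = [
    ("grpc.max_receive_message_length", MAX_MESSAGE_SIZE),
    ("grpc.max_send_message_length", MAX_MESSAGE_SIZE),
]
```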
Ziru Niu 0f7d38e774
fix build.rs watch files (#2072) 2024-06-17 12:10:01 +02:00
Lysandre Debut 131838919e
Contributing guide & Code of Conduct (#2074)
* Contributing guide & Code of Conduct

* Redirect to GitHub's tutorial on PRs
2024-06-17 12:09:31 +02:00
Daniël de Kok e903770897
Support different image sizes in prefill in VLMs (#2065)
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated, which fails when the resulting tensors have
incompatible sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
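The fix described above is essentially "pad jointly, then stack" instead of "process separately, then concatenate". A hypothetical helper sketching the idea with NumPy (the function name and padding scheme are illustrative, not code from the PR):

```python
import numpy as np

def preprocess_batch(images):
    """Pad all images in a batch to a shared maximum size, then stack.

    Illustrative only: processing the batch jointly lets the processor
    pick compatible tensor shapes, whereas per-image processing followed
    by concatenation fails on shape mismatch.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    padded = []
    for img in images:
        pad_h = max_h - img.shape[0]
        pad_w = max_w - img.shape[1]
        # Pad height and width on the bottom/right; leave channels alone.
        padded.append(np.pad(img, ((0, pad_h), (0, pad_w), (0, 0))))
    return np.stack(padded)  # shape: (batch, max_h, max_w, channels)

batch = preprocess_batch([np.zeros((224, 224, 3)), np.zeros((336, 448, 3))])
```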
drbh 1104885f00
Merge branch 'main' into lora-internal 2024-06-14 10:06:15 -04:00
drbh 0e1c28cafd fix: merge 'main' into lora-internal to resolve conflicts 2024-06-14 14:02:33 +00:00
drbh 06c3254cc5 fix: avoid dockerfile conflict 2024-06-14 13:58:38 +00:00
Alvaro Moran 445f313504
Adding architecture document (#2044)
* doc: adding architecture document

* doc: add architecture to toctree

* fix: avoid cargo lock changes

* fix: avoid cargo lock tweak

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-06-14 09:28:34 -04:00
Tiezhen WANG 96b7b40ca3
Update the link for qwen2 (#2068)
* Update the link for qwen2

* Fix Qwen2 model URL in model table

* Fix too eager staging

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-06-14 11:59:33 +02:00
Daniël de Kok 093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
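The supported configurations listed in the commit message can be captured in a small eligibility check. A hypothetical sketch (the function name is invented; the values come directly from the list above, and `desc_act` needs no check since both values are supported):

```python
# Supported GPTQ configurations for the Marlin kernels, per the commit message.
SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUPSIZES = {-1, 32, 64, 128}

def can_use_gptq_marlin(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True if a GPTQ checkpoint's config is covered by the Marlin kernels.

    desc_act is accepted either way (true/false are both supported), so it
    does not restrict eligibility here.
    """
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUPSIZES
```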
drbh aa88c4fd3a fix: add lora kernel to dockerfile, support running without kernels and refactors 2024-06-14 00:35:07 +00:00
drbh f433f1f770
implement Open Inference Protocol endpoints (#1942)
* feat: add kserve feature and basic routes

* feat: implement infer endpoint wrapper around generate

* fix: refactor and improve types

* fix: improve infer and simplify

* fix: cleanup and improve api docs

* fix: refactor and encapsulate kserve feat in file

* fix: remove typos after rebase
2024-06-13 12:51:51 -04:00
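The Open Inference Protocol (KServe v2) endpoints added here accept the spec's standard request shape. A minimal request body following that spec (the input name `text_input` and the prompt are illustrative, not taken from the PR):

```python
import json

# Minimal Open Inference Protocol (KServe v2) infer request body, as it
# would be POSTed to a /v2/models/{model}/infer-style route.
infer_request = {
    "id": "42",
    "inputs": [
        {
            "name": "text_input",       # illustrative input name
            "shape": [1],
            "datatype": "BYTES",        # string inputs use BYTES in the v2 spec
            "data": ["What is deep learning?"],
        }
    ],
}
payload = json.dumps(infer_request)
```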
drbh 42aa8ee1bb
PR #2049 CI run (#2054)
* Use minijinja's pycompat mode for python methods

* fix: cargo fmt lint for pre commit

---------

Co-authored-by: Armin Ronacher <armin.ronacher@active-4.com>
2024-06-13 11:53:49 -04:00
OlivierDehaene 90184df79c
fix(layers): fix SuRotaryEmbedding (#2060)
* fix(layers): fix SuRotaryEmbedding

* change arange

* remove logs
2024-06-12 18:24:47 +02:00
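The "change arange" step above is the kind of detail rotary-embedding fixes usually hinge on. A sketch of the standard RoPE inverse-frequency computation where the `arange` dtype matters (this is the generic formula, not code copied from the PR):

```python
import numpy as np

def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard rotary-embedding inverse frequencies: base^(-2i/dim).

    The arange runs over even indices 0, 2, ..., dim-2; computing it in
    float64 avoids the precision pitfalls an integer or half-precision
    arange can introduce into the exponent.
    """
    return 1.0 / (base ** (np.arange(0, dim, 2, dtype=np.float64) / dim))
```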
OlivierDehaene 521de6cacd
fix(server): fix OPT implementation (#2061) 2024-06-12 18:22:20 +02:00
drbh 376a0b7ada
Support chat response format (#2046)
* feat: support response_format in chat

* fix: adjust typos

* fix: add trufflehog lint
2024-06-11 10:44:56 -04:00
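A chat request carrying `response_format` might look like the following sketch. The exact field shape here follows the common OpenAI-style convention and is an assumption, not copied from the PR:

```python
import json

# Hypothetical chat-completions request using response_format to ask for
# JSON output; field names beyond "response_format" follow the usual
# chat-completions shape.
chat_request = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "response_format": {"type": "json_object"},
}
body = json.dumps(chat_request)
```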
fxmarty a6e4d63c86
Update LLMM1 bound (#2050)
update commit
2024-06-11 19:30:29 +08:00
Luc Georges dfca1dfc5e
fix(ci): remove unnecessary permissions (#2045) 2024-06-10 12:16:53 -04:00
Luc Georges 4e74ec09a8
feat(ci): add trufflehog secrets detection (#2038) 2024-06-10 11:54:13 -04:00
Derek d6cf63ca53 Update lora.md
Fixing spam image
2024-06-10 10:24:21 -04:00
Derek 1be1ebc438 Update lora.md
Fixed a typo
2024-06-10 10:24:21 -04:00
drbh ce40ad26fd fix: add model_id to IdeficsCausalLM 2024-06-10 10:24:21 -04:00
drbh 101b95adc4 fix: update all models forwards to include adapter_data 2024-06-10 10:24:21 -04:00
drbh 1deb372564 fix: add adapter_data param to phi and neox 2024-06-10 10:24:21 -04:00
drbh b1169273fd fix: add adapter_data param and avoid missing layers 2024-06-10 10:24:21 -04:00
drbh 91f407226d feat: support for vlm models 2024-06-10 10:24:21 -04:00
drbh a563a93113 fix: rename doc to retry ci build 2024-06-10 10:24:21 -04:00
drbh 611225f017 feat: support base model generation and refactors 2024-06-10 10:24:21 -04:00
drbh 43ec9dfe32 feat: bump launcher and add new lora docs 2024-06-10 10:24:21 -04:00
drbh 81707bfbfa fix: include rust code for adapter id 2024-06-10 10:23:52 -04:00
drbh 68399c1ae3 feat: prefer model id in request 2024-06-10 10:23:52 -04:00
drbh de56a81c5c feat: add lora support to mistral and refactors 2024-06-10 10:23:52 -04:00
drbh 9c45d34983 fix: add model_id to model test 2024-06-10 10:23:52 -04:00
drbh dc0f76553c fix: pass model_id for all causal and seq2seq lms 2024-06-10 10:23:52 -04:00
drbh 88bd5c2c92 fix: pass model_id for all flash causal lms 2024-06-10 10:23:52 -04:00
drbh 73eb2ae255 fix: refactor and move changes to v3 proto 2024-06-10 10:23:52 -04:00
drbh c927376725 fix: adjust adapter_segments logic when in batch 2024-06-10 10:23:52 -04:00
drbh ad088d51fa fix: adjust batch for bgmv 2024-06-10 10:23:52 -04:00
drbh 8984ce6c69 feat: prefer lorax's custom punica kernels and add mlp loras 2024-06-10 10:23:52 -04:00
drbh d5f21d57d1 fix: prefer adapter_data and refactors 2024-06-10 10:23:52 -04:00
drbh 8b50f4b779 feat: prefer lorax implementation and port loading logic 2024-06-10 10:23:52 -04:00
drbh c661631225 feat: baseline impl single request multi lora support 2024-06-10 10:23:52 -04:00
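The "prefer model id in request" and "single request multi lora" commits above point toward selecting an adapter per request. A sketch of what such a request body could look like (the parameter name `adapter_id` is an assumption based on the LoRA docs added in this branch, and the adapter name is illustrative):

```python
import json

# Hypothetical generate request selecting a LoRA adapter per request.
request = {
    "inputs": "What is the capital of France?",
    "parameters": {
        "max_new_tokens": 40,
        "adapter_id": "my-org/my-lora-adapter",  # illustrative adapter name
    },
}
body = json.dumps(request)
```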