drbh
a7556ba800
fix: refactors and helpful comments
2024-06-24 13:39:56 +00:00
drbh
a07b612989
fix: revert skips and prefer updated ci token for tests
2024-06-19 17:31:13 +00:00
drbh
c9e4526b9d
fix: skip llama test CI (temp) 2
2024-06-19 17:19:40 +00:00
drbh
ce70fce925
fix: skip llama test due to CI issue (temp)
2024-06-19 17:03:13 +00:00
drbh
4f1543d3c7
fix: refactors and adjust flash llama lora logic
2024-06-19 16:13:42 +00:00
drbh
224455f389
Merge branch 'main' into lora-internal
2024-06-18 09:50:41 -04:00
Daniël de Kok
11ea9ce002
CI: pass pre-commit hooks again ( #2084 )
2024-06-18 09:38:21 +02:00
Guillaume LEGENDRE
4f25c67d63
CI: Tailscale improvements ( #2079 )
* test local tailscale
* Update build.yaml
* Update build.yaml
* Update build.yaml
* Update build.yaml
* wait for ssh
* network host
* change step order
2024-06-18 09:13:04 +02:00
Daniël de Kok
c8c7ccd31e
Set maximum grpc message receive size to 2GiB ( #2075 )
* Set maximum grpc message receive size to 2GiB
The previous default was 4MiB, which doesn't really work well for
multi-modal models.
* Update to Rust 1.79.0
* Fixup formatting to make PR pass
2024-06-17 16:40:44 +02:00
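The 2GiB receive-size change above can be sketched as a pair of channel options. This is a minimal illustration using grpcio's standard channel-argument names; the actual router/server wiring in the project is elided here.

```python
# Sketch of raising the gRPC receive limit from the 4 MiB default to 2 GiB.
# The option keys are grpcio's standard channel-argument names; server and
# channel construction are elided, so only the option values are shown.

DEFAULT_MAX_RECEIVE = 4 * 1024 * 1024  # 4 MiB: too small for image tensors
TWO_GIB = 2 * 1024 ** 3                # 2 GiB, as in the commit

# Passed as `options=` when creating a gRPC server or channel.
grpc_options = [
    ("grpc.max_receive_message_length", TWO_GIB),
    ("grpc.max_send_message_length", TWO_GIB),
]

print(grpc_options[0][1] // DEFAULT_MAX_RECEIVE)  # factor of increase: 512x
```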
Ziru Niu
0f7d38e774
fix build.rs watch files ( #2072 )
2024-06-17 12:10:01 +02:00
Lysandre Debut
131838919e
Contributing guide & Code of Conduct ( #2074 )
* Contributing guide & Code of Conduct
* Redirect to GitHub's tutorial on PRs
2024-06-17 12:09:31 +02:00
Daniël de Kok
e903770897
Support different image sizes in prefill in VLMs ( #2065 )
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056 ). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.
Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
2024-06-17 10:49:41 +02:00
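The fix above can be illustrated with a toy batch preprocessor: pad every image in the batch to the batch's maximum height and width so the resulting tensors stack cleanly. `pad_batch` is purely illustrative and is not TGI's actual image processor.

```python
# Hypothetical sketch: preprocess all images in a batch together, padding
# each to the batch maximum so per-image tensors have compatible shapes.

def pad_batch(images):
    """images: list of 2D pixel grids (list of rows) of varying sizes."""
    max_h = max(len(img) for img in images)
    max_w = max(len(row) for img in images for row in img)
    padded = []
    for img in images:
        rows = [row + [0] * (max_w - len(row)) for row in img]  # pad width
        rows += [[0] * max_w] * (max_h - len(rows))             # pad height
        padded.append(rows)
    return padded

batch = pad_batch([[[1, 2]], [[3], [4]]])  # a 1x2 image and a 2x1 image
# every image in the batch is now 2x2
assert {(len(img), len(img[0])) for img in batch} == {(2, 2)}
```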
drbh
1104885f00
Merge branch 'main' into lora-internal
2024-06-14 10:06:15 -04:00
drbh
0e1c28cafd
fix: merge 'main' into lora-internal to resolve conflicts
2024-06-14 14:02:33 +00:00
drbh
06c3254cc5
fix: avoid dockerfile conflict
2024-06-14 13:58:38 +00:00
Alvaro Moran
445f313504
Adding architecture document ( #2044 )
* doc: adding architecture document
* doc: add architecture to toctree
* fix: avoid cargo lock changes
* fix: avoid cargo lock tweak
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-06-14 09:28:34 -04:00
Tiezhen WANG
96b7b40ca3
Update the link for qwen2 ( #2068 )
* Update the link for qwen2
* Fix Qwen2 model URL in model table
* Fix too eager staging
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-06-14 11:59:33 +02:00
Daniël de Kok
093a27c528
Add support for GPTQ Marlin ( #2052 )
Add support for GPTQ Marlin kernels
GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:
- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false
Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.
The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
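The supported-configuration list above amounts to a small compatibility check. The sketch below encodes it directly; the function name is illustrative, not the project's. Since `desc_act` is supported as both true and false, it needs no check.

```python
# Sketch of the GPTQ Marlin compatibility rule implied by the commit:
# only certain GPTQ configurations can be repacked for the Marlin kernels.

SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUPSIZES = {-1, 32, 64, 128}

def gptq_marlin_compatible(bits: int, groupsize: int) -> bool:
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUPSIZES

assert gptq_marlin_compatible(4, 128)
assert not gptq_marlin_compatible(3, 128)  # 3-bit GPTQ is not supported
```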
drbh
aa88c4fd3a
fix: add lora kernel to dockerfile, support running without kernels and refactors
2024-06-14 00:35:07 +00:00
drbh
f433f1f770
implement Open Inference Protocol endpoints ( #1942 )
* feat: add kserve feature and basic routes
* feat: implement infer endpoint wrapper around generate
* fix: refactor and improve types
* fix: improve infer and simplify
* fix: cleanup and improve api docs
* fix: refactor and encapsulate kserve feat in file
* fix: remove typos after rebase
2024-06-13 12:51:51 -04:00
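For context on the kserve feature above: the Open Inference Protocol (KServe v2) defines a small set of HTTP routes, sketched below as plain strings per my reading of the v2 spec. Which subset TGI's `kserve` feature actually implements is not stated in the commit.

```python
# Route shapes from the KServe v2 / Open Inference Protocol HTTP spec
# (as understood here; treat the exact set as an assumption).

def v2_routes(model_name: str):
    return {
        "server_live":  "/v2/health/live",
        "server_ready": "/v2/health/ready",
        "model_ready":  f"/v2/models/{model_name}/ready",
        "model_meta":   f"/v2/models/{model_name}",
        "infer":        f"/v2/models/{model_name}/infer",
    }

routes = v2_routes("llama")
assert routes["infer"] == "/v2/models/llama/infer"
```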
drbh
42aa8ee1bb
PR #2049 CI run ( #2054 )
* Use minijinja's pycompat mode for python methods
* fix: cargo fmt lint for pre commit
---------
Co-authored-by: Armin Ronacher <armin.ronacher@active-4.com>
2024-06-13 11:53:49 -04:00
OlivierDehaene
90184df79c
fix(layers): fix SuRotaryEmbedding ( #2060 )
* fix(layers): fix SuRotaryEmbedding
* change arange
* remove logs
2024-06-12 18:24:47 +02:00
OlivierDehaene
521de6cacd
fix(server): fix OPT implementation ( #2061 )
2024-06-12 18:22:20 +02:00
drbh
376a0b7ada
Support chat response format ( #2046 )
* feat: support response_format in chat
* fix: adjust typos
* fix: add trufflehog lint
2024-06-11 10:44:56 -04:00
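A chat request using the `response_format` field added above might look like the following. The field names (`"type"`, `"value"`) follow TGI's grammar-constrained generation API as I understand it; treat the exact shape as an assumption rather than the spec.

```python
# Illustrative chat request body with response_format constraining the
# output to JSON matching a schema (field names are an assumption).
import json

request = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "Name a color as JSON."}],
    "response_format": {
        "type": "json_object",
        "value": {"type": "object", "properties": {"color": {"type": "string"}}},
    },
}

body = json.dumps(request)
assert json.loads(body)["response_format"]["type"] == "json_object"
```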
fxmarty
a6e4d63c86
Update LLMM1 bound ( #2050 )
update commit
2024-06-11 19:30:29 +08:00
Luc Georges
dfca1dfc5e
fix(ci): remove unnecessary permissions ( #2045 )
2024-06-10 12:16:53 -04:00
Luc Georges
4e74ec09a8
feat(ci): add trufflehog secrets detection ( #2038 )
2024-06-10 11:54:13 -04:00
Derek
d6cf63ca53
Update lora.md
Fixing spam image
2024-06-10 10:24:21 -04:00
Derek
1be1ebc438
Update lora.md
Fixed a typo
2024-06-10 10:24:21 -04:00
drbh
ce40ad26fd
fix: add model_id to IdeficsCausalLM
2024-06-10 10:24:21 -04:00
drbh
101b95adc4
fix: update all models forwards to include adapter_data
2024-06-10 10:24:21 -04:00
drbh
1deb372564
fix: add adapter_data param to phi and neox
2024-06-10 10:24:21 -04:00
drbh
b1169273fd
fix: add adapter_data param and avoid missing layers
2024-06-10 10:24:21 -04:00
drbh
91f407226d
feat: support of vlm models
2024-06-10 10:24:21 -04:00
drbh
a563a93113
fix: rename doc to retry ci build
2024-06-10 10:24:21 -04:00
drbh
611225f017
feat: support base model generation and refactors
2024-06-10 10:24:21 -04:00
drbh
43ec9dfe32
feat: bump launcher and add new lora docs
2024-06-10 10:24:21 -04:00
drbh
81707bfbfa
fix: include rust code for adapter id
2024-06-10 10:23:52 -04:00
drbh
68399c1ae3
feat: prefer model id in request
2024-06-10 10:23:52 -04:00
drbh
de56a81c5c
feat: add lora support to mistral and refactors
2024-06-10 10:23:52 -04:00
drbh
9c45d34983
fix: add model_id to model test
2024-06-10 10:23:52 -04:00
drbh
dc0f76553c
fix: pass model_id for all causal and seq2seq lms
2024-06-10 10:23:52 -04:00
drbh
88bd5c2c92
fix: pass model_id for all flash causal lms
2024-06-10 10:23:52 -04:00
drbh
73eb2ae255
fix: refactor and move changes to v3 proto
2024-06-10 10:23:52 -04:00
drbh
c927376725
fix: adjust adapter_segments logic when in batch
2024-06-10 10:23:52 -04:00
drbh
ad088d51fa
fix: adjust batch for bgmv
2024-06-10 10:23:52 -04:00
drbh
8984ce6c69
feat: prefer lorax's custom punica kernels and add mlp loras
2024-06-10 10:23:52 -04:00
drbh
d5f21d57d1
fix: prefer adapter_data and refactors
2024-06-10 10:23:52 -04:00
drbh
8b50f4b779
feat: prefer lorax implementation and port loading logic
2024-06-10 10:23:52 -04:00
drbh
c661631225
feat: baseline impl single request multi lora support
2024-06-10 10:23:52 -04:00