Commit Graph

5 Commits

Author SHA1 Message Date
OlivierDehaene ab7ccf5bc3
feat: add payload limit (#2726)
* feat: add payload limit

* update launcher
2024-11-21 18:20:15 +00:00
Nicolas Patry ed87b464b4
Fixing "deadlock" when python prompts for trust_remote_code by always (#2664)
specifiying a value.
2024-10-25 06:39:21 +02:00
OlivierDehaene 41c2623735
feat: allow any supported payload on /invocations (#2683)
* feat: allow any supported payload on /invocations

* update openAPI

* update doc
2024-10-23 11:26:01 +00:00
OlivierDehaene a6a0c97ed9
feat: prefill chunking (#2600)
* wip

* rollback

* refactor to use prefix/postfix namming + fix all_input_ids_tensor

* maybe patching vlms?

* fix filter and concat

* wip, no filter, no concat

* current

* add prepare_for_prefill

* working

* load tested

* re-create slots

* re-create slots

* fix slot_filtering_indices

* feedback loop

* remove log

* fix benchmarker

* fix vlm and seq2seq

* rename to cache and input lengths

* fix prefill logprobs

* fix launcher

* fix logprobs?

* idk at this point

* max input length

* omfg

* remove debugging lines

* fix tests

* fix mllama

* fix cargo tests

* remove support chunking for paged

* Fixing non blocked attentions

* Fixing dtype + AMD, Ipex targets.

* lint fix.

* rename

* Fix prefix_caching variable, remove defaults in server (confusing a lot
of the times).

* Add simple resolution when user specifies ATTENTION=paged.

* Put back non default simple tests.

* Fix env name

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-16 12:49:33 +02:00
OlivierDehaene 10e6f29295
chore: Add old V2 backend (#2551)
* wip

* added v2
2024-09-24 08:38:17 +02:00