hf_text-generation-inference/backends
Daniël de Kok deec30f893
hotfix: avoid non-prefilled block use when using prefix caching (#2489)
The minimum batch size logic could cause prefix blocks to be
deallocated without prefill. The next allocation of the same
prefix would then use garbage blocks.
2024-09-05 15:09:29 +02:00
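The failure mode in the commit message above can be illustrated with a toy model. This is a minimal sketch and not the code changed in #2489: `ToyPrefixAllocator`, its methods, and the `prefilled` flag are all hypothetical names invented for illustration. It assumes a block should only be kept for prefix reuse if its contents were actually written by a prefill step.

```python
class ToyPrefixAllocator:
    """Hypothetical toy allocator; not the text-generation-inference implementation."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # block ids with no useful contents
        self.cache = {}                      # prefix -> block id holding its prefilled data

    def allocate(self, prefix):
        """Return (block, reused). Reuse happens only for blocks kept in the cache."""
        if prefix in self.cache:
            return self.cache.pop(prefix), True
        return self.free.pop(), False

    def release(self, prefix, block, prefilled):
        """Keep the block for prefix reuse only if prefill actually ran on it."""
        if prefilled:
            self.cache[prefix] = block
        else:
            # Never prefilled: the contents are garbage, so the block must not
            # be handed out later as a cache hit for this prefix.
            self.free.append(block)


alloc = ToyPrefixAllocator(num_blocks=2)

# A request is allocated a block but is dropped before prefill runs.
block, reused = alloc.allocate("system-prompt")
alloc.release("system-prompt", block, prefilled=False)

# The next request with the same prefix must not see a cache hit.
block2, reused2 = alloc.allocate("system-prompt")
alloc.release("system-prompt", block2, prefilled=True)

# Only now is the prefix safe to reuse.
block3, reused3 = alloc.allocate("system-prompt")
```

In this sketch, skipping the `prefilled` check in `release` would reproduce the bug described above: the second allocation would report a cache hit on a block that was never written.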
client         Lots of improvements (Still 2 allocators) (#2449)                        2024-08-29 16:29:01 +02:00
grpc-metadata  Rebase TRT-llm (#2331)                                                   2024-07-31 10:33:10 +02:00
trtllm         More fixes trtllm (#2342)                                                2024-08-14 12:02:05 +02:00
v3             hotfix: avoid non-prefilled block use when using prefix caching (#2489)  2024-09-05 15:09:29 +02:00