hf_text-generation-inference/router/src
Latest commit 6d06473cf4 by drbh: Pr 2352 ci branch (#2382)
* Fix unsigned integer underflow

Passing --max-batch-size to the launcher actually had no effect,
because after a few requests the max_size passed to State::next_batch
would underflow, becoming a large positive number.

In the scheduler, as soon as the cached batch size reaches
max_batch_size, the max_size passed to next_batch becomes 0.
Since the only check in that function is
```
if Some(batch_requests.len()) == max_size {
    break;
}
```
and it is only evaluated after `batch_requests.len()` has already
become 1, a max_size of 0 does nothing to prevent further requests
from being batched, as the sketch below shows.

The server then ends up with a cached batch that is larger than
max_batch_size, and `max_size - batch_size as usize` underflows.
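In usize arithmetic that subtraction wraps around in release builds
(debug builds panic on overflow instead), so the scheduler sees an
effectively unlimited budget. An illustration with made-up values,
using wrapping_sub to make the release-mode behavior explicit:

```
fn main() {
    let max_size: usize = 4;   // e.g. --max-batch-size 4
    let batch_size: usize = 5; // cached batch already grew past the limit
    let budget = max_size.wrapping_sub(batch_size);
    // Prints 18446744073709551615 on a 64-bit target: a huge "budget",
    // so batching is never limited.
    println!("{budget}");
}
```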

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

* fix: update v3 scheduler and ensure max_batch_size > 0
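A sketch of the shape of such a fix, not the actual v3 scheduler code
(both function names here are illustrative): compute the remaining
budget with a saturating subtraction, and reject a zero max_batch_size
up front instead of looping forever.

```
fn remaining_budget(max_size: usize, cached_batch_size: usize) -> Option<usize> {
    // Saturate instead of underflowing; 0 remaining means "batch is full".
    match max_size.saturating_sub(cached_batch_size) {
        0 => None, // no room left: skip calling next_batch entirely
        n => Some(n),
    }
}

fn validate_max_batch_size(max_batch_size: Option<usize>) -> Result<(), String> {
    // Reject --max-batch-size 0 at startup rather than at batching time.
    if max_batch_size == Some(0) {
        return Err("`max_batch_size` must be > 0".into());
    }
    Ok(())
}
```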

---------

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-08-09 10:54:32 +02:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| infer | Pr 2352 ci branch (#2382) | 2024-08-09 10:54:32 +02:00 |
| config.rs | add gptj modeling in TGI #2366 (CI RUN) (#2372) | 2024-08-07 21:32:37 -04:00 |
| kserve.rs | fix: simplify kserve endpoint and fix imports (#2119) | 2024-06-25 19:30:10 -04:00 |
| lib.rs | feat: prefer stop over eos_token to align with openai finish_reason (#2344) | 2024-08-06 13:09:50 -04:00 |
| logging.rs | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |
| main.rs.back | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |
| server.rs | feat: return the generated text when parsing fails (#2353) | 2024-08-06 13:10:19 -04:00 |
| usage_stats.rs | refactor usage stats (#2339) | 2024-07-31 16:29:07 +02:00 |
| validation.rs | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |