hf_text-generation-inference/router/src
Latest commit 6d06473cf4 by drbh: Pr 2352 ci branch (#2382)
* Fix unsigned integer underflow

Passing --max-batch-size to the launcher actually had no effect,
because after a few requests the max_size passed to State::next_batch
would underflow, becoming a large positive number.

In the scheduler, as soon as the cached batch size reaches
max_batch_size, the max_size passed to next_batch becomes 0.
Since the only check in that function is
```
if Some(batch_requests.len()) == max_size {
    break;
}
```
and it is only evaluated after `batch_requests.len()` has already
become 1, a max_size of 0 does nothing to prevent further requests
from being batched, as the sketch below shows.

The server then ends up with a cached batch that is larger than
max_batch_size, and `max_size - batch_size as usize` underflows.
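In usize arithmetic that subtraction wraps around in release builds
(debug builds panic on overflow instead), so the scheduler sees an
effectively unlimited budget. An illustration with made-up values,
using wrapping_sub to make the release-mode behavior explicit:

```
fn main() {
    let max_size: usize = 4;   // e.g. --max-batch-size 4
    let batch_size: usize = 5; // cached batch already grew past the limit
    let budget = max_size.wrapping_sub(batch_size);
    // Prints 18446744073709551615 on a 64-bit target: a huge "budget",
    // so batching is never limited.
    println!("{budget}");
}
```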

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

* fix: update v3 scheduler and ensure max_batch_size > 0
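A sketch of the shape of such a fix, not the actual v3 scheduler code
(both function names here are illustrative): compute the remaining
budget with a saturating subtraction, and reject a zero max_batch_size
up front instead of looping forever.

```
fn remaining_budget(max_size: usize, cached_batch_size: usize) -> Option<usize> {
    // Saturate instead of underflowing; 0 remaining means "batch is full".
    match max_size.saturating_sub(cached_batch_size) {
        0 => None, // no room left: skip calling next_batch entirely
        n => Some(n),
    }
}

fn validate_max_batch_size(max_batch_size: Option<usize>) -> Result<(), String> {
    // Reject --max-batch-size 0 at startup rather than at batching time.
    if max_batch_size == Some(0) {
        return Err("`max_batch_size` must be > 0".into());
    }
    Ok(())
}
```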

---------

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-08-09 10:54:32 +02:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| infer | Pr 2352 ci branch (#2382) | 2024-08-09 10:54:32 +02:00 |
| config.rs | add gptj modeling in TGI #2366 (CI RUN) (#2372) | 2024-08-07 21:32:37 -04:00 |
| kserve.rs | fix: simplify kserve endpoint and fix imports (#2119) | 2024-06-25 19:30:10 -04:00 |
| lib.rs | feat: prefer stop over eos_token to align with openai finish_reason (#2344) | 2024-08-06 13:09:50 -04:00 |
| logging.rs | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |
| main.rs.back | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |
| server.rs | feat: return the generated text when parsing fails (#2353) | 2024-08-06 13:10:19 -04:00 |
| usage_stats.rs | refactor usage stats (#2339) | 2024-07-31 16:29:07 +02:00 |
| validation.rs | Rebase TRT-llm (#2331) | 2024-07-31 10:33:10 +02:00 |