hf_text-generation-inference/backends/trtllm/src
Funtowicz Morgan ba5fc7d922
Add support for stop words in TRTLLM (#2678)
* feat(trtllm): rewrite health to not account for current state

* chore(looper): cleanup a bit more

* feat(post_processing): max_new_tokens is const evaluated now

* chore(ffi): formatting

* feat(trtllm): add stop words handling

# Conflicts:
#	backends/trtllm/lib/backend.cpp

* chore(trtllm): create specific ParallelConfig factory and logging init methods

* chore(trtllm): define a macro for SizeType cast

* chore(trtllm): use GetParallelConfig

* chore(trtllm): minor refactoring

* chore(trtllm): validate there are enough GPUs on the system for the desired model

* chore(trtllm): ensure max throughput scheduling policy is selected

* chore(trtllm): minor fix

* chore(router): minor refactorings

* feat(docker): build with-slurm ompi

* feat(docker): add python3.10 dev to runtime deps

* chore(docker): add mpi to ld_library_path

* chore(docker): install transformers

* feat(trtllm): detect stop_words from generation_config.json
2024-10-25 10:58:34 +02:00
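The core of the change above, stopping generation when a stop word is produced, can be conveyed with a minimal text-level sketch. This is a hedged illustration only: `find_stop` is a hypothetical helper, not the backend's actual API, and the real TRTLLM backend works with token-id sequences rather than raw text.

```rust
/// Return the byte offset at which the generated text should be
/// truncated if it ends with one of the stop sequences, or None
/// if no stop sequence has matched yet.
fn find_stop(text: &str, stop_sequences: &[&str]) -> Option<usize> {
    stop_sequences
        .iter()
        .filter(|s| !s.is_empty())          // ignore empty stop words
        .find(|s| text.ends_with(*s))       // suffix match against each stop word
        .map(|s| text.len() - s.len())      // cut point excludes the stop word itself
}

fn main() {
    let stops = ["</s>", "\n\n"];
    let text = "Hello world</s>";
    if let Some(cut) = find_stop(text, &stops) {
        println!("{}", &text[..cut]); // prints "Hello world"
    }
}
```

In practice the check runs incrementally after each decoded token, which is why matching only the suffix of the accumulated text is sufficient.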
errors.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
ffi.cpp Add support for stop words in TRTLLM (#2678) 2024-10-25 10:58:34 +02:00
lib.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
looper.rs Add support for stop words in TRTLLM (#2678) 2024-10-25 10:58:34 +02:00
main.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
utils.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00