hf_text-generation-inference/backends/trtllm/src
Funtowicz Morgan ba5fc7d922
Add support for stop words in TRTLLM (#2678)
* feat(trtllm): rewrite health to not account for current state

* chore(looper): cleanup a bit more

* feat(post_processing): max_new_tokens is const evaluated now

* chore(ffi): formatting

* feat(trtllm): add stop words handling

# Conflicts:
#	backends/trtllm/lib/backend.cpp

* chore(trtllm): create specific ParallelConfig factory and logging init methods

* chore(trtllm): define a macro for SizeType cast

* chore(trtllm): use GetParallelConfig

* chore(trtllm): minor refactoring

* chore(trtllm): validate there are enough GPUs on the system for the desired model

* chore(trtllm): ensure max throughput scheduling policy is selected

* chore(trtllm): minor fix

* chore(router): minor refactorings

* feat(docker): build with-slurm ompi

* feat(docker): add python3.10 dev to runtime deps

* chore(docker): add mpi to ld_library_path

* chore(docker): install transformers

* feat(trtllm): detect stop_words from generation_config.json
2024-10-25 10:58:34 +02:00
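The core of the change above, stopping generation when a stop word is produced, can be conveyed with a minimal text-level sketch. This is a hedged illustration only: `find_stop` is a hypothetical helper, not the backend's actual API, and the real TRTLLM backend works with token-id sequences rather than raw text.

```rust
/// Return the byte offset at which the generated text should be
/// truncated if it ends with one of the stop sequences, or None
/// if no stop sequence has matched yet.
fn find_stop(text: &str, stop_sequences: &[&str]) -> Option<usize> {
    stop_sequences
        .iter()
        .filter(|s| !s.is_empty())          // ignore empty stop words
        .find(|s| text.ends_with(*s))       // suffix match against each stop word
        .map(|s| text.len() - s.len())      // cut point excludes the stop word itself
}

fn main() {
    let stops = ["</s>", "\n\n"];
    let text = "Hello world</s>";
    if let Some(cut) = find_stop(text, &stops) {
        println!("{}", &text[..cut]); // prints "Hello world"
    }
}
```

In practice the check runs incrementally after each decoded token, which is why matching only the suffix of the accumulated text is sufficient.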
errors.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
ffi.cpp Add support for stop words in TRTLLM (#2678) 2024-10-25 10:58:34 +02:00
lib.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
looper.rs Add support for stop words in TRTLLM (#2678) 2024-10-25 10:58:34 +02:00
main.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00
utils.rs [TENSORRT-LLM] - Implement new looper thread based backend (#2357) 2024-10-25 07:17:14 +02:00