Morgan Funtowicz
|
2aac2ff2cd
|
do the same name definition stuff for tensorrt_llm_executor_static
|
2024-07-22 11:32:54 +00:00 |
Morgan Funtowicz
|
da079df4cd
|
simplify prebuilt trtllm libraries name definition
|
2024-07-22 11:32:31 +00:00 |
Morgan Funtowicz
|
20bcaea54f
|
add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper
|
2024-07-22 09:33:38 +00:00 |
Morgan Funtowicz
|
84153702d2
|
add some more information in CMakeLists.txt to correctly install executorWorker
|
2024-07-22 08:43:10 +00:00 |
Morgan Funtowicz
|
d5464d2f80
|
add initial Dockerfile for TRTLLM backend
|
2024-07-19 22:08:12 +00:00 |
Morgan Funtowicz
|
6300bab8b4
|
make sure executor_worker is provided
|
2024-07-19 11:57:10 +00:00 |
Morgan Funtowicz
|
97723d1458
|
add logging in case of decoding error
|
2024-07-18 22:19:25 +00:00 |
Morgan Funtowicz
|
9ea7f9e950
|
remove logging
|
2024-07-18 22:08:46 +00:00 |
Morgan Funtowicz
|
e82dc30e8a
|
expose information about potential error happening while decoding
|
2024-07-18 22:07:59 +00:00 |
Morgan Funtowicz
|
a19d318947
|
define a shared struct to hold the result of a decoding step
|
2024-07-18 21:33:04 +00:00 |
Morgan Funtowicz
|
a036574a86
|
add some more validation about grammar not supported
|
2024-07-18 20:57:29 +00:00 |
Morgan Funtowicz
|
b643a436f3
|
forward tgi parameters rep/freq penalty
|
2024-07-18 20:56:58 +00:00 |
Morgan Funtowicz
|
95847c6587
|
expose the internal missing start/queue timestamp
|
2024-07-18 15:57:33 +00:00 |
Morgan Funtowicz
|
fd021e5461
|
refactor Stream impl for Generation to factorise code
|
2024-07-18 14:21:43 +00:00 |
Morgan Funtowicz
|
b56c43ec30
|
remove unneeded scope variable for now
|
2024-07-18 12:57:10 +00:00 |
Morgan Funtowicz
|
0212b1774a
|
correctly forward back the log probabilities
|
2024-07-17 22:33:10 +00:00 |
Morgan Funtowicz
|
bcb96feea6
|
update invalid doc in cpp file
|
2024-07-17 22:23:22 +00:00 |
Morgan Funtowicz
|
69674a3a2d
|
add all the necessary plumbing to return the generated content
|
2024-07-17 22:12:49 +00:00 |
Morgan Funtowicz
|
ce715c76f8
|
remove unnecessary log
|
2024-07-17 22:09:50 +00:00 |
Morgan Funtowicz
|
e983ee5bb8
|
make sure the context is not dropped in the middle of the async decoding.
|
2024-07-17 21:56:50 +00:00 |
Morgan Funtowicz
|
9220340ff7
|
compute the number of maximum new tokens for each request independently
|
2024-07-17 13:55:29 +00:00 |
Morgan Funtowicz
|
a01cd030d4
|
oops missing c++ backend definitions
|
2024-07-16 20:11:59 +00:00 |
Morgan Funtowicz
|
7784a21d48
|
impl RwLock scenario for TensorRtLlmBackend
|
2024-07-16 20:08:10 +00:00 |
Morgan Funtowicz
|
31d9f4d5dc
|
expose shutdown function at ffi layer
|
2024-07-15 07:36:01 +00:00 |
Morgan Funtowicz
|
b291be64a0
|
impl the rust backend, which currently cannot move the actual computation to a background thread
|
2024-07-12 19:26:32 +00:00 |
Morgan Funtowicz
|
518d9a9e0b
|
make sure to track include/ffi.h to trigger rebuild from cargo
|
2024-07-12 19:26:04 +00:00 |
Morgan Funtowicz
|
344f33f398
|
end to end ffi flow working
|
2024-07-12 19:25:40 +00:00 |
Morgan Funtowicz
|
b846ae2d9e
|
use external fmt lib
|
2024-07-12 19:24:59 +00:00 |
Morgan Funtowicz
|
1972669f49
|
remove fmt import
|
2024-07-12 19:24:09 +00:00 |
Morgan Funtowicz
|
50e9fc89c8
|
working setup of the ffi layer
|
2024-07-11 21:24:32 +00:00 |
Morgan Funtowicz
|
5aede911f8
|
include guard to build example in cmakelists
|
2024-07-11 21:24:01 +00:00 |
Morgan Funtowicz
|
ed14bd6818
|
use correct include for spdlog
|
2024-07-10 13:57:31 +00:00 |
Morgan Funtowicz
|
42748d5960
|
allow converting huggingface::tokenizers error to TensorRtLlmBackendError
|
2024-07-10 13:56:57 +00:00 |
Morgan Funtowicz
|
40fe2ec0ff
|
add auth_token CLI argument to provide hf hub authentication token
|
2024-07-10 13:50:28 +00:00 |
Morgan Funtowicz
|
ca9da2dd49
|
create cmake install target to put everything relevant in installation folder
|
2024-07-10 13:48:59 +00:00 |
Morgan Funtowicz
|
4272b8cf51
|
correctly tell cmake to build dependent tensorrt-llm required libraries
|
2024-07-10 13:48:44 +00:00 |
Morgan Funtowicz
|
6c92ebe6a8
|
update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c
|
2024-07-10 13:47:56 +00:00 |
Morgan Funtowicz
|
7b9f92a0aa
|
use spdlog release 1.14.1 moving forward
|
2024-07-10 13:47:31 +00:00 |
Morgan Funtowicz
|
13eabfabcb
|
implement the Stream method to send new tokens through a callback
|
2024-07-09 13:46:48 +00:00 |
Morgan Funtowicz
|
09292b06a0
|
updated logic and comment to detect cuda compute capabilities
|
2024-07-09 12:15:41 +00:00 |
Morgan Funtowicz
|
bec188ff73
|
bind to CUDA::nvml to retrieve compute capabilities at runtime
|
2024-07-08 22:32:41 +00:00 |
Morgan Funtowicz
|
68a0247a2c
|
unconditionally call InitializeBackend on the FFI layer
|
2024-07-08 22:09:09 +00:00 |
Morgan Funtowicz
|
da926feaa1
|
make leader executor mode working
|
2024-07-08 22:08:49 +00:00 |
Morgan Funtowicz
|
f53ffa886d
|
Specify which default log level to use depending on CMake build type
|
2024-07-08 22:06:49 +00:00 |
Morgan Funtowicz
|
4113d6d51b
|
Move to latest TensorRT-LLM version
|
2024-07-08 22:06:30 +00:00 |
Morgan Funtowicz
|
29c7cb36e5
|
Remembering to check how we can detect support for chunked context
|
2024-07-03 21:38:17 +00:00 |
Morgan Funtowicz
|
f57f2a4521
|
First version loading engines and making it ready for inference
|
2024-07-03 21:12:24 +00:00 |
Morgan Funtowicz
|
f8a1463915
|
Enable end to end CMake build
|
2024-07-03 10:27:53 +02:00 |
Morgan Funtowicz
|
818162e0c2
|
Overall build TRTLLM and deps through CMake build system
|
2024-07-02 17:16:27 +02:00 |
Morgan Funtowicz
|
6dc98abe46
|
Remove unused parameters and force tokenizer name to be set
|
2024-07-01 16:11:59 +02:00 |