Commit Graph

855 Commits

Author SHA1 Message Date
Morgan Funtowicz 2aac2ff2cd do the same name definition stuff for tensorrt_llm_executor_static 2024-07-22 11:32:54 +00:00
Morgan Funtowicz da079df4cd simplify prebuilt trtllm libraries name definition 2024-07-22 11:32:31 +00:00
Morgan Funtowicz 20bcaea54f add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper 2024-07-22 09:33:38 +00:00
Morgan Funtowicz 84153702d2 add some more information in CMakeLists.txt to correctly install executorWorker 2024-07-22 08:43:10 +00:00
Morgan Funtowicz d5464d2f80 add initial Dockerfile for TRTLLM backend 2024-07-19 22:08:12 +00:00
Morgan Funtowicz 6300bab8b4 make sure executor_worker is provided 2024-07-19 11:57:10 +00:00
Morgan Funtowicz 97723d1458 add logging in case of decoding error 2024-07-18 22:19:25 +00:00
Morgan Funtowicz 9ea7f9e950 remove logging 2024-07-18 22:08:46 +00:00
Morgan Funtowicz e82dc30e8a expose information about potential error happening while decoding 2024-07-18 22:07:59 +00:00
Morgan Funtowicz a19d318947 define a shared struct to hold the result of a decoding step 2024-07-18 21:33:04 +00:00
Morgan Funtowicz a036574a86 add some more validation about grammar not supported 2024-07-18 20:57:29 +00:00
Morgan Funtowicz b643a436f3 forward tgi parameters rep/freq penalty 2024-07-18 20:56:58 +00:00
Morgan Funtowicz 95847c6587 expose the internal missing start/queue timestamp 2024-07-18 15:57:33 +00:00
Morgan Funtowicz fd021e5461 refactor Stream impl for Generation to factorise code 2024-07-18 14:21:43 +00:00
Morgan Funtowicz b56c43ec30 remove unneeded scope variable for now 2024-07-18 12:57:10 +00:00
Morgan Funtowicz 0212b1774a correctly forward back the log probabilities 2024-07-17 22:33:10 +00:00
Morgan Funtowicz bcb96feea6 update invalid doc in cpp file 2024-07-17 22:23:22 +00:00
Morgan Funtowicz 69674a3a2d add all the necessary plumbery to return the generated content 2024-07-17 22:12:49 +00:00
Morgan Funtowicz ce715c76f8 remove unnecessary log 2024-07-17 22:09:50 +00:00
Morgan Funtowicz e983ee5bb8 make sure the context is not dropped in the middle of the async decoding. 2024-07-17 21:56:50 +00:00
Morgan Funtowicz 9220340ff7 compute the number of maximum new tokens for each request independently 2024-07-17 13:55:29 +00:00
Morgan Funtowicz a01cd030d4 oops missing c++ backend definitions 2024-07-16 20:11:59 +00:00
Morgan Funtowicz 7784a21d48 impl RwLock scenario for TensorRtLllmBackend 2024-07-16 20:08:10 +00:00
Morgan Funtowicz 31d9f4d5dc expose shutdown function at ffi layer 2024-07-15 07:36:01 +00:00
Morgan Funtowicz b291be64a0 impl the rust backend which currently cannot move the actual computation in background thread 2024-07-12 19:26:32 +00:00
Morgan Funtowicz 518d9a9e0b make sure to track include/ffi.h to trigger rebuild from cargo 2024-07-12 19:26:04 +00:00
Morgan Funtowicz 344f33f398 end to end ffi flow working 2024-07-12 19:25:40 +00:00
Morgan Funtowicz b846ae2d9e use external fmt lib 2024-07-12 19:24:59 +00:00
Morgan Funtowicz 1972669f49 remove fmt import 2024-07-12 19:24:09 +00:00
Morgan Funtowicz 50e9fc89c8 working setup of the ffi layer 2024-07-11 21:24:32 +00:00
Morgan Funtowicz 5aede911f8 include guard to build example in cmakelists 2024-07-11 21:24:01 +00:00
Morgan Funtowicz ed14bd6818 use correct include for spdlog 2024-07-10 13:57:31 +00:00
Morgan Funtowicz 42748d5960 allow converting huggingface::tokenizers error to TensorRtLlmBackendError 2024-07-10 13:56:57 +00:00
Morgan Funtowicz 40fe2ec0ff add auth_token CLI argument to provide hf hub authentification token 2024-07-10 13:50:28 +00:00
Morgan Funtowicz ca9da2dd49 create cmake install target to put everything relevant in installation folder 2024-07-10 13:48:59 +00:00
Morgan Funtowicz 4272b8cf51 correctly tell cmake to build dependent tensorrt-llm required libraries 2024-07-10 13:48:44 +00:00
Morgan Funtowicz 6c92ebe6a8 update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c 2024-07-10 13:47:56 +00:00
Morgan Funtowicz 7b9f92a0aa use spdlog release 1.14.1 moving forward 2024-07-10 13:47:31 +00:00
Morgan Funtowicz 13eabfabcb implement the Stream method to send new tokens through a callback 2024-07-09 13:46:48 +00:00
Morgan Funtowicz 09292b06a0 updated logic and comment to detect cuda compute capabilities 2024-07-09 12:15:41 +00:00
Morgan Funtowicz bec188ff73 bind to CUDA::nvml to retrieve compute capabilities at runtime 2024-07-08 22:32:41 +00:00
Morgan Funtowicz 68a0247a2c unconditionally call InitializeBackend on the FFI layer 2024-07-08 22:09:09 +00:00
Morgan Funtowicz da926feaa1 make leader executor mode working 2024-07-08 22:08:49 +00:00
Morgan Funtowicz f53ffa886d Specify which default log level to use depending on CMake build type 2024-07-08 22:06:49 +00:00
Morgan Funtowicz 4113d6d51b Move to latest TensorRT-LLM version 2024-07-08 22:06:30 +00:00
Morgan Funtowicz 29c7cb36e5 Remembering to check how we can detect support for chunked context 2024-07-03 21:38:17 +00:00
Morgan Funtowicz f57f2a4521 First version loading engines and making it ready for inference 2024-07-03 21:12:24 +00:00
Morgan Funtowicz f8a1463915 Enable end to end CMake build 2024-07-03 10:27:53 +02:00
Morgan Funtowicz 818162e0c2 Overall build TRTLLM and deps through CMake build system 2024-07-02 17:16:27 +02:00
Morgan Funtowicz 6dc98abe46 Remove unused parameters annd force tokenizer name to be set 2024-07-01 16:11:59 +02:00