Morgan Funtowicz
|
e4fc0ebcbe
|
update TensorRT install script to latest
|
2024-07-23 22:23:30 +00:00 |
Morgan Funtowicz
|
03935f6705
|
update TensorRT-LLM to latest version
|
2024-07-23 22:13:02 +00:00 |
Morgan Funtowicz
|
ef1876346c
|
refactor the compute capabilities detection along with num gpus
|
2024-07-23 22:12:42 +00:00 |
Morgan Funtowicz
|
3c39ab5ac8
|
fix typo
|
2024-07-23 08:11:36 +00:00 |
Morgan Funtowicz
|
4c657ca158
|
make docker linter happy with same capitalization rule
|
2024-07-23 07:42:31 +00:00 |
Morgan Funtowicz
|
d9decb4c2c
|
move to TensorRT-LLM v0.11.0
|
2024-07-23 07:35:00 +00:00 |
Morgan Funtowicz
|
ff151b738b
|
refactored docker image
|
2024-07-23 07:34:40 +00:00 |
Morgan Funtowicz
|
3db1be412c
|
commenting out Python part for TensorRT installation
|
2024-07-23 07:27:34 +00:00 |
Morgan Funtowicz
|
805e584b92
|
update tgi entrypoint
|
2024-07-22 19:13:01 +00:00 |
Morgan Funtowicz
|
d0a34a95f2
|
adding missing ld_library_path for cuda stubs in Dockerfile
|
2024-07-22 15:16:39 +00:00 |
Morgan Funtowicz
|
3fd2bb70c3
|
fix missing / before tgi lib path
|
2024-07-22 14:57:03 +00:00 |
Morgan Funtowicz
|
a32ef3b875
|
correctly setup linking search path for runtime layer
|
2024-07-22 14:42:43 +00:00 |
Morgan Funtowicz
|
fd06ca6e7e
|
add missing pkgconfig folder for MPI in Dockerfile
|
2024-07-22 14:20:06 +00:00 |
Morgan Funtowicz
|
40330c73f0
|
align all the linker search dependency
|
2024-07-22 14:14:57 +00:00 |
Morgan Funtowicz
|
6a9e925ec1
|
fix bad copy/paste missing nvinfer linkage direction
|
2024-07-22 11:43:10 +00:00 |
Morgan Funtowicz
|
3597beefe2
|
leverage pkg-config to probe libraries paths and reuse new install structure from cmake
|
2024-07-22 11:39:11 +00:00 |
Morgan Funtowicz
|
2aac2ff2cd
|
do the same name definition stuff for tensorrt_llm_executor_static
|
2024-07-22 11:32:54 +00:00 |
Morgan Funtowicz
|
da079df4cd
|
simplify prebuilt trtllm libraries name definition
|
2024-07-22 11:32:31 +00:00 |
Morgan Funtowicz
|
20bcaea54f
|
add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper
|
2024-07-22 09:33:38 +00:00 |
Morgan Funtowicz
|
84153702d2
|
add some more information in CMakeLists.txt to correctly install executorWorker
|
2024-07-22 08:43:10 +00:00 |
Morgan Funtowicz
|
d5464d2f80
|
add initial Dockerfile for TRTLLM backend
|
2024-07-19 22:08:12 +00:00 |
Morgan Funtowicz
|
6300bab8b4
|
make sure executor_worker is provided
|
2024-07-19 11:57:10 +00:00 |
Morgan Funtowicz
|
97723d1458
|
add logging in case of decoding error
|
2024-07-18 22:19:25 +00:00 |
Morgan Funtowicz
|
9ea7f9e950
|
remove logging
|
2024-07-18 22:08:46 +00:00 |
Morgan Funtowicz
|
e82dc30e8a
|
expose information about potential error happening while decoding
|
2024-07-18 22:07:59 +00:00 |
Morgan Funtowicz
|
a19d318947
|
define a shared struct to hold the result of a decoding step
|
2024-07-18 21:33:04 +00:00 |
Morgan Funtowicz
|
a036574a86
|
add some more validation about grammar not supported
|
2024-07-18 20:57:29 +00:00 |
Morgan Funtowicz
|
b643a436f3
|
forward tgi parameters rep/freq penalty
|
2024-07-18 20:56:58 +00:00 |
Morgan Funtowicz
|
95847c6587
|
expose the internal missing start/queue timestamp
|
2024-07-18 15:57:33 +00:00 |
Morgan Funtowicz
|
fd021e5461
|
refactor Stream impl for Generation to factorise code
|
2024-07-18 14:21:43 +00:00 |
Morgan Funtowicz
|
b56c43ec30
|
remove unneeded scope variable for now
|
2024-07-18 12:57:10 +00:00 |
Morgan Funtowicz
|
0212b1774a
|
correctly forward back the log probabilities
|
2024-07-17 22:33:10 +00:00 |
Morgan Funtowicz
|
bcb96feea6
|
update invalid doc in cpp file
|
2024-07-17 22:23:22 +00:00 |
Morgan Funtowicz
|
69674a3a2d
|
add all the necessary plumbing to return the generated content
|
2024-07-17 22:12:49 +00:00 |
Morgan Funtowicz
|
ce715c76f8
|
remove unnecessary log
|
2024-07-17 22:09:50 +00:00 |
Morgan Funtowicz
|
e983ee5bb8
|
make sure the context is not dropped in the middle of the async decoding.
|
2024-07-17 21:56:50 +00:00 |
Morgan Funtowicz
|
9220340ff7
|
compute the number of maximum new tokens for each request independently
|
2024-07-17 13:55:29 +00:00 |
Morgan Funtowicz
|
a01cd030d4
|
oops missing c++ backend definitions
|
2024-07-16 20:11:59 +00:00 |
Morgan Funtowicz
|
7784a21d48
|
impl RwLock scenario for TensorRtLlmBackend
|
2024-07-16 20:08:10 +00:00 |
Morgan Funtowicz
|
31d9f4d5dc
|
expose shutdown function at ffi layer
|
2024-07-15 07:36:01 +00:00 |
Morgan Funtowicz
|
b291be64a0
|
impl the Rust backend, which currently cannot move the actual computation to a background thread
|
2024-07-12 19:26:32 +00:00 |
Morgan Funtowicz
|
518d9a9e0b
|
make sure to track include/ffi.h to trigger rebuild from cargo
|
2024-07-12 19:26:04 +00:00 |
Morgan Funtowicz
|
344f33f398
|
end to end ffi flow working
|
2024-07-12 19:25:40 +00:00 |
Morgan Funtowicz
|
b846ae2d9e
|
use external fmt lib
|
2024-07-12 19:24:59 +00:00 |
Morgan Funtowicz
|
1972669f49
|
remove fmt import
|
2024-07-12 19:24:09 +00:00 |
Morgan Funtowicz
|
50e9fc89c8
|
working setup of the ffi layer
|
2024-07-11 21:24:32 +00:00 |
Morgan Funtowicz
|
5aede911f8
|
include guard to build example in cmakelists
|
2024-07-11 21:24:01 +00:00 |
Morgan Funtowicz
|
ed14bd6818
|
use correct include for spdlog
|
2024-07-10 13:57:31 +00:00 |
Morgan Funtowicz
|
42748d5960
|
allow converting huggingface::tokenizers error to TensorRtLlmBackendError
|
2024-07-10 13:56:57 +00:00 |
Morgan Funtowicz
|
40fe2ec0ff
|
add auth_token CLI argument to provide hf hub authentication token
|
2024-07-10 13:50:28 +00:00 |