Morgan Funtowicz | 8fad7ae5a2 | add some more basic info in README.md | 2024-07-30 08:45:29 +00:00
Morgan Funtowicz | b665e2fa0a | look for cuda 12.5 | 2024-07-30 08:45:20 +00:00
Morgan Funtowicz | 6b74f5b413 | make sure variable live long enough... | 2024-07-25 10:47:52 +00:00
Morgan Funtowicz | 69a5804e51 | use std::env::const::ARCH | 2024-07-25 10:44:42 +00:00
Morgan Funtowicz | fcbf2fc1ac | fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time | 2024-07-25 10:36:55 +00:00
Morgan Funtowicz | dda015f2aa | add some custom stuff for nccl linkage | 2024-07-25 10:29:51 +00:00
Morgan Funtowicz | 0a8c9d3dcf | install to decoder_attention target | 2024-07-25 10:21:54 +00:00
Morgan Funtowicz | 48315e2608 | clean up a bit | 2024-07-24 09:52:38 +00:00
Morgan Funtowicz | 9c60c9ca43 | add missing dependant libraries for linking | 2024-07-24 09:29:24 +00:00
Morgan Funtowicz | 09bcca6a97 | update build.rs to link to cuda 12.5 | 2024-07-24 07:50:26 +00:00
Morgan Funtowicz | e4fc0ebcbe | update TensorRT install script to latest | 2024-07-23 22:23:30 +00:00
Morgan Funtowicz | 03935f6705 | update TensorRT-LLM to latest version | 2024-07-23 22:13:02 +00:00
Morgan Funtowicz | ef1876346c | refactor the compute capabilities detection along with num gpus | 2024-07-23 22:12:42 +00:00
Morgan Funtowicz | 3c39ab5ac8 | fix typo | 2024-07-23 08:11:36 +00:00
Morgan Funtowicz | 4c657ca158 | make docker linter happy with same capitalization rule | 2024-07-23 07:42:31 +00:00
Morgan Funtowicz | d9decb4c2c | move to TensorRT-LLM v0.11.0 | 2024-07-23 07:35:00 +00:00
Morgan Funtowicz | ff151b738b | refactored docker image | 2024-07-23 07:34:40 +00:00
Morgan Funtowicz | 3db1be412c | commenting out Python part for TensorRT installation | 2024-07-23 07:27:34 +00:00
Morgan Funtowicz | 805e584b92 | update tgi entrypoint | 2024-07-22 19:13:01 +00:00
Morgan Funtowicz | d0a34a95f2 | adding missing ld_library_path for cuda stubs in Dockerfile | 2024-07-22 15:16:39 +00:00
Morgan Funtowicz | 3fd2bb70c3 | fix missing / before tgi lib path | 2024-07-22 14:57:03 +00:00
Morgan Funtowicz | a32ef3b875 | correctly setup linking search path for runtime layer | 2024-07-22 14:42:43 +00:00
Morgan Funtowicz | fd06ca6e7e | add missing pkgconfig folder for MPI in Dockerfile | 2024-07-22 14:20:06 +00:00
Morgan Funtowicz | 40330c73f0 | align all the linker search dependency | 2024-07-22 14:14:57 +00:00
Morgan Funtowicz | 6a9e925ec1 | fix bad copy/past missing nvinfer linkage direction | 2024-07-22 11:43:10 +00:00
Morgan Funtowicz | 3597beefe2 | leverage pkg-config to probe libraries paths and reuse new install structure from cmake | 2024-07-22 11:39:11 +00:00
Morgan Funtowicz | 2aac2ff2cd | do the same name definition stuff for tensorrt_llm_executor_static | 2024-07-22 11:32:54 +00:00
Morgan Funtowicz | da079df4cd | simplify prebuilt trtllm libraries name definition | 2024-07-22 11:32:31 +00:00
Morgan Funtowicz | 20bcaea54f | add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper | 2024-07-22 09:33:38 +00:00
Morgan Funtowicz | 84153702d2 | add some more information in CMakeLists.txt to correctly install executorWorker | 2024-07-22 08:43:10 +00:00
Morgan Funtowicz | d5464d2f80 | add initial Dockerfile for TRTLLM backend | 2024-07-19 22:08:12 +00:00
Morgan Funtowicz | 6300bab8b4 | make sure executor_worker is provided | 2024-07-19 11:57:10 +00:00
Morgan Funtowicz | 97723d1458 | add logging in case of decoding error | 2024-07-18 22:19:25 +00:00
Morgan Funtowicz | 9ea7f9e950 | remove logging | 2024-07-18 22:08:46 +00:00
Morgan Funtowicz | e82dc30e8a | expose information about potential error happening while decoding | 2024-07-18 22:07:59 +00:00
Morgan Funtowicz | a19d318947 | define a shared struct to hold the result of a decoding step | 2024-07-18 21:33:04 +00:00
Morgan Funtowicz | a036574a86 | add some more validation about grammar not supported | 2024-07-18 20:57:29 +00:00
Morgan Funtowicz | b643a436f3 | forward tgi parameters rep/freq penalty | 2024-07-18 20:56:58 +00:00
Morgan Funtowicz | 95847c6587 | expose the internal missing start/queue timestamp | 2024-07-18 15:57:33 +00:00
Morgan Funtowicz | fd021e5461 | refactor Stream impl for Generation to factorise code | 2024-07-18 14:21:43 +00:00
Morgan Funtowicz | b56c43ec30 | remove unneeded scope variable for now | 2024-07-18 12:57:10 +00:00
Morgan Funtowicz | 0212b1774a | correctly forward back the log probabilities | 2024-07-17 22:33:10 +00:00
Morgan Funtowicz | bcb96feea6 | update invalid doc in cpp file | 2024-07-17 22:23:22 +00:00
Morgan Funtowicz | 69674a3a2d | add all the necessary plumbery to return the generated content | 2024-07-17 22:12:49 +00:00
Morgan Funtowicz | ce715c76f8 | remove unnecessary log | 2024-07-17 22:09:50 +00:00
Morgan Funtowicz | e983ee5bb8 | make sure the context is not dropped in the middle of the async decoding. | 2024-07-17 21:56:50 +00:00
Morgan Funtowicz | 9220340ff7 | compute the number of maximum new tokens for each request independently | 2024-07-17 13:55:29 +00:00
Morgan Funtowicz | a01cd030d4 | oops missing c++ backend definitions | 2024-07-16 20:11:59 +00:00
Morgan Funtowicz | 7784a21d48 | impl RwLock scenario for TensorRtLllmBackend | 2024-07-16 20:08:10 +00:00
Morgan Funtowicz | 31d9f4d5dc | expose shutdown function at ffi layer | 2024-07-15 07:36:01 +00:00