Commit Graph

881 Commits

Author SHA1 Message Date
Morgan Funtowicz 8fad7ae5a2 add some more basic info in README.md 2024-07-30 08:45:29 +00:00
Morgan Funtowicz b665e2fa0a look for cuda 12.5 2024-07-30 08:45:20 +00:00
Morgan Funtowicz 6b74f5b413 make sure variable live long enough... 2024-07-25 10:47:52 +00:00
Morgan Funtowicz 69a5804e51 use std::env::consts::ARCH 2024-07-25 10:44:42 +00:00
Morgan Funtowicz fcbf2fc1ac fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time 2024-07-25 10:36:55 +00:00
Morgan Funtowicz dda015f2aa add some custom stuff for nccl linkage 2024-07-25 10:29:51 +00:00
Morgan Funtowicz 0a8c9d3dcf install to decoder_attention target 2024-07-25 10:21:54 +00:00
Morgan Funtowicz 48315e2608 clean up a bit 2024-07-24 09:52:38 +00:00
Morgan Funtowicz 9c60c9ca43 add missing dependant libraries for linking 2024-07-24 09:29:24 +00:00
Morgan Funtowicz 09bcca6a97 update build.rs to link to cuda 12.5 2024-07-24 07:50:26 +00:00
Morgan Funtowicz e4fc0ebcbe update TensorRT install script to latest 2024-07-23 22:23:30 +00:00
Morgan Funtowicz 03935f6705 update TensorRT-LLM to latest version 2024-07-23 22:13:02 +00:00
Morgan Funtowicz ef1876346c refactor the compute capabilities detection along with num gpus 2024-07-23 22:12:42 +00:00
Morgan Funtowicz 3c39ab5ac8 fix typo 2024-07-23 08:11:36 +00:00
Morgan Funtowicz 4c657ca158 make docker linter happy with same capitalization rule 2024-07-23 07:42:31 +00:00
Morgan Funtowicz d9decb4c2c move to TensorRT-LLM v0.11.0 2024-07-23 07:35:00 +00:00
Morgan Funtowicz ff151b738b refactored docker image 2024-07-23 07:34:40 +00:00
Morgan Funtowicz 3db1be412c commenting out Python part for TensorRT installation 2024-07-23 07:27:34 +00:00
Morgan Funtowicz 805e584b92 update tgi entrypoint 2024-07-22 19:13:01 +00:00
Morgan Funtowicz d0a34a95f2 adding missing ld_library_path for cuda stubs in Dockerfile 2024-07-22 15:16:39 +00:00
Morgan Funtowicz 3fd2bb70c3 fix missing / before tgi lib path 2024-07-22 14:57:03 +00:00
Morgan Funtowicz a32ef3b875 correctly setup linking search path for runtime layer 2024-07-22 14:42:43 +00:00
Morgan Funtowicz fd06ca6e7e add missing pkgconfig folder for MPI in Dockerfile 2024-07-22 14:20:06 +00:00
Morgan Funtowicz 40330c73f0 align all the linker search dependency 2024-07-22 14:14:57 +00:00
Morgan Funtowicz 6a9e925ec1 fix bad copy/paste missing nvinfer linkage direction 2024-07-22 11:43:10 +00:00
Morgan Funtowicz 3597beefe2 leverage pkg-config to probe libraries paths and reuse new install structure from cmake 2024-07-22 11:39:11 +00:00
Morgan Funtowicz 2aac2ff2cd do the same name definition stuff for tensorrt_llm_executor_static 2024-07-22 11:32:54 +00:00
Morgan Funtowicz da079df4cd simplify prebuilt trtllm libraries name definition 2024-07-22 11:32:31 +00:00
Morgan Funtowicz 20bcaea54f add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper 2024-07-22 09:33:38 +00:00
Morgan Funtowicz 84153702d2 add some more information in CMakeLists.txt to correctly install executorWorker 2024-07-22 08:43:10 +00:00
Morgan Funtowicz d5464d2f80 add initial Dockerfile for TRTLLM backend 2024-07-19 22:08:12 +00:00
Morgan Funtowicz 6300bab8b4 make sure executor_worker is provided 2024-07-19 11:57:10 +00:00
Morgan Funtowicz 97723d1458 add logging in case of decoding error 2024-07-18 22:19:25 +00:00
Morgan Funtowicz 9ea7f9e950 remove logging 2024-07-18 22:08:46 +00:00
Morgan Funtowicz e82dc30e8a expose information about potential error happening while decoding 2024-07-18 22:07:59 +00:00
Morgan Funtowicz a19d318947 define a shared struct to hold the result of a decoding step 2024-07-18 21:33:04 +00:00
Morgan Funtowicz a036574a86 add some more validation about grammar not supported 2024-07-18 20:57:29 +00:00
Morgan Funtowicz b643a436f3 forward tgi parameters rep/freq penalty 2024-07-18 20:56:58 +00:00
Morgan Funtowicz 95847c6587 expose the internal missing start/queue timestamp 2024-07-18 15:57:33 +00:00
Morgan Funtowicz fd021e5461 refactor Stream impl for Generation to factorise code 2024-07-18 14:21:43 +00:00
Morgan Funtowicz b56c43ec30 remove unneeded scope variable for now 2024-07-18 12:57:10 +00:00
Morgan Funtowicz 0212b1774a correctly forward back the log probabilities 2024-07-17 22:33:10 +00:00
Morgan Funtowicz bcb96feea6 update invalid doc in cpp file 2024-07-17 22:23:22 +00:00
Morgan Funtowicz 69674a3a2d add all the necessary plumbing to return the generated content 2024-07-17 22:12:49 +00:00
Morgan Funtowicz ce715c76f8 remove unnecessary log 2024-07-17 22:09:50 +00:00
Morgan Funtowicz e983ee5bb8 make sure the context is not dropped in the middle of the async decoding. 2024-07-17 21:56:50 +00:00
Morgan Funtowicz 9220340ff7 compute the number of maximum new tokens for each request independently 2024-07-17 13:55:29 +00:00
Morgan Funtowicz a01cd030d4 oops missing c++ backend definitions 2024-07-16 20:11:59 +00:00
Morgan Funtowicz 7784a21d48 impl RwLock scenario for TensorRtLllmBackend 2024-07-16 20:08:10 +00:00
Morgan Funtowicz 31d9f4d5dc expose shutdown function at ffi layer 2024-07-15 07:36:01 +00:00