OlivierDehaene
|
6f88bd9390
|
feat: add triton kernels to decrease latency of large batches (#2687)
* feat: add triton kernels to decrease latency of large batches
* cast to int32
* fix kernel
* fix kernel
* disable triton on rocm
* fix speculation
* add slots filtering kernel
|
2024-10-25 21:10:00 +00:00 |