12 Commits

Author SHA1 Message Date
OlivierDehaene e74bd41e0f feat(server): add paged attention to flash models (#516) 2023-06-30 19:09:59 +02:00
    Closes #478
OlivierDehaene 62f91f78ac feat(server): support vectorized warpers in flash causal lm (#317) 2023-05-26 12:30:27 +02:00
    Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
OlivierDehaene 4f6d038c0b fix(server): fix multinomial implem in Sampling 2023-05-11 13:30:38 +02:00
OlivierDehaene a6c18c39bb feat(server): use cuda graph in logits warping (#302) 2023-05-10 19:08:54 +02:00
OlivierDehaene 68e9d6ab33 feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene 3f2542bb6a fix(server): fix escape characters in stop sequence (#155) 2023-04-05 19:37:41 +02:00
OlivierDehaene 610bb1f978 feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene 05e9a796cc feat(server): flash neoX (#133) 2023-03-24 14:02:14 +01:00
OlivierDehaene c0795de2f2 fix(server): do not warp prefill logits (#116) 2023-03-09 13:00:10 +01:00
OlivierDehaene 1a2d68250a feat: support typical sampling (#114) 2023-03-09 11:33:57 +01:00
    closes #112
OlivierDehaene 941cd42e0c fix(server): fix index out of range for watermarking (#110) 2023-03-08 18:29:08 +01:00
OlivierDehaene 3fef90d50f feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00