OlivierDehaene | e74bd41e0f | feat(server): add paged attention to flash models (#516) Closes #478 | 2023-06-30 19:09:59 +02:00 |
OlivierDehaene | 62f91f78ac | feat(server): support vectorized warpers in flash causal lm (#317) Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com> | 2023-05-26 12:30:27 +02:00 |
OlivierDehaene | 4f6d038c0b | fix(server): fix multinomial implem in Sampling | 2023-05-11 13:30:38 +02:00 |
OlivierDehaene | a6c18c39bb | feat(server): use cuda graph in logits warping (#302) | 2023-05-10 19:08:54 +02:00 |
OlivierDehaene | 68e9d6ab33 | feat(server): shard token decode (#303) | 2023-05-10 15:48:21 +02:00 |
OlivierDehaene | 3f2542bb6a | fix(server): fix escape characters in stop sequence (#155) | 2023-04-05 19:37:41 +02:00 |
OlivierDehaene | 610bb1f978 | feat(benchmark): tui based benchmarking tool (#149) | 2023-03-30 15:26:27 +02:00 |
OlivierDehaene | 05e9a796cc | feat(server): flash neoX (#133) | 2023-03-24 14:02:14 +01:00 |
OlivierDehaene | c0795de2f2 | fix(server): do not warp prefill logits (#116) | 2023-03-09 13:00:10 +01:00 |
OlivierDehaene | 1a2d68250a | feat: support typical sampling (#114) closes #112 | 2023-03-09 11:33:57 +01:00 |
OlivierDehaene | 941cd42e0c | fix(server): fix index out of range for watermarking (#110) | 2023-03-08 18:29:08 +01:00 |
OlivierDehaene | 3fef90d50f | feat(clients): Python client (#103) | 2023-03-07 18:52:22 +01:00 |