Commit Graph

166 Commits

Author SHA1 Message Date
OlivierDehaene a6c18c39bb
feat(server): use cuda graph in logits warping (#302) 2023-05-10 19:08:54 +02:00
OlivierDehaene 68e9d6ab33
feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
Nicolas Patry b4aa87db58
fea(server): decrease convert RAM requirements (#286) 2023-05-05 17:57:02 +02:00
Nicolas Patry 690fc31757
fix(server): fix convert (#284) 2023-05-05 15:28:08 +02:00
Nicolas Patry f08343d44d
fix(server): Removes the parallelism in file convertion (during download) (#275) 2023-05-04 15:22:54 +02:00
OlivierDehaene 85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
OlivierDehaene f26dfd0dc1
feat(server): support OPT models (#55) 2023-04-11 19:16:41 +02:00
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
OlivierDehaene 3f2542bb6a
fix(server): fix escape characters in stop sequence (#155) 2023-04-05 19:37:41 +02:00
OlivierDehaene 610bb1f978
feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene d6a93fe992
fix(server): fix flash-neox scores warping (#137) 2023-03-24 18:21:41 +01:00
OlivierDehaene 05e9a796cc
feat(server): flash neoX (#133) 2023-03-24 14:02:14 +01:00
OlivierDehaene 8ad60b752f
fix(server): add position ids to neox (#126) 2023-03-15 13:12:49 +01:00
OlivierDehaene c0795de2f2
fix(server): do not warp prefill logits (#116) 2023-03-09 13:00:10 +01:00
OlivierDehaene 1a2d68250a
feat: support typical sampling (#114) 2023-03-09 11:33:57 +01:00
closes #112
OlivierDehaene 941cd42e0c
fix(server): fix index out of range for watermarking (#110) 2023-03-08 18:29:08 +01:00
OlivierDehaene 3fef90d50f
feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00