17 Commits

Author SHA1 Message Date
OlivierDehaene 745f596c88
feat(server): use float16 (#304) 2023-05-10 15:51:10 +02:00
OlivierDehaene 68e9d6ab33
feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene 4096000e34
fix(server): fix typo in tokenizers decode (#269)
closes #268
2023-05-03 10:10:34 +02:00
Nick Hill 34bca0b8d3
fix(server): Small tidy of code from recent changes (#251)
remaining_decode_tokens was calculated twice in Seq2SeqLMBatch.filter()
2023-04-27 09:57:28 +02:00
Nick Hill b4cf832c40
fix(server): fix reshaping of bloom past_key_values in concatenate() (#252)
Introduced in #214
Fixes #249
2023-04-27 09:51:27 +02:00
OlivierDehaene ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
Nick Hill 4a7dd4085a
feat(server): reduce memory requirement (#214) 2023-04-24 14:15:42 +02:00
OlivierDehaene 343437c7b5
feat(router): add device and dtype info (#215) 2023-04-21 15:36:29 +02:00
OlivierDehaene 709d8936f6
feat(router): drop requests when client closes the channel (#202) 2023-04-20 11:07:40 +02:00
OlivierDehaene 5fa8ae041c
feat(server): optimize decode for sane tokenizers (#170) 2023-04-12 12:03:10 +02:00
OlivierDehaene 299217c95c
feat(server): add flash attention llama (#144) 2023-04-11 16:38:22 +02:00
OlivierDehaene 9987960062
feat(router): make router input validation optional (#164) 2023-04-09 20:22:27 +02:00
OlivierDehaene 05e9a796cc
feat(server): flash neoX (#133) 2023-03-24 14:02:14 +01:00
OlivierDehaene b49dbf2d88
fix(server): use server tokenizer as gt (#128) 2023-03-16 12:12:26 +01:00
OlivierDehaene 8ad60b752f
fix(server): add position ids to neox (#126) 2023-03-15 13:12:49 +01:00
OlivierDehaene 941cd42e0c
fix(server): fix index out of range for watermarking (#110) 2023-03-08 18:29:08 +01:00
OlivierDehaene 3fef90d50f
feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00