Commit Graph

99 Commits

Author SHA1 Message Date
OlivierDehaene 53ee09c0b0
feat(dockerfile): better layer caching (#159) 2023-04-14 10:12:21 +02:00
OlivierDehaene 64347b05ff
fix(ci): fix CVE in github-slug-action (#174) 2023-04-13 12:43:05 +02:00
OlivierDehaene 880a76eed5
feat(server): support sharded santacoder (#167) 2023-04-12 17:18:08 +02:00
OlivierDehaene 5fa8ae041c
feat(server): optimize decode for sane tokenizers (#170) 2023-04-12 12:03:10 +02:00
OlivierDehaene 6f0f1d70f6
v0.5.0 (#168) 2023-04-11 20:32:18 +02:00
OlivierDehaene f26dfd0dc1
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene 299217c95c
feat(server): add flash attention llama (#144) 2023-04-11 16:38:22 +02:00
OlivierDehaene 9987960062
feat(router): make router input validation optional (#164) 2023-04-09 20:22:27 +02:00
OlivierDehaene 1883d8ecde
feat(docker): improve flash_attention caching (#160) 2023-04-09 19:59:16 +02:00
OlivierDehaene 3f2542bb6a
fix(server): fix escape characters in stop sequence (#155) 2023-04-05 19:37:41 +02:00
OlivierDehaene c0aeb32583
feat(server): flash santacoder (#153) 2023-04-03 19:06:42 +02:00
OlivierDehaene fef1a1c381
v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene 84722f3e33
v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene 08b7e4a282
fix(server): fix flash neox rotary embeddings (#150) 2023-03-30 16:12:23 +02:00
OlivierDehaene 610bb1f978
feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox (#145) 2023-03-28 16:51:41 +02:00
OlivierDehaene f000068944
feat(server): clear cache on error (#143) 2023-03-28 11:29:35 +02:00
Nick Hill 8e8dd984d8
feat(server): Add mypy-protobuf (#141)
Generates .pyi files for protobuf stubs which provide strong typing
information. Very helpful for IDE auto-completion, etc.
2023-03-27 09:25:15 +02:00
Nick Hill 462530c2b0
fix(server): Avoid using try/except to determine kind of AutoModel (#142) 2023-03-27 09:23:22 +02:00
OlivierDehaene ab5fd8cf93
v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene 678b2f3900
feat(server): cleanup flash neox loading (#139) 2023-03-26 16:37:21 +02:00
OlivierDehaene d6a93fe992
fix(server): fix flash-neox scores warping (#137) 2023-03-24 18:21:41 +01:00
OlivierDehaene 05e9a796cc
feat(server): flash neoX (#133) 2023-03-24 14:02:14 +01:00
OlivierDehaene b49dbf2d88
fix(server): use server tokenizer as gt (#128) 2023-03-16 12:12:26 +01:00
OlivierDehaene 8ad60b752f
fix(server): add position ids to neox (#126) 2023-03-15 13:12:49 +01:00
OlivierDehaene cbd36aa4d1
fix(server): revert gpt-neox optims (#123) 2023-03-13 22:57:08 +01:00
OlivierDehaene 411d6247f4
v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene c0795de2f2
fix(server): do not warp prefill logits (#116) 2023-03-09 13:00:10 +01:00
OlivierDehaene 1a2d68250a
feat: support typical sampling (#114)
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene 941cd42e0c
fix(server): fix index out of range for watermarking (#110) 2023-03-08 18:29:08 +01:00
OlivierDehaene b1485e18c5
fix(server): fix galactica batch (#106)
closes #105
2023-03-07 20:05:21 +01:00
OlivierDehaene 3fef90d50f
feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
OlivierDehaene cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene 9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) 2023-03-06 13:22:58 +01:00
OlivierDehaene 1c19b0934e
v0.3.2 (#97) 2023-03-03 18:42:20 +01:00
OlivierDehaene 0b6807caa4
feat(server): fix transformers commit (#96) 2023-03-03 17:56:27 +01:00
OlivierDehaene 2d39f199ae
feat(server): update to hf_transfer==0.1.2 (#93) 2023-03-03 11:26:27 +01:00
OlivierDehaene 9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene 65e2f1624e
fix(server): fix token_is_special (#87) 2023-02-24 17:20:00 +01:00
OlivierDehaene 0ac184ce77
feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene 4b1c9720c0
v0.3.1 (#84) 2023-02-24 13:27:41 +01:00
OlivierDehaene 44ce098c10
feat(server): pre-allocate max attention mask (#75) 2023-02-24 12:49:21 +01:00
OlivierDehaene 78063c0569
fix(server): remove position_ids from galactica forward (#82)
closes #80
2023-02-20 19:28:57 +01:00
OlivierDehaene 17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene c720555adc
v0.3.0 (#72) 2023-02-16 17:28:29 +01:00
OlivierDehaene 439fcaf810
feat(router): add prometheus metrics scrape endpoint (#71) 2023-02-16 17:18:53 +01:00
OlivierDehaene c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene 0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene 9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene 1ad3250b89
fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00