Commit Graph

65 Commits

Author SHA1 Message Date
Ubuntu 2c9e1171bc [WIP] Adding GPTQ support for llama 2023-05-11 12:05:35 +00:00
OlivierDehaene e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Nicolas Patry e68509add7
feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
OlivierDehaene b67908e0cf
fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene 85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
Nicolas Patry 411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry b0b97fd9a7
doc(launcher): add more docs to the `launcher` itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
Nicolas Patry db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry 77758f603b
chore(launcher): refactor logic (#242)
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene 6ded76a4ae
v0.6.0 (#222) 2023-04-21 21:00:57 +02:00
OlivierDehaene 252f42c1e6
fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene 2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene 7a1ba58557
fix(docker): fix docker image dependencies (#187) 2023-04-17 00:26:47 +02:00
OlivierDehaene e3a63b6fbc
fix(launcher): revert change on shard errors (#173) 2023-04-13 11:07:11 +02:00
OlivierDehaene 6f0f1d70f6
v0.5.0 (#168) 2023-04-11 20:32:18 +02:00
OlivierDehaene f26dfd0dc1
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene 299217c95c
feat(server): add flash attention llama (#144) 2023-04-11 16:38:22 +02:00
OlivierDehaene e63a21eb4d
feat(launcher): allow disabling hf_transfer (#161) 2023-04-09 20:00:05 +02:00
OlivierDehaene fef1a1c381
v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene 84722f3e33
v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene ab5fd8cf93
v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene 411d6247f4
v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene 55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene 5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) 2023-03-08 13:53:41 +01:00
OlivierDehaene 0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) 2023-03-08 11:06:59 +01:00
OlivierDehaene cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene 9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) 2023-03-06 13:22:58 +01:00
OlivierDehaene 1c19b0934e
v0.3.2 (#97) 2023-03-03 18:42:20 +01:00
OlivierDehaene 240c4187fd
fix(launcher): add router parameters to launcher (#95) 2023-03-03 16:01:25 +01:00
OlivierDehaene 9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene 0ac184ce77
feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene 4b1c9720c0
v0.3.1 (#84) 2023-02-24 13:27:41 +01:00
OlivierDehaene 17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene 6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene c720555adc
v0.3.0 (#72) 2023-02-16 17:28:29 +01:00
OlivierDehaene 7b3d460d21
fix(launcher): copy current env vars to subprocesses (#70)
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene 68455353f5
feat(launcher): add disable_custom_kernels arg (#67) 2023-02-15 16:23:45 +01:00
OlivierDehaene c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene 0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene 9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene 1ad3250b89
fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00
OlivierDehaene 2fe5e1b30e
V0.2.1 (#58) 2023-02-07 15:40:25 +01:00
OlivierDehaene 4acc42a605
fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene 20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene b1482d9048
breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is even
detrimental to usability.

We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene 7b870e1e18
feat(router): use background task to manage request queue (#52)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene 775115e3a5
feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene 017a2a8c2f
feat: Add token streaming using ServerSideEvents support (#41) 2023-01-31 17:04:00 +01:00