Commit Graph

30 Commits

Author SHA1 Message Date
OlivierDehaene 55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene 5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) 2023-03-08 13:53:41 +01:00
OlivierDehaene 0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) 2023-03-08 11:06:59 +01:00
OlivierDehaene cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene 240c4187fd
fix(launcher): add router parameters to launcher (#95) 2023-03-03 16:01:25 +01:00
OlivierDehaene 9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene 17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene 6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene 7b3d460d21
fix(launcher): copy current env vars to subprocesses (#70)
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene 68455353f5
feat(launcher): add disable_custom_kernels arg (#67) 2023-02-15 16:23:45 +01:00
OlivierDehaene c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene 0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene 9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene 1ad3250b89
fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00
OlivierDehaene 4acc42a605
fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene 20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene 7b870e1e18
feat(router): use background task to manage request queue (#52)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene 775115e3a5
feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene 15511edc01
feat(server): Support SantaCoder (#26) 2023-01-20 12:24:39 +01:00
Nick Hill e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zero/empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene fcc2c5fcbf
feat(launcher): Log server stdout (#19)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
OlivierDehaene 4236e41b0d feat(server): Improved doc 2022-11-07 12:53:56 +01:00
OlivierDehaene cea6051eff feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard 2022-11-04 18:31:08 +01:00
OlivierDehaene b3b7ea0d74 feat: Use json formatter by default in docker image 2022-11-02 17:29:56 +01:00
OlivierDehaene 3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis 2022-10-28 19:24:00 +02:00
OlivierDehaene 09674e6df9 feat(server): Support bitsandbytes 2022-10-27 14:25:29 +02:00
Nicolas Patry c8ce9b2515
feat(server): Use safetensors
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
OlivierDehaene c837893370 feat(router): Add max_waiting_tokens 2022-10-21 16:40:05 +02:00
Olivier Dehaene f16f2f5ae1 v0.1.0 2022-10-20 19:14:44 +02:00