OlivierDehaene | 85aa7e2e7b | feat(server): support hf endpoint weight layout (#266) | 2023-05-03 11:36:24 +02:00
Nicolas Patry | 411b0d4e1f | chore(github): add templates (#264) | 2023-05-02 15:43:19 +02:00
Nicolas Patry | b0b97fd9a7 | doc(launcher): add more docs to the `launcher` itself and link in the README (#257) | 2023-04-29 11:53:42 +02:00
Nicolas Patry | db2b4e0754 | feat(router): new healthcheck that skips the queue (#244) | 2023-04-26 20:23:54 +02:00
    Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
    Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Nicolas Patry | 77758f603b | chore(launcher): refactor logic (#242) | 2023-04-26 14:43:36 +02:00
    Hopefully it's cleaner
OlivierDehaene | ebc74d5666 | feat(router): use number of tokens in batch as input for dynamic batching (#226) | 2023-04-24 17:59:00 +02:00
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
OlivierDehaene | 252f42c1e6 | fix(router): add auth token to get model info (#207) | 2023-04-19 20:06:06 +02:00
OlivierDehaene | 2475aede61 | feat(router): add info route (#196) | 2023-04-18 16:16:06 +02:00
    close #125
OlivierDehaene | 7a1ba58557 | fix(docker): fix docker image dependencies (#187) | 2023-04-17 00:26:47 +02:00
OlivierDehaene | e3a63b6fbc | fix(launcher): revert change on shard errors (#173) | 2023-04-13 11:07:11 +02:00
OlivierDehaene | f26dfd0dc1 | feat(server): support OPT models (#55) | 2023-04-11 19:16:41 +02:00
    OPT models do not all have a `tokenizer.json` file on the hub at the moment. Can't merge for now.
OlivierDehaene | e63a21eb4d | feat(launcher): allow disabling hf_transfer (#161) | 2023-04-09 20:00:05 +02:00
OlivierDehaene | 55bd4fed7d | feat(router): add best_of parameter (#117) | 2023-03-09 15:30:54 +01:00
OlivierDehaene | 5fd2dcb513 | feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) | 2023-03-08 13:53:41 +01:00
OlivierDehaene | 0ac38d336a | feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) | 2023-03-08 11:06:59 +01:00
OlivierDehaene | cd5961b5da | feat: allow local models (#101) | 2023-03-06 14:39:36 +01:00
    closes #99
OlivierDehaene | 240c4187fd | fix(launcher): add router parameters to launcher (#95) | 2023-03-03 16:01:25 +01:00
OlivierDehaene | 9b8ea6a6c7 | feat(server): add logits watermark (#90) | 2023-03-02 12:30:41 +01:00
OlivierDehaene | 17bc841b1b | feat(server): enable hf-transfer (#76) | 2023-02-18 14:04:11 +01:00
OlivierDehaene | 6796d38c6d | feat(router): add cors allow origin options (#73) | 2023-02-17 18:22:00 +01:00
OlivierDehaene | 7b3d460d21 | fix(launcher): copy current env vars to subprocesses (#70) | 2023-02-16 11:20:23 +01:00
    closes #69
OlivierDehaene | 68455353f5 | feat(launcher): add disable_custom_kernels arg (#67) | 2023-02-15 16:23:45 +01:00
OlivierDehaene | c5a4a1faf3 | feat(server): improve download logging (#66) | 2023-02-15 16:11:32 +01:00
OlivierDehaene | 0fbc691946 | feat: add safetensors conversion (#63) | 2023-02-14 13:02:16 +01:00
OlivierDehaene | 9af454142a | feat: add distributed tracing (#62) | 2023-02-13 13:02:45 +01:00
OlivierDehaene | 1ad3250b89 | fix(docker): increase shm size (#60) | 2023-02-08 17:53:33 +01:00
OlivierDehaene | 4acc42a605 | fix(server): better handling of inference mode (#57) | 2023-02-07 15:38:22 +01:00
OlivierDehaene | 20c3c5940c | feat(router): refactor API and add OpenAPI schemas (#53) | 2023-02-03 12:43:37 +01:00
OlivierDehaene | 7b870e1e18 | feat(router): use background task to manage request queue (#52) | 2023-02-02 14:59:27 +01:00
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
OlivierDehaene | 775115e3a5 | feat(server): allow the server to use a local weight cache (#49) | 2023-02-01 16:22:10 +01:00
OlivierDehaene | f830706b21 | feat(server): Support GPT-NeoX (#39) | 2023-01-31 18:53:56 +01:00
OlivierDehaene | 15511edc01 | feat(server): Support SantaCoder (#26) | 2023-01-20 12:24:39 +01:00
Nick Hill | e6d3eb5d5d | fix(server): Minor refactorization using new_zeros (#24) | 2023-01-17 09:10:22 +01:00
    - Fix some type hints, in particular the base tokenizer class
    - Make use of the `tensor.new_zeros`/`tensor.new_empty` methods
    - Simplify env var string parsing in the launcher
OlivierDehaene | fcc2c5fcbf | feat(launcher): Log server stdout (#19) | 2023-01-05 12:01:23 +01:00
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
OlivierDehaene | 4236e41b0d | feat(server): Improved doc | 2022-11-07 12:53:56 +01:00
OlivierDehaene | cea6051eff | feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard | 2022-11-04 18:31:08 +01:00
OlivierDehaene | b3b7ea0d74 | feat: Use json formatter by default in docker image | 2022-11-02 17:29:56 +01:00
OlivierDehaene | 3cf6368c77 | feat(server): Support all AutoModelForCausalLM on a best effort basis | 2022-10-28 19:24:00 +02:00
OlivierDehaene | 09674e6df9 | feat(server): Support bitsandbytes | 2022-10-27 14:25:29 +02:00
Nicolas Patry | c8ce9b2515 | feat(server): Use safetensors | 2022-10-22 20:00:15 +02:00
    Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
OlivierDehaene | c837893370 | feat(router): Add max_waiting_tokens | 2022-10-21 16:40:05 +02:00
Olivier Dehaene | f16f2f5ae1 | v0.1.0 | 2022-10-20 19:14:44 +02:00