OlivierDehaene
e3a63b6fbc
fix(launcher): revert change on shard errors ( #173 )
2023-04-13 11:07:11 +02:00
OlivierDehaene
6f0f1d70f6
v0.5.0 ( #168 )
2023-04-11 20:32:18 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ( #55 )
...
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama ( #144 )
2023-04-11 16:38:22 +02:00
OlivierDehaene
e63a21eb4d
feat(launcher): allow disabling hf_transfer ( #161 )
2023-04-09 20:00:05 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 ( #152 )
2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33
v0.4.2 ( #151 )
2023-03-30 17:10:01 +02:00
OlivierDehaene
ab5fd8cf93
v0.4.1 ( #140 )
2023-03-26 16:37:51 +02:00
OlivierDehaene
411d6247f4
v0.4.0 ( #119 )
2023-03-09 16:07:01 +01:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter ( #117 )
2023-03-09 15:30:54 +01:00
OlivierDehaene
5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible ( #108 )
2023-03-08 13:53:41 +01:00
OlivierDehaene
0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES ( #107 )
2023-03-08 11:06:59 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models ( #101 )
...
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly ( #100 )
2023-03-06 13:22:58 +01:00
OlivierDehaene
1c19b0934e
v0.3.2 ( #97 )
2023-03-03 18:42:20 +01:00
OlivierDehaene
240c4187fd
fix(launcher): add router parameters to launcher ( #95 )
2023-03-03 16:01:25 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark ( #90 )
2023-03-02 12:30:41 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool ( #85 )
2023-02-24 15:55:57 +01:00
OlivierDehaene
4b1c9720c0
v0.3.1 ( #84 )
2023-02-24 13:27:41 +01:00
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer ( #76 )
2023-02-18 14:04:11 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options ( #73 )
2023-02-17 18:22:00 +01:00
OlivierDehaene
c720555adc
v0.3.0 ( #72 )
2023-02-16 17:28:29 +01:00
OlivierDehaene
7b3d460d21
fix(launcher): copy current env vars to subprocesses ( #70 )
...
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene
68455353f5
feat(launcher): add disable_custom_kernels arg ( #67 )
2023-02-15 16:23:45 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging ( #66 )
2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion ( #63 )
2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size ( #60 )
2023-02-08 17:53:33 +01:00
OlivierDehaene
2fe5e1b30e
V0.2.1 ( #58 )
2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode ( #57 )
2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text ( #50 )
...
@njhill, @yk FYI
generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful and it is even
detrimental to usability.
We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue ( #52 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache ( #49 )
2023-02-01 16:22:10 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox ( #39 )
2023-01-31 18:53:56 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
...
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
...
Add token streaming using Server-Sent Events (SSE).
The signature of the SSE events is:
```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
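As a rough client-side illustration, the sketch below splits a raw SSE body into the `data` payloads of its events. It assumes each event carries one of the structs above serialized as JSON in its `data` field, which this commit message does not state. The helper `sse_data_payloads` is hypothetical and not part of the project. It only handles `data:` lines and blank-line event boundaries, so it is a simplification of the full SSE format.

```rust
/// Hypothetical helper (not part of the project): collects the `data`
/// payload of every complete event found in a raw SSE body.
fn sse_data_payloads(raw: &str) -> Vec<String> {
    let mut payloads = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for line in raw.lines() {
        if line.is_empty() {
            // A blank line ends the current event: emit its data, if any.
            if !current.is_empty() {
                payloads.push(current.join("\n"));
                current.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // One optional space after the colon is not part of the value.
            current.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Comments (lines starting with ':') and other fields such as
        // `event:` or `id:` are ignored in this sketch.
    }

    // An event that is not terminated by a blank line is dropped.
    payloads
}
```

Each returned string could then be handed to a JSON deserializer of your choice to obtain a `StreamResponse` or an `ErrorResponse`, assuming the payloads really are JSON.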
2023-01-31 11:49:43 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder ( #26 )
2023-01-20 12:24:39 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros ( #24 )
...
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros`/`tensor.new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
fcc2c5fcbf
feat(launcher): Log server stdout ( #19 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
OlivierDehaene
611e21cb13
fix(server): Fix stop sequences ( #11 )
2022-12-16 16:03:39 +01:00
OlivierDehaene
3e2e6240b8
feat(launcher): Add integration tests ( #9 )
2022-12-16 11:29:36 +01:00
OlivierDehaene
4236e41b0d
feat(server): Improved doc
2022-11-07 12:53:56 +01:00
OlivierDehaene
cea6051eff
feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard
2022-11-04 18:31:08 +01:00
OlivierDehaene
b3b7ea0d74
feat: Use json formatter by default in docker image
2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00