OlivierDehaene
9b205d33cc
fix(server): fix generate_stream by forcing tokens to be decoded correctly ( #100 )
2023-03-06 13:22:58 +01:00
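One common way to make streamed tokens decode correctly is to decode the whole token prefix and emit only the new suffix. SentencePiece-style tokenizers encode a leading space as `▁`, so decoding a token on its own can drop that space. The sketch below illustrates the idea with a hypothetical toy vocabulary. `decode` and `next_text` are illustrative names, not the server's code.

```rust
/// Toy SentencePiece-style decoder (hypothetical): "▁" marks a word boundary
/// and becomes a space, except for the single leading space of the sequence.
fn decode(vocab: &[&str], ids: &[usize]) -> String {
    let joined: String = ids.iter().map(|&id| vocab[id]).collect();
    let text = joined.replace('▁', " ");
    if text.starts_with(' ') {
        text[1..].to_string()
    } else {
        text
    }
}

/// Text contributed by the last token: decode the full prefix, decode the
/// prefix without the last token, and keep only the difference.
fn next_text(vocab: &[&str], ids: &[usize]) -> String {
    let (_, prev) = ids.split_last().expect("at least one token");
    let full = decode(vocab, ids);
    let before = decode(vocab, prev);
    full[before.len()..].to_string()
}
```

With the vocabulary `["▁Hello", "▁world", "!"]`, decoding token `1` alone yields `"world"`, while `next_text` on `[0, 1]` yields `" world"` with the space intact.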
OlivierDehaene
1c19b0934e
v0.3.2 ( #97 )
2023-03-03 18:42:20 +01:00
OlivierDehaene
0b6807caa4
feat(server): fix transformers commit ( #96 )
2023-03-03 17:56:27 +01:00
OlivierDehaene
240c4187fd
fix(launcher): add router parameters to launcher ( #95 )
2023-03-03 16:01:25 +01:00
OlivierDehaene
e3ded361b2
feat(ci): improve CI speed ( #94 )
2023-03-03 15:07:27 +01:00
OlivierDehaene
2d39f199ae
feat(server): update to hf_transfer==0.1.2 ( #93 )
2023-03-03 11:26:27 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark ( #90 )
2023-03-02 12:30:41 +01:00
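The title only says that a logits watermark was added. A well-known scheme of this kind (the "green list" soft watermark of Kirchenbauer et al.) seeds a PRNG from the previous token and selects a fraction `gamma` of the vocabulary as the "green list". It then adds a bias `delta` to those logits. The sketch below assumes that scheme. The SplitMix64 generator, the `prev_token * hash_key` seeding rule, and all names are assumptions, not the server's implementation.

```rust
/// SplitMix64: a small deterministic PRNG (an assumption for this sketch).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Green-list token ids for the next position. They depend only on the
/// previous token, so a detector can recompute them without the model.
fn green_list(prev_token: u64, vocab_size: usize, gamma: f64, hash_key: u64) -> Vec<usize> {
    let mut rng = SplitMix64(prev_token.wrapping_mul(hash_key));
    let mut ids: Vec<usize> = (0..vocab_size).collect();
    // Fisher-Yates shuffle driven by the seeded PRNG.
    for i in (1..vocab_size).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        ids.swap(i, j);
    }
    ids.truncate((gamma * vocab_size as f64) as usize);
    ids
}

/// Add `delta` to the logits of green-list tokens (soft watermark).
fn watermark_logits(logits: &mut [f32], prev_token: u64, gamma: f64, delta: f32, hash_key: u64) {
    for id in green_list(prev_token, logits.len(), gamma, hash_key) {
        logits[id] += delta;
    }
}
```

Detection then counts how many generated tokens fall in their position's green list. Unwatermarked text should land there at a rate of roughly `gamma`.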
OlivierDehaene
f874c47831
feat(router): add api-inference headers ( #91 )
2023-03-02 11:41:51 +01:00
OlivierDehaene
4e685d907e
feat(router): ask hf.co for pipeline_tag to decide on compat_return_full_text ( #89 )
2023-02-28 10:19:32 +01:00
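The decision in this commit can be sketched as follows. It assumes that only models whose Hub `pipeline_tag` is `text-generation` get the legacy full-text behaviour, and that a missing tag falls back to `false`. Both rules and the function names are assumptions. The second helper shows the two response shapes that follow from #50, where the prompt is no longer prepended by default.

```rust
/// Whether legacy API-inference responses should include the prompt,
/// based on the model's Hub pipeline tag (assumed rule).
fn compat_return_full_text(pipeline_tag: Option<&str>) -> bool {
    matches!(pipeline_tag, Some("text-generation"))
}

/// Text returned to the client: only the generated continuation, unless
/// full text (prompt + continuation) is requested.
fn response_text(prompt: &str, generated: &str, return_full_text: bool) -> String {
    if return_full_text {
        format!("{prompt}{generated}")
    } else {
        generated.to_string()
    }
}
```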
OlivierDehaene
21340f24ba
feat(router): add legacy route for api-inference support ( #88 )
2023-02-27 14:56:58 +01:00
OlivierDehaene
65e2f1624e
fix(server): fix token_is_special ( #87 )
2023-02-24 17:20:00 +01:00
OlivierDehaene
3b03c4ea18
fix(docs): fix openapi schema ( #86 )
2023-02-24 15:59:49 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool ( #85 )
2023-02-24 15:55:57 +01:00
OlivierDehaene
4b1c9720c0
v0.3.1 ( #84 )
2023-02-24 13:27:41 +01:00
OlivierDehaene
44ce098c10
feat(server): pre-allocate max attention mask ( #75 )
2023-02-24 12:49:21 +01:00
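Pre-allocating the attention mask means sizing it once for the longest length a sequence can reach, then setting entries to 1 as tokens are generated. This avoids concatenating a new column at every decoding step. The sketch uses a flat row-major `Vec<u8>` and hypothetical names. For simplicity each row fills left to right, whereas the server works with tensors and its own padding layout.

```rust
/// Attention mask allocated once for `max_len` positions per sequence.
/// Rows are sequences; 1 marks a position the model may attend to.
struct AttentionMask {
    max_len: usize,
    lens: Vec<usize>, // current length of each sequence
    data: Vec<u8>,    // row-major, batch_size * max_len
}

impl AttentionMask {
    fn new(input_lens: &[usize], max_len: usize) -> Self {
        let mut data = vec![0u8; input_lens.len() * max_len];
        for (row, &len) in input_lens.iter().enumerate() {
            assert!(len <= max_len, "input longer than pre-allocated mask");
            data[row * max_len..row * max_len + len].fill(1);
        }
        Self { max_len, lens: input_lens.to_vec(), data }
    }

    /// Mark one newly generated token per sequence; nothing is reallocated.
    fn step(&mut self) {
        for (row, len) in self.lens.iter_mut().enumerate() {
            assert!(*len < self.max_len, "mask is full");
            self.data[row * self.max_len + *len] = 1;
            *len += 1;
        }
    }

    fn row(&self, row: usize) -> &[u8] {
        &self.data[row * self.max_len..(row + 1) * self.max_len]
    }
}
```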
OlivierDehaene
78063c0569
fix(server): remove position_ids from galactica forward ( #82 )
closes #80
2023-02-20 19:28:57 +01:00
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer ( #76 )
2023-02-18 14:04:11 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options ( #73 )
2023-02-17 18:22:00 +01:00
OlivierDehaene
c720555adc
v0.3.0 ( #72 )
2023-02-16 17:28:29 +01:00
OlivierDehaene
439fcaf810
feat(router): add prometheus metrics scrape endpoint ( #71 )
2023-02-16 17:18:53 +01:00
OlivierDehaene
7b3d460d21
fix(launcher): copy current env vars to subprocesses ( #70 )
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene
5437d49beb
feat(router): add max_total_tokens and empty_input validation ( #68 )
closes #65
2023-02-15 21:56:59 +01:00
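The validation named in this title can be sketched as two checks. Empty inputs are rejected, and so is any request whose prompt length plus `max_new_tokens` exceeds `max_total_tokens`. The error type, its fields, and the assumption that `input_length` is the tokenizer's token count are hypothetical. The router's actual messages may differ.

```rust
#[derive(Debug, PartialEq)]
enum ValidationError {
    EmptyInput,
    MaxTotalTokens { max_total_tokens: usize, input_length: usize, max_new_tokens: usize },
}

/// Check a request before it is queued.
fn validate(
    inputs: &str,
    input_length: usize,
    max_new_tokens: usize,
    max_total_tokens: usize,
) -> Result<(), ValidationError> {
    if inputs.is_empty() {
        return Err(ValidationError::EmptyInput);
    }
    // saturating_add avoids overflow on absurdly large client values.
    if input_length.saturating_add(max_new_tokens) > max_total_tokens {
        return Err(ValidationError::MaxTotalTokens { max_total_tokens, input_length, max_new_tokens });
    }
    Ok(())
}
```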
OlivierDehaene
68455353f5
feat(launcher): add disable_custom_kernels arg ( #67 )
2023-02-15 16:23:45 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging ( #66 )
2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion ( #63 )
2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
Yannic Kilcher
e520d5b349
fixed SSE naming ( #61 )
https://en.wikipedia.org/wiki/Server-sent_events
2023-02-08 22:30:11 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size ( #60 )
2023-02-08 17:53:33 +01:00
OlivierDehaene
c503a639b1
feat(server): support t5 ( #59 )
2023-02-07 18:25:17 +01:00
OlivierDehaene
2fe5e1b30e
V0.2.1 ( #58 )
2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode ( #57 )
2023-02-07 15:38:22 +01:00
OlivierDehaene
e114d87486
feat(ci): push to AML registry ( #56 )
2023-02-06 14:33:56 +01:00
lewtun
a0dca443dd
feat(docs): Clarify installation steps ( #54 )
Adds some bits for first-time users (like me 😄)
2023-02-03 13:07:55 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text ( #50 )
@njhill, @yk FYI
generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is even
detrimental to usability.
We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue ( #52 )
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
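The pattern in this title is an owner task. A single background worker owns the request queue, and request handlers talk to it through a channel instead of sharing a lock. The sketch below substitutes `std::thread` and `std::sync::mpsc` for an async runtime. The command names and the use of bare `u64` ids as requests are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages understood by the background queue task (hypothetical).
enum QueueCommand {
    /// Add a request (here just its id) to the back of the queue.
    Append(u64),
    /// Pop up to `max_size` requests and send them back as the next batch.
    NextBatch { max_size: usize, reply: Sender<Vec<u64>> },
}

/// Spawn the task that exclusively owns the queue; callers only hold a Sender.
fn spawn_queue() -> Sender<QueueCommand> {
    let (tx, rx) = channel::<QueueCommand>();
    thread::spawn(move || {
        let mut queue = VecDeque::new();
        // The loop ends once every Sender has been dropped.
        for cmd in rx {
            match cmd {
                QueueCommand::Append(id) => queue.push_back(id),
                QueueCommand::NextBatch { max_size, reply } => {
                    let n = max_size.min(queue.len());
                    let batch: Vec<u64> = queue.drain(..n).collect();
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(batch);
                }
            }
        }
    });
    tx
}
```

Because only one thread ever touches the `VecDeque`, no mutex is needed, and commands are applied in the order they arrive on the channel.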
OlivierDehaene
df227ac20d
fix(server): allow greedy repetition penalty ( #51 )
2023-02-02 10:34:35 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache ( #49 )
2023-02-01 16:22:10 +01:00
OlivierDehaene
313194f6d7
feat(server): support repetition penalty ( #47 )
2023-02-01 15:58:42 +01:00
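Repetition penalty is commonly implemented as in the CTRL paper and in `transformers`' `RepetitionPenaltyLogitsProcessor`. For every token already in the sequence, a positive logit is divided by the penalty and a negative one is multiplied by it, so values above `1.0` make repeats less likely. The sketch assumes that rule, and the function name is hypothetical. Because it only rewrites logits, it can be combined with greedy decoding as well as with sampling.

```rust
use std::collections::HashSet;

/// Penalise tokens that already appear in `previous_ids`.
/// `penalty > 1.0` discourages repetition; `1.0` leaves logits unchanged.
fn apply_repetition_penalty(logits: &mut [f32], previous_ids: &[usize], penalty: f32) {
    // Penalise each distinct token once, even if it occurred several times.
    let seen: HashSet<usize> = previous_ids.iter().copied().collect();
    for id in seen {
        let score = &mut logits[id];
        *score = if *score < 0.0 { *score * penalty } else { *score / penalty };
    }
}
```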
OlivierDehaene
2ad895a6cc
feat(server): allow gpt-neox models with odd vocab sizes to be sharded ( #48 )
2023-02-01 14:43:59 +01:00
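Sharding an embedding or LM head across `world_size` GPUs requires splitting the vocabulary dimension into equal blocks. One common workaround for vocabulary sizes that do not divide evenly is to pad up to the next multiple of `world_size` and give each rank a contiguous block, with padded ids never used as real tokens. This workaround is an assumption, not necessarily what this commit does, and the helper names are hypothetical.

```rust
use std::ops::Range;

/// Smallest multiple of `world_size` that is >= `vocab_size`.
fn padded_vocab_size(vocab_size: usize, world_size: usize) -> usize {
    assert!(world_size > 0, "world_size must be positive");
    (vocab_size + world_size - 1) / world_size * world_size
}

/// Half-open range of real token ids owned by `rank`; the last rank's block
/// is clamped because its tail consists of padding.
fn shard_range(vocab_size: usize, world_size: usize, rank: usize) -> Range<usize> {
    let block = padded_vocab_size(vocab_size, world_size) / world_size;
    let start = (rank * block).min(vocab_size);
    let end = ((rank + 1) * block).min(vocab_size);
    start..end
}
```

For a 10-token vocabulary on 4 ranks, the padded size is 12 and the blocks are `0..3`, `3..6`, `6..9` and `9..10`.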
OlivierDehaene
404ed7a1f6
feat(ci): Docker build and push ( #46 )
2023-01-31 20:14:05 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox ( #39 )
2023-01-31 18:53:56 +01:00
OlivierDehaene
c6e8b9442b
fix(server): fix quantization for sharded models ( #45 )
2023-01-31 17:40:38 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
54fec93193
fix(server): fix seeding with multiple shards ( #44 )
2023-01-31 16:01:15 +01:00
OlivierDehaene
03bdf18290
fix(server): fix seeding on gpu ( #42 )
2023-01-31 14:30:33 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
Add token streaming using Server-Sent Events (SSE).
The SSE event payloads have the following shape:
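```rust
// Summary attached to the final event of a stream.
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

// Sent once per generated token; `Token` is defined elsewhere in the router.
// `generated_text` and `details` are `None` until the last event.
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

// Sent when generation fails.
struct ErrorResponse {
    error: String,
}
```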
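On the wire, each of these payloads travels as one SSE message. In the SSE format, a message is one or more `data:` lines terminated by a blank line, and a payload containing newlines needs one `data:` line per line. The hypothetical helper below sketches that framing. JSON serialisation of the structs (e.g. with serde) is omitted to keep it dependency-free.

```rust
/// Frame a payload (e.g. a JSON-serialised StreamResponse) as one SSE message.
fn sse_message(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        // "data: " with one space: clients strip exactly one leading space.
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    // The blank line dispatches the event to the client.
    out.push('\n');
    out
}
```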
2023-01-31 11:49:43 +01:00
OlivierDehaene
cd298bc5e5
feat: Support sampling seeding ( #37 )
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
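Seeding makes sampling reproducible: given the same seed and the same logits, the sampler draws the same tokens. The sketch below uses a hypothetical SplitMix64 generator and inverse-CDF sampling over softmax weights. It illustrates the idea only and is not the server's implementation.

```rust
/// Minimal seedable PRNG (SplitMix64), an assumption for this sketch.
struct Rng(u64);

impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        // Use the top 53 bits to build a float in [0, 1).
        (z >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample a token id from `logits` via softmax weights and inverse-CDF sampling.
fn sample(logits: &[f64], rng: &mut Rng) -> usize {
    // Subtract the max logit for numerical stability before exponentiating.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let total: f64 = weights.iter().sum();
    let mut r = rng.next_f64() * total;
    for (id, w) in weights.iter().enumerate() {
        if r < *w {
            return id;
        }
        r -= *w;
    }
    logits.len() - 1 // guard against floating-point round-off
}
```

Creating a fresh `Rng(seed)` per request gives each request its own reproducible stream, independent of other requests in the batch.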
OlivierDehaene
1539d3cbbe
feat(router): Remove second lock from batcher hot path ( #27 )
@njhill
2023-01-26 16:29:13 +01:00