Author | Commit | Message | Date
OlivierDehaene | 50b495f3d8 | feat: add more latency metrics in forward (#1346) | 2023-12-14 15:59:38 +01:00
OlivierDehaene | 72ee382ded | chore: formatting | 2023-12-11 14:49:52 +01:00
Nicolas Patry | 9ecfa16b12 | Speculative (#1308) | 2023-12-11 12:46:30 +01:00
OlivierDehaene | 895c5f1562 | feat(server): only compute prefill logprobs when asked (#406) | 2023-06-02 17:12:30 +02:00
  Close #288
OlivierDehaene | 62f91f78ac | feat(server): support vectorized warpers in flash causal lm (#317) | 2023-05-26 12:30:27 +02:00
  Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
OlivierDehaene | 218c9adaa5 | feat: decrease IPC proto size (#367) | 2023-05-24 19:19:57 +02:00
  Closes #307 #308
OlivierDehaene | 5a58226130 | fix(server): fix decode token (#334) | 2023-05-16 23:23:27 +02:00
  Fixes #333
  Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Nick Hill | 4a7dd4085a | feat(server): reduce memory requirement (#214) | 2023-04-24 14:15:42 +02:00
OlivierDehaene | 709d8936f6 | feat(router): drop requests when client closes the channel (#202) | 2023-04-20 11:07:40 +02:00
OlivierDehaene | 299217c95c | feat(server): add flash attention llama (#144) | 2023-04-11 16:38:22 +02:00
OlivierDehaene | 9987960062 | feat(router): make router input validation optional (#164) | 2023-04-09 20:22:27 +02:00
OlivierDehaene | b49dbf2d88 | fix(server): use server tokenizer as gt (#128) | 2023-03-16 12:12:26 +01:00
OlivierDehaene | 3fef90d50f | feat(clients): Python client (#103) | 2023-03-07 18:52:22 +01:00
OlivierDehaene | 9b205d33cc | fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) | 2023-03-06 13:22:58 +01:00
OlivierDehaene | 44ce098c10 | feat(server): pre-allocate max attention mask (#75) | 2023-02-24 12:49:21 +01:00
OlivierDehaene | 017a2a8c2f | feat: Add token streaming using ServerSideEvents support (#41) | 2023-01-31 17:04:00 +01:00
OlivierDehaene | 4f9ac67cfa | Revert "feat: Add token streaming using ServerSideEvents support" (#40) | 2023-01-31 14:21:51 +01:00
  Reverts huggingface/text-generation-inference#36
OlivierDehaene | 7fbfbb0dc5 | feat: Add token streaming using ServerSideEvents support (#36) | 2023-01-31 11:49:43 +01:00
  Add token streaming using ServerSideEvents (SSE).
  The signature of the SSE events is:
  ```rust
  struct Details {
      finish_reason: String,
      generated_tokens: u32,
      seed: Option<u64>,
  }
  struct StreamResponse {
      token: Token,
      generated_text: Option<String>,
      details: Option<Details>,
  }
  struct ErrorResponse {
      error: String,
  }
  ```
OlivierDehaene | 15511edc01 | feat(server): Support SantaCoder (#26) | 2023-01-20 12:24:39 +01:00
OlivierDehaene | 32a253063d | feat: Return logprobs (#8) | 2022-12-15 17:03:56 +01:00
OlivierDehaene | 718096f695 | feat: Support stop sequences (#7) | 2022-12-12 18:25:22 +01:00
OlivierDehaene | a2985036aa | feat(server): Add model tests (#6) | 2022-12-08 18:49:33 +01:00