Daniël de Kok
bf3c813782
server: use chunked inputs
The router will now send the input as chunks in addition to a single
string. This change modifies the server to process the chunked input
rather than strings, which also allows us to remove the image
extraction code from the server.
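A minimal sketch of why chunked input makes string-level image extraction unnecessary. It assumes a hypothetical two-variant `InputChunk` type (text or raw image bytes) and a hypothetical `<image>` placeholder; neither name comes from the router's actual protocol. Because each image arrives as its own chunk, the server can match on the chunk kind instead of parsing images out of the prompt string:

```rust
/// Hypothetical chunk type: the variant names are illustrative only and
/// are not taken from the router/server protocol.
#[derive(Debug, Clone, PartialEq)]
pub enum InputChunk {
    Text(String),
    Image(Vec<u8>),
}

/// Rebuilds the text prompt and collects the images in order.
/// Each image is replaced in the text by `placeholder` (a hypothetical
/// marker), so no parsing of the prompt string is needed to find images.
pub fn split_chunks(chunks: &[InputChunk], placeholder: &str) -> (String, Vec<Vec<u8>>) {
    let mut text = String::new();
    let mut images = Vec::new();
    for chunk in chunks {
        match chunk {
            InputChunk::Text(s) => text.push_str(s),
            InputChunk::Image(data) => {
                text.push_str(placeholder);
                images.push(data.clone());
            }
        }
    }
    (text, images)
}
```

For example, the chunks `Text("a")`, `Image(..)`, `Text("b")` yield the text `a<image>b` and a single image.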
2024-06-07 08:09:04 +02:00
OlivierDehaene
50b495f3d8
feat: add more latency metrics in forward ( #1346 )
2023-12-14 15:59:38 +01:00
OlivierDehaene
72ee382ded
chore: formatting
2023-12-11 14:49:52 +01:00
Nicolas Patry
9ecfa16b12
Speculative ( #1308 )
2023-12-11 12:46:30 +01:00
OlivierDehaene
895c5f1562
feat(server): only compute prefill logprobs when asked ( #406 )
Close #288
2023-06-02 17:12:30 +02:00
OlivierDehaene
62f91f78ac
feat(server): support vectorized warpers in flash causal lm ( #317 )
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
2023-05-26 12:30:27 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size ( #367 )
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
Nick Hill
4a7dd4085a
feat(server): reduce memory requirement ( #214 )
2023-04-24 14:15:42 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ( #202 )
2023-04-20 11:07:40 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ( #164 )
2023-04-09 20:22:27 +02:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt ( #128 )
2023-03-16 12:12:26 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client ( #103 )
2023-03-07 18:52:22 +01:00
OlivierDehaene
44ce098c10
feat(server): pre-allocate max attention mask ( #75 )
2023-02-24 12:49:21 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text ( #50 )
@njhill, @yk FYI
generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is
even detrimental to usability.
We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
Add token streaming using ServerSideEvents (SSE).
The signature of the SSE events is:
```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
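The `Option` fields above suggest that most events carry only a token. The following sketch shows how a stream of these events could be assembled, assuming (not confirmed by this commit) that only the final event fills in `generated_text` and `details`. `Token` is not defined in the commit message, so its `id` and `text` fields here are hypothetical, and `stream_events` is an illustrative helper rather than router code:

```rust
/// Hypothetical token type: the commit message does not define `Token`,
/// so these fields are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub id: u32,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Details {
    pub finish_reason: String,
    pub generated_tokens: u32,
    pub seed: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StreamResponse {
    pub token: Token,
    pub generated_text: Option<String>,
    pub details: Option<Details>,
}

/// Builds one event per generated token. Assumption: intermediate events
/// leave `generated_text` and `details` as `None`, and only the last event
/// carries the full text and the generation details.
pub fn stream_events(tokens: Vec<Token>, finish_reason: &str, seed: Option<u64>) -> Vec<StreamResponse> {
    let total = tokens.len();
    let full_text: String = tokens.iter().map(|t| t.text.as_str()).collect();
    tokens
        .into_iter()
        .enumerate()
        .map(|(i, token)| {
            let last = i + 1 == total;
            StreamResponse {
                token,
                generated_text: if last { Some(full_text.clone()) } else { None },
                details: if last {
                    Some(Details {
                        finish_reason: finish_reason.to_string(),
                        generated_tokens: total as u32,
                        seed,
                    })
                } else {
                    None
                },
            }
        })
        .collect()
}
```

Under this assumption a client can print each `token.text` as it arrives and stop reading once an event with `details` set is received.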
2023-01-31 11:49:43 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder ( #26 )
2023-01-20 12:24:39 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs ( #8 )
2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences ( #7 )
2022-12-12 18:25:22 +01:00
OlivierDehaene
a2985036aa
feat(server): Add model tests ( #6 )
2022-12-08 18:49:33 +01:00