Commit Graph

22 Commits

Author SHA1 Message Date
OlivierDehaene f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene 017a2a8c2f
feat: Add token streaming using ServerSideEvents support (#41) 2023-01-31 17:04:00 +01:00
OlivierDehaene 03bdf18290
fix(server): fix seeding on gpu (#42) 2023-01-31 14:30:33 +01:00
OlivierDehaene 4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene 7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is: 

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
2023-01-31 11:49:43 +01:00
OlivierDehaene cd298bc5e5
feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
OlivierDehaene 1f570d181f
fix(server): Fix position ids (#28) 2023-01-20 15:35:22 +01:00
OlivierDehaene 15511edc01
feat(server): Support SantaCoder (#26) 2023-01-20 12:24:39 +01:00
Nick Hill e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zero/empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
Nicolas Patry b94f30215f
fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13)
Fixes #12 in the easiest way I could think of.
2023-01-03 11:07:05 +01:00
Nick Hill 686cc66717
fix(server): Check for device type correctly when determining initial padding (#16)
AFAIK there is no torch device type called "gpu".
2022-12-30 19:30:42 +01:00
OlivierDehaene 611e21cb13
fix(server): Fix stop sequences (#11) 2022-12-16 16:03:39 +01:00
OlivierDehaene 32a253063d
feat: Return logprobs (#8) 2022-12-15 17:03:56 +01:00
OlivierDehaene 718096f695
feat: Support stop sequences (#7) 2022-12-12 18:25:22 +01:00
OlivierDehaene 042180d88f fix(server): Only pad to multiple of 8 on GPUs 2022-12-08 19:37:37 +01:00
OlivierDehaene a2985036aa
feat(server): Add model tests (#6) 2022-12-08 18:49:33 +01:00
Nick Hill 31d76e238d
fix(batching): Avoid theoretical hang in batcher loop (#5)
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene daa1d81d5e
feat(server): Support Galactica (#4) 2022-12-01 19:31:54 +01:00
OlivierDehaene dccd5c2b1a feat(server): Clarify CausalLMBatch concatenate method 2022-11-09 18:24:07 +01:00
OlivierDehaene 4236e41b0d feat(server): Improved doc 2022-11-07 12:53:56 +01:00
OlivierDehaene 427d7cc444 feat(server): Support AutoModelForSeq2SeqLM 2022-11-04 18:03:04 +01:00
OlivierDehaene c5665f5c8b feat(server): Support generic AutoModelForCausalLM 2022-11-04 14:22:47 +01:00