Commit Graph

449 Commits

Author SHA1 Message Date
OlivierDehaene c503a639b1
feat(server): support t5 (#59) 2023-02-07 18:25:17 +01:00
OlivierDehaene 2fe5e1b30e
V0.2.1 (#58) 2023-02-07 15:40:25 +01:00
OlivierDehaene 4acc42a605
fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene 20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene b1482d9048
breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reason. We
want to remove this behaviour as we don't think it is useful and even
detrimonial to usability.

We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene df227ac20d
fix(server): allow greedy repetition penalty (#51) 2023-02-02 10:34:35 +01:00
OlivierDehaene 775115e3a5
feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene 313194f6d7
feat(server): support repetition penalty (#47) 2023-02-01 15:58:42 +01:00
OlivierDehaene 2ad895a6cc
feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48) 2023-02-01 14:43:59 +01:00
OlivierDehaene f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene c6e8b9442b
fix(server): fix quantization for sharded models (#45) 2023-01-31 17:40:38 +01:00
OlivierDehaene 017a2a8c2f
feat: Add token streaming using ServerSideEvents support (#41) 2023-01-31 17:04:00 +01:00
OlivierDehaene 54fec93193
fix(server): fix seeding with multiple shards (#44) 2023-01-31 16:01:15 +01:00
OlivierDehaene 03bdf18290
fix(server): fix seeding on gpu (#42) 2023-01-31 14:30:33 +01:00
OlivierDehaene 4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene 7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is: 

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
2023-01-31 11:49:43 +01:00
OlivierDehaene cd298bc5e5
feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
OlivierDehaene ce960be0a5
feat(bloom): use torch.nn.Linear and torch.nn.GELU (#33) 2023-01-26 15:33:45 +01:00
OlivierDehaene 13e7044ab7
fix(dockerfile): fix docker build (#32) 2023-01-24 19:52:39 +01:00
OlivierDehaene 1f570d181f
fix(server): Fix position ids (#28) 2023-01-20 15:35:22 +01:00
OlivierDehaene 15511edc01
feat(server): Support SantaCoder (#26) 2023-01-20 12:24:39 +01:00
Nick Hill e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zero/empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene fcc2c5fcbf
feat(launcher): Log server stdout (#19)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
Nicolas Patry b94f30215f
fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13)
Fixes #12 in the easiest way I could think of.
2023-01-03 11:07:05 +01:00
Nick Hill 686cc66717
fix(server): Check for device type correctly when determining initial padding (#16)
AFAIK there is no torch device type called "gpu".
2022-12-30 19:30:42 +01:00
OlivierDehaene 611e21cb13
fix(server): Fix stop sequences (#11) 2022-12-16 16:03:39 +01:00
OlivierDehaene 32a253063d
feat: Return logprobs (#8) 2022-12-15 17:03:56 +01:00
OlivierDehaene 718096f695
feat: Support stop sequences (#7) 2022-12-12 18:25:22 +01:00
OlivierDehaene 042180d88f fix(server): Only pad to multiple of 8 on GPUs 2022-12-08 19:37:37 +01:00
OlivierDehaene a2985036aa
feat(server): Add model tests (#6) 2022-12-08 18:49:33 +01:00
Nick Hill 31d76e238d
fix(batching): Avoid theoretical hang in batcher loop (#5)
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene daa1d81d5e
feat(server): Support Galactica (#4) 2022-12-01 19:31:54 +01:00
OlivierDehaene dccd5c2b1a feat(server): Clarify CausalLMBatch concatenate method 2022-11-09 18:24:07 +01:00
OlivierDehaene fa43fb71be fix(server): Fix Transformers fork version 2022-11-08 17:42:38 +01:00
OlivierDehaene 4236e41b0d feat(server): Improved doc 2022-11-07 12:53:56 +01:00
OlivierDehaene 427d7cc444 feat(server): Support AutoModelForSeq2SeqLM 2022-11-04 18:03:04 +01:00
OlivierDehaene c5665f5c8b feat(server): Support generic AutoModelForCausalLM 2022-11-04 14:22:47 +01:00
OlivierDehaene 755fc0e403 fix(models): Revert buggy support for AutoModel 2022-11-03 16:07:54 +01:00
OlivierDehaene b3b7ea0d74 feat: Use json formatter by default in docker image 2022-11-02 17:29:56 +01:00
OlivierDehaene 3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis 2022-10-28 19:24:00 +02:00
OlivierDehaene 09674e6df9 feat(server): Support bitsandbytes 2022-10-27 14:25:29 +02:00
Nicolas Patry c8ce9b2515
feat(server): Use safetensors
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
Olivier Dehaene f16f2f5ae1 v0.1.0 2022-10-20 19:14:44 +02:00
Olivier Dehaene 5e5d8766a2 feat: Improve error handling 2022-10-17 14:59:00 +02:00
Olivier Dehaene bcb53903b8 feat: Add AML deployment 2022-10-15 20:21:50 +02:00
Olivier Dehaene bf99afe916 feat: Docker image 2022-10-14 15:56:21 +02:00
Olivier Dehaene 4c693e6524 Refactored gRPC interface
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene 1d986983d5 fix: cleanup 2022-10-08 12:34:25 +02:00
Olivier Dehaene 295831a481 Init 2022-10-08 12:30:12 +02:00