OlivierDehaene
cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark (#90)
2023-03-02 12:30:41 +01:00
OlivierDehaene
44ce098c10
feat(server): pre-allocate max attention mask (#75)
2023-02-24 12:49:21 +01:00
OlivierDehaene
78063c0569
fix(server): remove position_ids from galactica forward (#82)
closes #80
2023-02-20 19:28:57 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion (#63)
2023-02-14 13:02:16 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53)
2023-02-03 12:43:37 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox (#39)
2023-01-31 18:53:56 +01:00
OlivierDehaene
c6e8b9442b
fix(server): fix quantization for sharded models (#45)
2023-01-31 17:40:38 +01:00
OlivierDehaene
54fec93193
fix(server): fix seeding with multiple shards (#44)
2023-01-31 16:01:15 +01:00
OlivierDehaene
03bdf18290
fix(server): fix seeding on gpu (#42)
2023-01-31 14:30:33 +01:00
OlivierDehaene
cd298bc5e5
feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
OlivierDehaene
1f570d181f
fix(server): Fix position ids (#28)
2023-01-20 15:35:22 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder (#26)
2023-01-20 12:24:39 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros/new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs (#8)
2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences (#7)
2022-12-12 18:25:22 +01:00
OlivierDehaene
a2985036aa
feat(server): Add model tests (#6)
2022-12-08 18:49:33 +01:00
Nick Hill
31d76e238d
fix(batching): Avoid theoretical hang in batcher loop (#5)
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute
Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene
daa1d81d5e
feat(server): Support Galactica (#4)
2022-12-01 19:31:54 +01:00