Commit Graph

  • 0b6807caa4
    feat(server): fix transformers commit (#96) OlivierDehaene 2023-03-03 17:56:27 +0100
  • 240c4187fd
    fix(launcher): add router parameters to launcher (#95) OlivierDehaene 2023-03-03 16:01:25 +0100
  • e3ded361b2
    feat(ci): improve CI speed (#94) OlivierDehaene 2023-03-03 15:07:27 +0100
  • 2d39f199ae
    feat(server): update to hf_transfer==0.1.2 (#93) OlivierDehaene 2023-03-03 11:26:27 +0100
  • 9b8ea6a6c7
    feat(server): add logits watermark (#90) OlivierDehaene 2023-03-02 12:30:41 +0100
  • f874c47831
    feat(router): add api-inference headers (#91) OlivierDehaene 2023-03-02 11:41:51 +0100
  • 4e685d907e
    feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89) OlivierDehaene 2023-02-28 10:19:32 +0100
  • 21340f24ba
    feat(router): add legacy route for api-inference support (#88) OlivierDehaene 2023-02-27 14:56:58 +0100
  • 65e2f1624e
    fix(server): fix token_is_special (#87) OlivierDehaene 2023-02-24 17:20:00 +0100
  • 3b03c4ea18
    fix(docs): fix openapi schema (#86) OlivierDehaene 2023-02-24 15:59:49 +0100
  • 0ac184ce77
    feat(server): add special token bool (#85) OlivierDehaene 2023-02-24 15:55:57 +0100
  • 4b1c9720c0
    v0.3.1 (#84) OlivierDehaene 2023-02-24 13:27:41 +0100
  • 44ce098c10
    feat(server): pre-allocate max attention mask (#75) OlivierDehaene 2023-02-24 12:49:21 +0100
  • 78063c0569
    fix(server): remove position_ids from galactica forward (#82) OlivierDehaene 2023-02-20 19:28:57 +0100
  • 17bc841b1b
    feat(server): enable hf-transfer (#76) OlivierDehaene 2023-02-18 14:04:11 +0100
  • 6796d38c6d
    feat(router): add cors allow origin options (#73) OlivierDehaene 2023-02-17 18:22:00 +0100
  • c720555adc
    v0.3.0 (#72) OlivierDehaene 2023-02-16 17:28:29 +0100
  • 439fcaf810
    feat(router): add prometheus metrics scrape endpoint (#71) OlivierDehaene 2023-02-16 17:18:53 +0100
  • 7b3d460d21
    fix(launcher): copy current env vars to subprocesses (#70) OlivierDehaene 2023-02-16 11:20:23 +0100
  • 5437d49beb
    feat(router): add max_total_tokens and empty_input validation (#68) OlivierDehaene 2023-02-15 21:56:59 +0100
  • 68455353f5
    feat(launcher): add disable_custom_kernels arg (#67) OlivierDehaene 2023-02-15 16:23:45 +0100
  • c5a4a1faf3
    feat(server): improve download logging (#66) OlivierDehaene 2023-02-15 16:11:32 +0100
  • 0fbc691946
    feat: add safetensors conversion (#63) OlivierDehaene 2023-02-14 13:02:16 +0100
  • 9af454142a
    feat: add distributed tracing (#62) OlivierDehaene 2023-02-13 13:02:45 +0100
  • e520d5b349
    fixed SSE naming (#61) Yannic Kilcher 2023-02-08 22:30:11 +0100
  • 1ad3250b89
    fix(docker): increase shm size (#60) OlivierDehaene 2023-02-08 17:53:33 +0100
  • c503a639b1
    feat(server): support t5 (#59) OlivierDehaene 2023-02-07 18:25:17 +0100
  • 2fe5e1b30e
    V0.2.1 (#58) OlivierDehaene 2023-02-07 15:40:25 +0100
  • 4acc42a605
    fix(server): better handling of inference mode (#57) OlivierDehaene 2023-02-07 15:38:22 +0100
  • e114d87486
    feat(ci): push to AML registry (#56) OlivierDehaene 2023-02-06 14:33:56 +0100
  • a0dca443dd
    feat(docs): Clarify installation steps (#54) lewtun 2023-02-03 13:07:55 +0100
  • 20c3c5940c
    feat(router): refactor API and add openAPI schemas (#53) OlivierDehaene 2023-02-03 12:43:37 +0100
  • b1482d9048
    breaking(router): modify /generate API to only return generated text (#50) OlivierDehaene 2023-02-02 15:02:04 +0100
  • 7b870e1e18
    feat(router): use background task to manage request queue (#52) OlivierDehaene 2023-02-02 14:59:27 +0100
  • df227ac20d
    fix(server): allow greedy repetition penalty (#51) OlivierDehaene 2023-02-02 10:34:35 +0100
  • 775115e3a5
    feat(server): allow the server to use a local weight cache (#49) OlivierDehaene 2023-02-01 16:22:10 +0100
  • 313194f6d7
    feat(server): support repetition penalty (#47) OlivierDehaene 2023-02-01 15:58:42 +0100
  • 2ad895a6cc
    feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48) OlivierDehaene 2023-02-01 14:43:59 +0100
  • 404ed7a1f6
    feat(ci): Docker build and push (#46) OlivierDehaene 2023-01-31 20:14:05 +0100
  • f830706b21
    feat(server): Support GPT-Neox (#39) OlivierDehaene 2023-01-31 18:53:56 +0100
  • c6e8b9442b
    fix(server): fix quantization for sharded models (#45) OlivierDehaene 2023-01-31 17:40:38 +0100
  • 017a2a8c2f
    feat: Add token streaming using ServerSideEvents support (#41) OlivierDehaene 2023-01-31 17:04:00 +0100
  • 54fec93193
    fix(server): fix seeding with multiple shards (#44) OlivierDehaene 2023-01-31 16:01:15 +0100
  • 03bdf18290
    fix(server): fix seeding on gpu (#42) OlivierDehaene 2023-01-31 14:30:33 +0100
  • 4f9ac67cfa
    Revert "feat: Add token streaming using ServerSideEvents support" (#40) OlivierDehaene 2023-01-31 14:21:51 +0100
  • 7fbfbb0dc5
    feat: Add token streaming using ServerSideEvents support (#36) OlivierDehaene 2023-01-31 11:49:43 +0100
  • cd298bc5e5
    feat: Support sampling seeding (#37) OlivierDehaene 2023-01-30 15:36:16 +0100
  • 1539d3cbbe
    feat(router): Remove second lock from batcher hot path (#27) OlivierDehaene 2023-01-26 16:29:13 +0100
  • ce960be0a5
    feat(bloom): use torch.nn.Linear and torch.nn.GELU (#33) OlivierDehaene 2023-01-26 15:33:45 +0100
  • 13e7044ab7
    fix(dockerfile): fix docker build (#32) OlivierDehaene 2023-01-24 19:52:39 +0100
  • 5c01e2544c
    fix(router): fix api-inference deployment (#31) OlivierDehaene 2023-01-23 17:42:14 +0100
  • ab2ad91da3
    fix(docker): fix api-inference deployment (#30) OlivierDehaene 2023-01-23 17:33:08 +0100
  • f9d0ec376a
    feat(docker): Make the image compatible with api-inference (#29) OlivierDehaene 2023-01-23 17:11:27 +0100
  • 1f570d181f
    fix(server): Fix position ids (#28) OlivierDehaene 2023-01-20 15:35:22 +0100
  • 15511edc01
    feat(server): Support SantaCoder (#26) OlivierDehaene 2023-01-20 12:24:39 +0100
  • f7ac394935
    fix(router): Obey max batch size (#23) Nick Hill 2023-01-17 00:11:21 -0800
  • e6d3eb5d5d
    fix(server): Minor refactorization using new_zeros (#24) Nick Hill 2023-01-17 00:10:22 -0800
  • fcc2c5fcbf
    feat(launcher): Log server stdout (#19) OlivierDehaene 2023-01-05 12:01:23 +0100
  • b94f30215f
    fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13) Nicolas Patry 2023-01-03 11:07:05 +0100
  • 60472f9d2b
    feat(router): Add const parameters to validation logic (#15) Nick Hill 2023-01-03 01:41:22 -0800
  • 3efa5bbbfd
    fix(router): Include special tokens when tokenizing (#14) Nick Hill 2022-12-30 10:31:44 -0800
  • 686cc66717
    fix(server): Check for device type correctly when determining initial padding (#16) Nick Hill 2022-12-30 10:30:42 -0800
  • 611e21cb13
    fix(server): Fix stop sequences (#11) OlivierDehaene 2022-12-16 16:03:39 +0100
  • 3e2e6240b8
    feat(launcher): Add integration tests (#9) OlivierDehaene 2022-12-16 11:29:36 +0100
  • 32a253063d
    feat: Return logprobs (#8) OlivierDehaene 2022-12-15 17:03:56 +0100
  • 718096f695
    feat: Support stop sequences (#7) OlivierDehaene 2022-12-12 18:25:22 +0100
  • 042180d88f
    fix(server): Only pad to multiple of 8 on GPUs OlivierDehaene 2022-12-08 19:37:37 +0100
  • a2985036aa
    feat(server): Add model tests (#6) OlivierDehaene 2022-12-08 18:49:33 +0100
  • 31d76e238d
    fix(batching): Avoid theoretical hang in batcher loop (#5) Nick Hill 2022-12-05 01:10:59 -0800
  • daa1d81d5e
    feat(server): Support Galactica (#4) OlivierDehaene 2022-12-01 19:31:54 +0100
  • d6d5b12e03
    fix(router): Handle tokenizer errors OlivierDehaene 2022-11-14 17:15:19 +0100
  • feb7806ca4
    fix(readme): Typo OlivierDehaene 2022-11-14 16:22:10 +0100
  • 91f5f86280
    fix(router): Fix HTTP status codes OlivierDehaene 2022-11-14 14:34:15 +0100
  • 6c781025ae
    feat(rust): Update to 1.65 OlivierDehaene 2022-11-14 13:59:56 +0100
  • dccd5c2b1a
    feat(server): Clarify CausalLMBatch concatenate method OlivierDehaene 2022-11-09 18:24:07 +0100
  • fa43fb71be
    fix(server): Fix Transformers fork version OlivierDehaene 2022-11-08 17:42:38 +0100
  • 4236e41b0d
    feat(server): Improved doc OlivierDehaene 2022-11-07 12:53:56 +0100
  • cea6051eff
    feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard OlivierDehaene 2022-11-04 18:31:08 +0100
  • 427d7cc444
    feat(server): Support AutoModelForSeq2SeqLM OlivierDehaene 2022-11-04 18:03:04 +0100
  • c5665f5c8b
    feat(server): Support generic AutoModelForCausalLM OlivierDehaene 2022-11-04 14:22:47 +0100
  • 755fc0e403
    fix(models): Revert buggy support for AutoModel OlivierDehaene 2022-11-03 16:07:54 +0100
  • b3b7ea0d74
    feat: Use json formatter by default in docker image OlivierDehaene 2022-11-02 17:29:56 +0100
  • 3cf6368c77
    feat(server): Support all AutoModelForCausalLM on a best effort basis OlivierDehaene 2022-10-28 19:24:00 +0200
  • 09674e6df9
    feat(server): Support bitsandbytes OlivierDehaene 2022-10-27 14:25:29 +0200
  • beb552127a
    feat(client): Simplify sharded logic OlivierDehaene 2022-10-22 23:40:05 +0200
  • c8ce9b2515
    feat(server): Use safetensors Nicolas Patry 2022-10-22 20:00:15 +0200
  • be8827fe41
    Create LICENSE (#2) Thomas Wang 2022-10-22 10:44:52 +0200
  • c837893370
    feat(router): Add max_waiting_tokens OlivierDehaene 2022-10-21 16:40:05 +0200
  • 895a341d06
    fix(validation): Fix error messages OlivierDehaene 2022-10-21 10:59:15 +0200
  • f16f2f5ae1
    v0.1.0 Olivier Dehaene 2022-10-18 15:19:03 +0200
  • 92c1ecd008
    feat: Add arguments to CLI Olivier Dehaene 2022-10-17 18:27:33 +0200
  • 5e5d8766a2
    feat: Improve error handling Olivier Dehaene 2022-10-17 14:59:00 +0200
  • 00e6ce44b1
    Update aml deployment Olivier Dehaene 2022-10-17 10:39:59 +0200
  • bcb53903b8
    feat: Add AML deployment Olivier Dehaene 2022-10-15 20:21:50 +0200
  • bf99afe916
    feat: Docker image Olivier Dehaene 2022-10-14 15:56:21 +0200
  • 39df4d9975
    Use axum Olivier Dehaene 2022-10-11 18:14:39 +0200
  • e86ecbac63
    ValidationError was not correctly handled Olivier Dehaene 2022-10-11 16:53:40 +0200
  • 4c693e6524
    Refactored gRPC interface Added validation logic Olivier Dehaene 2022-10-11 16:50:54 +0200
  • fa9a088467
    Add load testing Olivier Dehaene 2022-10-11 10:36:51 +0200
  • 1d986983d5
    fix: cleanup Olivier Dehaene 2022-10-08 12:34:25 +0200