972e9a7f7c | 2024-02-09 11:07:14 -0800 | Michael Feil | update causal batch for ct2 and fix nf4 (#17) | main
7f55c3ceaa | 2023-12-01 17:28:51 +0100 | Michael Feil | bump the ctranslate2 version | PR #17
c5a294b76b | 2023-11-04 00:00:00 +0000 | michaelfeil | update causal batch for ct2 and fix nf4
339ede9e90 | 2023-10-04 08:01:06 +0200 | Michael Feil | Update Readme.md / documentation (#15)
bbd02184ce | 2023-10-03 15:16:27 +0200 | Michael Feil | Update README.md | PR #15
229a1bc985 | 2023-10-03 10:53:33 +0200 | michaelfeil | update readme
393647af37 | 2023-10-03 10:48:40 +0200 | michaelfeil | add documentation updates
ff703cb867 | 2023-10-02 20:12:49 +0200 | Michael Feil | Adding ctranslate2 quantization and inference: moving the contribution (#1)
09e88f2470 | 2023-10-02 00:00:00 +0000 | michaelfeil | Merge branch 'main' into ct2_support | PR #1
012c917b6f | 2023-09-27 17:58:07 +0200 | Michael Feil | Wrapping completions and chat/completions endpoint (#2)
f93012d59c | 2023-09-08 14:52:32 -0700 | Yang, Bo | Merge pull request #4 from michaelfeil/bnb_4bit
072f267cc3 | 2023-08-23 14:23:59 -0700 | Yang, Bo | Initialize v_cache to avoid NaNs (#12)
57deda586e | 2023-08-23 14:22:08 -0700 | Yang, Bo | Update flash_causal_lm.py | PR #12
a5f96fd18e | 2023-08-23 14:21:52 -0700 | Yang, Bo | Update flash_causal_lm.py
c360d45c9f | 2023-08-23 14:15:35 -0700 | Yang, Bo | Initialize v_cache to avoid NaNs
2fda8fe812 | 2023-08-23 14:07:06 -0700 | Yang, Bo | Initialize v_cache to avoid NaNs (#11)
c6114f4b0d | 2023-08-23 21:00:58 +0000 | Yang, Bo | Initialize v_cache to avoid NaNs | PR #11
1e646fb41d | 2023-08-23 16:52:49 -0400 | Jason Sun | Compilation fix: Correct method argument types in generation.rs and validation.rs (#10)
45dc82b8b4 | 2023-08-22 09:32:12 -0700 | Jason Sun | Update router/src/validation.rs | PR #10
e5a5db61ea | 2023-08-22 09:32:06 -0700 | Jason Sun | Update benchmark/src/generation.rs
ac2fe4f8c6 | 2023-08-21 15:14:55 -0700 | Jason Sun | fix: Correct method argument types in generation and validation
8130300c9a | 2023-08-07 13:02:34 +0200 | michaelfeil | fix: 2038y problem | PR #2
ab58232a3d | 2023-08-07 12:28:59 +0200 | michaelfeil | cargo fmt
e8ca636eea | 2023-08-07 12:25:17 +0200 | michaelfeil | rebase and squash commits on latest main
8ddfbaafb9 | 2023-08-07 11:06:54 +0200 | michaelfeil | Merge branch 'ct2_support' of https://github.com/michaelfeil/preemo-text-generation-inference into ct2_support
b9326ace1a | 2023-08-06 18:26:30 +0200 | michaelfeil | adapt path
a732244687 | 2023-08-06 17:10:44 +0200 | michaelfeil | update changes for dockerfile
c089b19487 | 2023-08-04 14:33:49 +0200 | michaelfeil | update dockerfile
df1e7b513a | 2023-08-04 14:18:56 +0200 | michaelfeil | reformatting and changes.
ee81780ba4 | 2023-07-30 14:07:28 +0200 | michaelfeil | rebaseing the commit on preemo fork.
5963554641 | 2023-08-06 18:26:30 +0200 | michaelfeil | adapt path
24632c5105 | 2023-08-06 17:10:44 +0200 | michaelfeil | update changes for dockerfile
2ac9db513a | 2023-08-04 14:33:49 +0200 | michaelfeil | update dockerfile
bc4b3f97ec | 2023-08-04 14:18:56 +0200 | michaelfeil | reformatting and changes.
da9746586b | 2023-08-03 23:23:02 +0200 | Michael Feil | Update README.md | PR #4
a9838bba2f | 2023-08-03 23:20:59 +0200 | Michael Feil | Modify exllama weight
d2ae3581bf | 2023-08-02 17:23:54 -0700 | Yang, Bo | Claim copyright (#7)
13f559c305 | 2023-08-02 16:15:53 -0700 | Yang, Bo | Claim copyright | PR #7
8af4a7a0b0 | 2023-08-02 12:47:17 -0700 | Yang, Bo | Merge branch 'main' into bnb_4bit
b5fadc4c28 | 2023-08-02 09:51:54 -0700 | Yang, Bo | Don't enable custom kernels if CUDA is not available (#6)
8a5f80bb61 | 2023-08-02 09:35:40 -0700 | Yang, Bo | Add AutoCausalLM (#5)
656f2fe4dc | 2023-08-02 16:56:14 +0200 | michaelfeil | fix: typo
ec8590a3f1 | 2023-08-01 17:58:00 -0700 | Yang, Bo | Don't enable custom kernels if CUDA is not available | PR #6
ef006ccee2 | 2023-08-01 12:30:08 -0700 | Yang, Bo | Merge branch 'AutoCausalLM' of https://github.com/Atry/hf-text-generation-inference into HEAD | PR #5
9048a80f8f | 2023-08-01 12:22:07 -0700 | Yang, Bo | Add a new README (#3)
4c2237b2a0 | 2023-08-01 18:18:28 +0200 | michaelfeil | update PR template
44fa36b5bf | 2023-08-01 18:15:18 +0200 | michaelfeil | restoring commit from dev branch, rebase on current master
220b2afc8a | 2023-07-31 21:39:29 -0700 | Yang, Bo | Update README.md | PR #3
76206a513f | 2023-07-31 21:35:16 -0700 | Yang, Bo | Add Preemo's README
8c3d8a10cd | 2023-07-31 21:34:47 -0700 | Yang, Bo | Rename README.md to README-HuggingFace.md
08b50a5bb9 | 2023-07-30 14:07:28 +0200 | michaelfeil | rebaseing the commit on preemo fork.
afd04dc71e | 2023-07-28 15:36:38 +0200 | OlivierDehaene | feat(server): update vllm version (#723)
f848decee6 | 2023-07-28 11:20:03 +0200 | regisss | docs: Add hardware section to TOC in README (#721)
5a1cccbb98 | 2023-07-28 09:14:03 +0200 | regisss | Add section about TGI on other AI hardware accelerators in README (#715)
9f18f4c006 | 2023-07-27 19:25:15 +0200 | OlivierDehaene | v0.9.4 (#713)
ab96b9aec3 | 2023-07-27 18:38:57 +0200 | OlivierDehaene | feat(server): support new falcon config (#712)
2efd46ef95 | 2023-07-27 14:50:45 +0200 | OlivierDehaene | fix(server): fix missing datasets in quantize
8bd0adb135 | 2023-07-27 12:28:10 +0200 | OlivierDehaene | fix(server): fix quantization python requirements (#708)
e64a65891b | 2023-07-25 19:45:25 +0200 | OlivierDehaene | docs(README): update readme
a0d55358d2 | 2023-07-25 12:00:27 +0100 | Nicolas Patry | feat(server): Using `quantize_config.json` instead of GPTQ_BITS env variables. (#671)
9bb64c92a9 | 2023-07-12 01:07:10 +0000 | Yang, Bo | Add AutoCausalLM
37df6df38e | 2023-07-24 14:25:43 +0200 | OlivierDehaene | fix(server): fix exllama buffers (#689)
73a4d65d26 | 2023-07-24 11:43:58 +0200 | OlivierDehaene | feat: add cuda memory fraction (#659)
1da642bd0e | 2023-07-21 16:56:30 +0200 | OlivierDehaene | feat(server): add local prom and health routes if running w/ ngrok
15b3e9ffb0 | 2023-07-21 02:27:31 -0700 | Yang, Bo | Directly load GPTBigCode to specified device (#618)
d5b5bc750f | 2023-07-21 10:59:00 +0200 | Nicolas Patry | feat(server): Add exllama GPTQ CUDA kernel support #553 (#666)
bf94df3c71 | 2023-07-20 17:23:49 +0200 | OlivierDehaene | fix(server): use mem_get_info to get kv cache size (#664)
08b8eec1d7 | 2023-07-20 16:04:15 +0200 | Nicolas Patry | fix(server): Fixing non parameters in quantize script `bigcode/starcoder` was an example. (#661)
362883f259 | 2023-07-20 15:02:54 +0200 | fxmarty | fix(server): llama v2 GPTQ (#648)
214c06f510 | 2023-07-20 13:53:08 +0200 | cdawg | Add trust_remote_code to quantize script (#647)
5a1512c025 | 2023-07-19 13:39:12 +0200 | Nicolas Patry | docs: Update README.md (#643)
1c81df15cd | 2023-07-19 13:38:52 +0200 | Nicolas Patry | docs: Update README.md (#639)
b66b190403 | 2023-07-19 11:59:58 +0200 | OlivierDehaene | feat(router): ngrok edge (#642)
fe80f5360c | 2023-07-19 09:31:25 +0200 | OlivierDehaene | feat(server): auto max_batch_total_tokens for flash att models (#630)
5e6ddfd6a4 | 2023-07-18 18:49:42 +0200 | OlivierDehaene | fix(server): fix llamav2 config (#635)
cf83f9b66f | 2023-07-18 18:11:20 +0200 | OlivierDehaene | v0.9.3 (#634)
211b211ec0 | 2023-07-18 18:09:53 +0200 | Nicolas Patry | feat(server): add support for llamav2 (#633)
3b71c38558 | 2023-07-18 16:21:18 +0200 | OlivierDehaene | feat(server): flash attention v2 (#624)
4d38a1c4ad | 2023-07-18 12:19:05 +0200 | Nicolas Patry | feat(server): Reworking the quantization script so it's still universal (not llama specific) (#587)
44acf72a73 | 2023-07-17 19:03:07 +0200 | OlivierDehaene | fea(launcher): debug logs (#623)
bc2873246c | 2023-07-17 18:38:16 +0200 | Nicolas Patry | fix(launcher): Rename `b-float16` to `bfloat16` in the launcher arg (#621)
a2cf1bdb2f | 2023-07-15 13:57:31 +0200 | OlivierDehaene | fix(server): empty_cache when stopped
c58a0c185b | 2023-07-14 16:31:48 +0200 | OlivierDehaene | v0.9.2 (#616)
5b9de4a1d3 | 2023-07-13 21:54:55 +0200 | OlivierDehaene | fix(server): blacklist local files (#609)
c8b077be79 | 2023-07-13 21:45:20 +0200 | Victor Muštar | docs: README: Add logo + baseline (#611)
982ce3227b | 2023-07-13 18:59:38 +0200 | OlivierDehaene | feat(router): explicit warning if revision is not set (#608)
b7327205a6 | 2023-07-13 14:22:37 +0200 | OlivierDehaene | feat(launcher): add arg validation and drop subprocess (#595)
3628559516 | 2023-07-13 01:57:46 +0800 | ssmi153 | GPTQ Env vars: catch correct type of error (#596)
f2f0289fb9 | 2023-07-12 17:05:50 +0200 | OlivierDehaene | feat(server): empty cache on errors
67347950b7 | 2023-07-12 16:43:31 +0200 | Nicolas Patry | feat(server): Implements sharding for non divisible `vocab_size`. (#583)
2c4bf88268 | 2023-07-12 20:17:35 +0800 | ssmi153 | fix(server): Bug fixes for GPTQ_BITS environment variable passthrough (#590)
7f9072228a | 2023-07-12 03:40:32 -0500 | Adam Kowalski | fix(server): Adding logger import to t5_modeling.py (#585)
db4efbf4bc | 2023-07-12 10:01:42 +0200 | Nicolas Patry | fix(server): T5 weights names. (#582)
f063ebde10 | 2023-07-12 10:01:01 +0200 | Nicolas Patry | chore: migrate ci region for more availability. (#581)
5bd2ab6583 | 2023-07-12 10:00:02 +0200 | Nicolas Patry | feat(server): Support for env value for GPTQ_BITS and GPTQ_GROUPSIZE. (#580)
f0181436f4 | 2023-07-12 09:51:34 +0200 | Nicolas Patry | fix(server): Fixing RW code (it's remote code so the Arch checking doesn't work to see which weights to keep). (#579)
b4024edd45 | 2023-07-10 14:47:15 +0200 | OlivierDehaene | feat: better errors for warmup and TP (#575)
e943a294bc | 2023-07-07 14:50:12 +0200 | Nicolas Patry | fix(server): harden the weights choice to save on disk. (#561)
31b36cca21 | 2023-07-06 16:05:42 +0200 | OlivierDehaene | v0.9.1 (#558)
c4bb5264ac | 2023-07-06 14:28:33 +0200 | OlivierDehaene | fix(server): decrease memory fragmentation (#557)