| Name | Last commit message | Last commit date |
| --- | --- | --- |
| custom_modeling | Merge branch 'main' into moe | 2024-11-18 09:45:05 +08:00 |
| __init__.py | Merge branch 'main' into moe | 2024-11-18 09:45:05 +08:00 |
| bloom.py | Refactor dead code - Removing all `flash_xxx.py` files. (#2166) | 2024-07-05 10:29:56 +02:00 |
| causal_lm.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00 |
| flash_causal_lm.py | Fixing linting on main. (#2719) | 2024-11-04 15:21:41 +01:00 |
| galactica.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| globals.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00 |
| idefics_causal_lm.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00 |
| mamba.py | Choosing input/total tokens automatically based on available VRAM? (#2673) | 2024-10-28 04:59:49 +01:00 |
| metadata_kernels.py | Hotfixing auto length (warmup max_s was wrong). (#2716) | 2024-11-04 09:55:54 +01:00 |
| mllama_causal_lm.py | feat: add triton kernels to decrease latency of large batches (#2687) | 2024-10-25 21:10:00 +00:00 |
| model.py | Choosing input/total tokens automatically based on available VRAM? (#2673) | 2024-10-28 04:59:49 +01:00 |
| pali_gemma.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| seq2seq_lm.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00 |
| types.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00 |
| vlm_causal_lm.py | fix cuda graphs for qwen2-vl (#2708) | 2024-11-01 03:05:34 +01:00 |