Update TWEAKING.md

commit 161edc5f3d (parent 83fbd90889)
@@ -74,7 +74,7 @@ If you are training a huge dataset (20k+) then saving every 1 epoch may not be v

 *A "last" checkpoint is always saved at the end of training.*

-Diffusers copies of checkpoints are saved in your /logs/[project_name]/ckpts folder, and can be used to continue training if you want to pick up where you left off. CKPT files are saved in the root training folder by default. These folders can be changed. See [Advanced Tweaking](ATWEAKING.md) for more info.
+Diffusers copies of checkpoints are saved in your /logs/[project_name]/ckpts folder, and can be used to continue training if you want to pick up where you left off. CKPT files are saved in the root training folder by default. These folders can be changed. See [Advanced Tweaking](ADVANCED_TWEAKING.md) for more info.

 ### _Delay saving checkpoints_

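For context on the paragraph changed above: picking up from one of those diffusers checkpoint folders is done via a command-line flag. A minimal sketch only, assuming the flag names `--resume_ckpt` and `--project_name` used elsewhere in the EveryDream2 docs (the folder name is hypothetical; confirm flags with `train.py --help`):

    --resume_ckpt "logs/my_project/ckpts/last-my_project" ^
    --project_name "my_project" ^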
@@ -94,7 +94,7 @@ If you want to resume training from a previous run, you can do so by pointing to

 ## __Learning Rate__

-The learning rate affects how much "training" is done on the model per training step. It is a very careful balance to select a value that will learn your data. See [Advanced Tweaking](ATWEAKING.md) for more info. Once you have started, the learning rate is a good first knob to turn as you move into more advanced tweaking.
+The learning rate affects how much "training" is done on the model per training step. It is a very careful balance to select a value that will learn your data. See [Advanced Tweaking](ADVANCED_TWEAKING.md) for more info. Once you have started, the learning rate is a good first knob to turn as you move into more advanced tweaking.

 ## __Batch Size__

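As a concrete illustration of the learning-rate paragraph above, the value is set with a single flag. A sketch only, assuming the flag is `--lr` as in the project's example commands; the number is a placeholder, not a recommendation:

    --lr 1.5e-6 ^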
@@ -102,7 +102,7 @@ Batch size is also another "hyperparamter" of itself and there are tradeoffs. It

 --batch_size 4 ^

-While very small batch sizes can impact performance negatively, at some point larger sizes have little impact on overall speed as well, so shooting for the moon is not always advisable. Changing batch size may also impact what learning rate you use, with typically larger batch_size requiring a slightly higher learning rate. More info is provided in the [Advanced Tweaking](ATWEAKING.md) document.
+While very small batch sizes can impact performance negatively, at some point larger sizes have little impact on overall speed as well, so shooting for the moon is not always advisable. Changing batch size may also impact what learning rate you use, with typically larger batch_size requiring a slightly higher learning rate. More info is provided in the [Advanced Tweaking](ADVANCED_TWEAKING.md) document.

 ## __LR Scheduler__

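To illustrate the interaction described in the hunk above, a larger batch size is typically paired with a slightly higher learning rate. A rough sketch with purely illustrative values (assuming the `--batch_size` and `--lr` flags shown elsewhere in these docs):

    --batch_size 8 ^
    --lr 2.0e-6 ^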
@@ -110,7 +110,7 @@ A learning rate scheduler can change your learning rate as training progresses.

 At this time, ED2.0 supports constant or cosine scheduler.

-The constant scheduler is the default and keeps your LR set to the value you set in the command line. That's really it for constant! I recommend sticking with it until you are comfortable with general training. More info in the [Advanced Tweaking](ATWEAKING.md) document.
+The constant scheduler is the default and keeps your LR set to the value you set in the command line. That's really it for constant! I recommend sticking with it until you are comfortable with general training. More info in the [Advanced Tweaking](ADVANCED_TWEAKING.md) document.

 ## __Sampling__

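For reference on the scheduler paragraph above, switching from the default constant scheduler to cosine would look roughly like the line below. This assumes the flag is `--lr_scheduler`; verify against `train.py --help`:

    --lr_scheduler cosine ^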
@@ -138,4 +138,4 @@ While gradient checkpointing reduces performance, the ability to run a higher ba

 You may NOT want to use a batch size as large as 13-14+ on your 24GB+ GPU even if possible, or you may find you need to tweak learning rate all over again to find the right balance. Generally I would not turn it on for a 24GB GPU training at <640 resolution.

-This probably IS a good idea for training at higher resolutions and allows >768 training on 24GB GPUs. Balancing this toggle, resolution, and batch_size will take a few quick experiments to see what you can run safely.
+This probably IS a good idea for training at higher resolutions and allows >768 training on 24GB GPUs. Balancing this toggle, resolution, and batch_size will take a few quick experiments to see what you can run safely.
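For completeness on the gradient-checkpointing hunk above, enabling it together with a higher resolution might look like the lines below. This is a sketch only; it assumes the flags `--gradient_checkpointing` and `--resolution` exist as named (check `train.py --help`), and the values are illustrative:

    --gradient_checkpointing ^
    --resolution 768 ^
    --batch_size 4 ^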