EveryDream2trainer/doc/ATWEAKING.md

# Advanced Tweaking

## Resolution

You can train resolutions from 512 to 1024 in 64 pixel increments.  General results from the community indicate you can push the base model a bit beyond what it was designed for *with enough training*.  This will work out better when you have a lot of training data (hundreds+) and enable slightly higher resolution at inference time without seeing repeats in your generated images.  This does cost speed of training and higher VRAM use!  Ex. 768 takes a significant amount more VRAM than 512, so you will need to compensate for that by reducing ```batch_size```.

    --resolution 640 ^

For instance, if training from the base 1.5 model, you can try trying at 576, 640, or 704.

If you are training on a base model that is 768, such as SD 2.1 768-v, you should also probably use 768 as a base number and adjust from there.

## Log and ckpt save folders

If you want to use a nondefault location for saving logs or ckpt files, these:

Logdir defaults to the "logs" folder in the trainer directory.  If you wan to save all logs (including diffuser copies of ckpts, sample images, and tensbooard events) use this:

    --logdir "/workspace/mylogs"

Remember to use the same folder when you launch tensorboard (```tensorboard --logdir "/worksapce/mylogs"```) or it won't find your logs.

By default the CKPT format copies of ckpts that are peroidically saved are saved in the trainer root folder.  If you want to save them elsewhere, use this:

    --ckpt_dir "r:\webui\models\stable-diffusion"

## Conditional dropout

Conditional dropout means the prompt or caption on the training image is dropped, and the caption is "blank".  The theory is this can help with unconditional guidance, per the original paper and authors of Latent Diffusion and Stable Diffusion.

The value is defaulted at 0.04, which means 4% conditional dropout.  You can set it to 0.0 to disable it, or increase it.  Many users of EveryDream 1.0 have had great success tweaking this, especially for larger models.  You may wish to try 0.10.  This may also be useful to really "force" a style.  Setting it very high may lead to bleeding or overfitting.

    --conditional_dropout 0.1 ^

## LR tweaking

By default, the learning rate is constant for the entire training session.  However, if you want it to change by itself during training, you can use cosine.

### Cosine LR scheduler
Cosine LR scheduler will "taper off" your learning rate over time. It will reach a peak value of your ```--lr``` value then taper off following a cosine curve.

Example:

    --lr_scheduler cosine ^

There is also warmup, which will default to 2% of the decay steps.  You can manually set warmup, but it is typically more useful from training a brand new model from scratch, not for continuation training which we're all doing.  But, if you want to tweak manually anyway, use this:

    --lr_warmup_steps 100 ^

Cosine also has a decay period to define how long it takes to get to zero LR as it tapers.  By default, the trainer sets this to slightly longer than it will take to get to your ```--max_epochs``` number of steps so LR doesn't go all the way to zero and waste compute time.   However, if you want to tweak, you have to set the number of steps yourself and estimate what that will be.  If you set this, be sure to watch your LR log in tensorboard to make sure it does what you expect.

    --lr_decay_steps 2500 ^

## Gradient accumulation

Gradient accumulation is sort of like a virtual batch size increase, averaging the learning over more than one step (batch) before applying it to the model as an update to weights.

Example:

    --grad_accum 2 ^

The above example with combine the loss from 2 batches before applying updates.  This *may* be a good idea for higher resolution training that requires smaller batch size but mega batch sizes are also not the be-all-end all.

Some experimentation shows if you already have batch size in the 6-8 range than grad accumulation of more than 2 just reduces quality, but you can experiment.
docs 2022-12-17 20:59:03 -07:00			`# Advanced Tweaking`

more docs, cleanup install process 2022-12-17 23:21:35 -07:00			`## Resolution`

			You can train resolutions from 512 to 1024 in 64 pixel increments. General results from the community indicate you can push the base model a bit beyond what it was designed for with enough training. This will work out better when you have a lot of training data (hundreds+) and enable slightly higher resolution at inference time without seeing repeats in your generated images. This does cost speed of training and higher VRAM use! Ex. 768 takes a significant amount more VRAM than 512, so you will need to compensate for that by reducing ```batch_size```.

			`--resolution 640 ^`

gradient accumulation, docs 2022-12-19 00:07:14 -07:00			`For instance, if training from the base 1.5 model, you can try trying at 576, 640, or 704.`
more docs, cleanup install process 2022-12-17 23:21:35 -07:00
			`If you are training on a base model that is 768, such as SD 2.1 768-v, you should also probably use 768 as a base number and adjust from there.`

save ever n epochs, update docs 2022-12-18 19:16:14 -07:00			`## Log and ckpt save folders`

			`If you want to use a nondefault location for saving logs or ckpt files, these:`

			`Logdir defaults to the "logs" folder in the trainer directory. If you wan to save all logs (including diffuser copies of ckpts, sample images, and tensbooard events) use this:`

			`--logdir "/workspace/mylogs"`

			Remember to use the same folder when you launch tensorboard (```tensorboard --logdir "/worksapce/mylogs"```) or it won't find your logs.

			`By default the CKPT format copies of ckpts that are peroidically saved are saved in the trainer root folder. If you want to save them elsewhere, use this:`

			`--ckpt_dir "r:\webui\models\stable-diffusion"`

more docs, cleanup install process 2022-12-17 23:21:35 -07:00			`## Conditional dropout`

			`Conditional dropout means the prompt or caption on the training image is dropped, and the caption is "blank". The theory is this can help with unconditional guidance, per the original paper and authors of Latent Diffusion and Stable Diffusion.`

			`The value is defaulted at 0.04, which means 4% conditional dropout. You can set it to 0.0 to disable it, or increase it. Many users of EveryDream 1.0 have had great success tweaking this, especially for larger models. You may wish to try 0.10. This may also be useful to really "force" a style. Setting it very high may lead to bleeding or overfitting.`

			`--conditional_dropout 0.1 ^`

			`## LR tweaking`

			`By default, the learning rate is constant for the entire training session. However, if you want it to change by itself during training, you can use cosine.`

			`### Cosine LR scheduler`
			Cosine LR scheduler will "taper off" your learning rate over time. It will reach a peak value of your ```--lr``` value then taper off following a cosine curve.

			`Example:`

			`--lr_scheduler cosine ^`

			`There is also warmup, which will default to 2% of the decay steps. You can manually set warmup, but it is typically more useful from training a brand new model from scratch, not for continuation training which we're all doing. But, if you want to tweak manually anyway, use this:`

			`--lr_warmup_steps 100 ^`

gradient accumulation, docs 2022-12-19 00:07:14 -07:00			Cosine also has a decay period to define how long it takes to get to zero LR as it tapers. By default, the trainer sets this to slightly longer than it will take to get to your ```--max_epochs``` number of steps so LR doesn't go all the way to zero and waste compute time. However, if you want to tweak, you have to set the number of steps yourself and estimate what that will be. If you set this, be sure to watch your LR log in tensorboard to make sure it does what you expect.
more docs, cleanup install process 2022-12-17 23:21:35 -07:00
gradient accumulation, docs 2022-12-19 00:07:14 -07:00			`--lr_decay_steps 2500 ^`

			`## Gradient accumulation`

fix sample filenames due to illegal characters 2022-12-19 12:43:10 -07:00			`Gradient accumulation is sort of like a virtual batch size increase, averaging the learning over more than one step (batch) before applying it to the model as an update to weights.`
gradient accumulation, docs 2022-12-19 00:07:14 -07:00
			`Example:`

			`--grad_accum 2 ^`

			`The above example with combine the loss from 2 batches before applying updates. This may be a good idea for higher resolution training that requires smaller batch size but mega batch sizes are also not the be-all-end all.`

			`Some experimentation shows if you already have batch size in the 6-8 range than grad accumulation of more than 2 just reduces quality, but you can experiment.`