EveryDream2trainer/doc/LOWVRAM.md

# EveryDream 2 low VRAM users guide (<16GB)

A few key arguments will enable training with lower amounts of VRAM.

Amp mode stands for "automatic mixed precision" which allows Torch to choose to execute certain operations that are numerically "safe" to be run in FP16 precision in FP16 precision (addition, subtraction), but use FP32 if unsafe (POW, etc).  This saves some VRAM but also grants a significant performance boost.

    --amp

The next has a significant impact on VRAM.

    --gradient_checkpointing

This enables gradient checkpoint This will reduce the VRAM usage MANY gigabytes. There is a small performance loss, but you can also possible increase your batch size by using it.

The second is batch_size in general.

    --batch_size 1

Keeping the batch_size low reduces VRAM use.  This is a more "fine dial" on VRAM use. Adjusting it up or down by 1 will increase or decrease VRAM use by about 0.5-1GB.  For 12GB gpus you will need to keep batch_size 1 or 2.

The third is gradient accumulation, which does not reduce VRAM, but gives you a "virtual batch size multiplier" when you are not able to increase the batch_size directly.

    --grad_accum 2

This will combine the loss from multiple batches before applying updates.  There is some small VRAM overhead to this but not as much as increasing the batch size.  Increasing it beyond 2 does not continue to increase VRAM, only going from 1 to 2 seems to affect VRAM use, and by a small amount.
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00			`# EveryDream 2 low VRAM users guide (<16GB)`

more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`A few key arguments will enable training with lower amounts of VRAM.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`Amp mode stands for "automatic mixed precision" which allows Torch to choose to execute certain operations that are numerically "safe" to be run in FP16 precision in FP16 precision (addition, subtraction), but use FP32 if unsafe (POW, etc). This saves some VRAM but also grants a significant performance boost.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`--amp`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`The next has a significant impact on VRAM.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
fix lowvram doc 2023-01-06 16:51:20 -07:00			`--gradient_checkpointing`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`This enables gradient checkpoint This will reduce the VRAM usage MANY gigabytes. There is a small performance loss, but you can also possible increase your batch size by using it.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
			`The second is batch_size in general.`

			`--batch_size 1`

more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`Keeping the batch_size low reduces VRAM use. This is a more "fine dial" on VRAM use. Adjusting it up or down by 1 will increase or decrease VRAM use by about 0.5-1GB. For 12GB gpus you will need to keep batch_size 1 or 2.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`The third is gradient accumulation, which does not reduce VRAM, but gives you a "virtual batch size multiplier" when you are not able to increase the batch_size directly.`
ed1 emulation mode stuff 2023-01-06 14:36:51 -07:00
			`--grad_accum 2`

more docs, more docs, more docs, ok stop docs 2023-01-23 14:28:23 -07:00			`This will combine the loss from multiple batches before applying updates. There is some small VRAM overhead to this but not as much as increasing the batch size. Increasing it beyond 2 does not continue to increase VRAM, only going from 1 to 2 seems to affect VRAM use, and by a small amount.`