more docs, more docs, more docs, ok stop docs

This commit is contained in:
Victor Hall 2023-01-23 16:28:23 -05:00
parent 8c9e7f1db8
commit ac8bb6faab
2 changed files with 16 additions and 23 deletions


@@ -29,14 +29,16 @@ Make sure to check out the [tools repo](https://github.com/victorchall/EveryDrea
[Data Preparation](doc/DATA.md)
[Training](doc/TRAINING.md) - How to start training
[Basic Tweaking](doc/TWEAKING.md) - Important args to understand to get started
[Logging](doc/LOGGING.md)
[Advanced Tweaking](doc/ATWEAKING.md) - More stuff to tweak once you are comfortable
[Chaining training sessions](doc/CHAINING.md) - Modify training parameters by chaining training sessions together end to end
[Shuffling Tags](doc/SHUFFLING_TAGS.md)
[Data Balancing](doc/BALANCING.md) - Includes my small treatise on model preservation with ground truth data


@@ -1,34 +1,25 @@
# EveryDream 2 low VRAM users guide (<16GB)
A few key arguments will enable training with lower amounts of VRAM.
The first is the most impactful.
AMP stands for "automatic mixed precision," which lets Torch run operations that are numerically "safe" in FP16 (addition, subtraction) while keeping numerically unsafe ones (POW, etc.) in FP32. This saves some VRAM and also grants a significant performance boost.
--amp
The next has a significant impact on VRAM.
--gradient_checkpointing
This enables gradient checkpointing, which reduces VRAM use by many gigabytes. There is a small performance loss, but it may also let you increase your batch size.
The next is batch_size itself.
--batch_size 1
Keeping batch_size low reduces VRAM use, and it is a finer dial on VRAM than the options above. Adjusting it up or down by 1 will increase or decrease VRAM use by roughly 0.5-1GB. On 12GB GPUs you will need to keep batch_size at 1 or 2.
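As a back-of-the-envelope sketch of that per-image cost, you can estimate headroom before raising batch_size. The baseline figure below is an assumption for illustration, not a measurement:

```shell
# Rough heuristic from the guide: each +/-1 to batch_size changes VRAM
# use by roughly 0.5-1GB. All figures here are illustrative assumptions.
baseline_gb=11     # assumed VRAM at batch_size 1 with gradient checkpointing
per_image_gb=1     # upper end of the ~0.5-1GB per-step estimate
batch_size=2
echo "estimated VRAM: $(( baseline_gb + (batch_size - 1) * per_image_gb )) GB"
```

With these assumed numbers, batch_size 2 lands near 12GB, which is why 12GB cards are effectively capped at 1 or 2.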
The last is gradient accumulation, which does not reduce VRAM but gives you a "virtual batch size" multiplier when you cannot increase batch_size directly.
--grad_accum 2
This combines the loss from multiple batches before applying an update. There is some small VRAM overhead, but far less than increasing batch_size; only the step from 1 to 2 appears to affect VRAM use, and only by a small amount. Increasing grad_accum beyond 2 does not increase VRAM further.
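The "virtual batch size" idea can be sketched in a few lines; the specific values below are only an example:

```shell
# grad_accum multiplies the effective (virtual) batch size without the
# VRAM cost of a larger real batch. Values below are only an example.
batch_size=2    # what fits on the card
grad_accum=4    # batches accumulated before each optimizer update
echo "effective batch size: $(( batch_size * grad_accum ))"
```

So a 12GB card running `--batch_size 2 --grad_accum 4` behaves somewhat like batch size 8 for generalization purposes, while VRAM use stays close to that of batch size 2.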