doc update

This commit is contained in:
Victor Hall 2023-03-15 22:19:17 -04:00
parent d26520e67b
commit 92815a869d
2 changed files with 16 additions and 5 deletions


@@ -145,6 +145,8 @@ Based on [Nicholas Guttenberg's blog post](https://www.crosslabs.org//blog/diffu
Test results: https://huggingface.co/panopstor/ff7r-stable-diffusion/blob/main/zero_freq_test_biggs.webp
Very tentatively, I suggest values closer to 0.10 for short training runs, and lower values of around 0.02 to 0.03 for longer runs (50k+ steps). Early indications suggest values as high as 0.10 can cause divergence over time.
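As an illustration, a conservative value for a long run might be set in your training json like this. The key name `zero_frequency_noise_ratio` is an assumption here, so check the current docs for the exact parameter name:

```json
{
  "zero_frequency_noise_ratio": 0.02
}
```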
# Stuff you probably don't need to mess with, but well here it is:


@@ -6,11 +6,21 @@ VRAM use depends on the model being trained (SD1.5 vs SD2.1 base), batch size, r
## Stuff you want on for 12GB cards
AMP and AdamW8bit are now defaulted to on. These are VRAM efficient, produce high quality results, and should be on for all training.
Gradient checkpointing can still be turned on and off, and is not on by default. Turning it on will greatly reduce VRAM use at the expense of some performance. It is suggested to turn it on for any GPU with less than 16GB VRAM.
If you are using a customized `optimizer.json`, make sure `adamw8bit` is set as the optimizer. `AdamW` is significantly more VRAM intensive. `lion` is another option that is VRAM efficient, but is still fairly experimental in terms of understanding the best LR, betas, and weight decay settings. See [Optimizer docs](OPTIMIZER.md) for more information on advanced optimizer config if you want to try `lion` optimizer. *`adamw8bit` is the recommended and also the default.*
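As a minimal sketch, a customized `optimizer.json` using the default `adamw8bit` might look like the following. The field names other than `optimizer` are assumptions and the exact schema may differ between versions, so treat [Optimizer docs](OPTIMIZER.md) as the authoritative reference:

```json
{
  "optimizer": "adamw8bit",
  "lr": 1e-6,
  "betas": [0.9, 0.999],
  "epsilon": 1e-8,
  "weight_decay": 0.010
}
```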
SD2.1 with the larger text encoder model may not train on 12GB cards. SD1.5 should work fine.
Batch size of 1 or 2 may be all you can use on 12GB.
Resolution of 512 may be all you can use on 12GB. You could try 576 or 640 at batch size 1.
Because other processes on a given user's system also consume VRAM, precise guidance on exactly what will fit cannot be given, but 12GB certainly can and does work.
Close all other programs and processes that use GPU resources. Apps like Chrome and Discord can each consume many hundreds of megabytes of VRAM, which adds up quickly. You can also try disabling "hardware acceleration" in some apps, which shifts that work to the CPU and system RAM and saves VRAM.
## I really want to train higher resolution, what do I do?
@@ -18,6 +28,5 @@ Gradient checkpointing is pretty useful even on "high" VRAM GPUs like a 24GB 309
`--gradient_checkpointing` in CLI or in json `"gradient_checkpointing": true`
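For example, the json form would sit alongside your other training settings like this; the `batch_size` and `resolution` keys are shown only as assumed neighboring settings, not verified schema:

```json
{
  "gradient_checkpointing": true,
  "batch_size": 4,
  "resolution": 512
}
```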
It is not suggested on 24GB GPUs at 704 or lower resolution. I would keep it off and reduce batch size instead to fit your training into VRAM.
Gradient checkpointing is also critical for lower-VRAM GPUs like the 16GB T4 (Colab free tier), 3060 12GB, 2080 Ti 11GB, etc. You should most likely keep it on for any GPU with less than 24GB and adjust batch size up or down to fit your VRAM.