diff --git a/README.md b/README.md
index 71fade7..b35b5b8 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,8 @@ Make sure to check out the [tools repo](https://github.com/victorchall/EveryDrea
 
 [Validation](doc/VALIDATION.md) - Use a validation split on your data to see when you are overfitting and tune hyperparameters
 
+[Troubleshooting](doc/TROUBLESHOOTING.md)
+
 ## Cloud
 
 [Free tier Google Colab notebook](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/Train_Colab.ipynb)
diff --git a/doc/LOWVRAM.md b/doc/LOWVRAM.md
index b87becb..356228f 100644
--- a/doc/LOWVRAM.md
+++ b/doc/LOWVRAM.md
@@ -20,6 +20,6 @@ Keeping the batch_size low reduces VRAM use. This is a more "fine dial" on VRAM
 
 The third is gradient accumulation, which does not reduce VRAM, but gives you a "virtual batch size multiplier" when you are not able to increase the batch_size directly.
 
-    --grad_accum 2
+    --grad_accum 4
 
 This will combine the loss from multiple batches before applying updates. There is some small VRAM overhead to this but not as much as increasing the batch size. Increasing it beyond 2 does not continue to increase VRAM, only going from 1 to 2 seems to affect VRAM use, and by a small amount.
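
For context on the setting the second hunk changes, the following is a minimal, generic Python sketch of gradient accumulation. It is an illustration only, not EveryDream2trainer's actual training loop; the `train_steps` helper and its arguments are hypothetical. The idea is that losses from `grad_accum` consecutive batches are scaled and accumulated, and the optimizer applies one update per `grad_accum` batches, so `batch_size * grad_accum` acts as the "virtual" batch size the doc describes.

```python
# Illustrative sketch of gradient accumulation (hypothetical helper, not the
# trainer's real code). Gradients from grad_accum small batches are accumulated
# before a single optimizer step, approximating one larger-batch update.
def train_steps(model, optimizer, loss_fn, data_loader, grad_accum=4):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader):
        loss = loss_fn(model(inputs), targets)
        (loss / grad_accum).backward()   # scale so the sum matches one big batch
        if (step + 1) % grad_accum == 0:
            optimizer.step()             # apply updates every grad_accum batches
            optimizer.zero_grad()
```

Under this scheme, a batch_size of 2 with `--grad_accum 4` approximates the gradient of a batch of 8 without holding 8 samples in VRAM at once.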