From ac8bb6faab0ef5edeb3ed713001dbc2cf8f573a8 Mon Sep 17 00:00:00 2001
From: Victor Hall
Date: Mon, 23 Jan 2023 16:28:23 -0500
Subject: [PATCH] more docs, more docs, more docs, ok stop docs

---
 README.md      | 12 +++++++-----
 doc/LOWVRAM.md | 27 +++++++++------------------
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 35ee21f..aec1cb0 100644
--- a/README.md
+++ b/README.md
@@ -29,14 +29,16 @@ Make sure to check out the [tools repo](https://github.com/victorchall/EveryDrea
 
 [Data Preparation](doc/DATA.md)
 
-[Training](doc/TRAINING.md)
+[Training](doc/TRAINING.md) - How to start training
 
-[Basic Tweaking](doc/TWEAKING.md)
+[Basic Tweaking](doc/TWEAKING.md) - Important args to understand to get started
 
-[Logging](doc/LOGGING.md)
+[Logging](doc/LOGGING.md)
 
-[Advanced Tweaking](doc/ATWEAKING.md)
+[Advanced Tweaking](doc/ATWEAKING.md) - More stuff to tweak once you are comfortable
 
-[Chaining training sessions](doc/CHAINING.md)
+[Chaining training sessions](doc/CHAINING.md) - Modify training parameters by chaining training sessions together end to end
 
 [Shuffling Tags](doc/SHUFFLING_TAGS.md)
+
+[Data Balancing](doc/BALANCING.md) - Includes my small treatise on model preservation with ground truth data
\ No newline at end of file
diff --git a/doc/LOWVRAM.md b/doc/LOWVRAM.md
index a92e008..b87becb 100644
--- a/doc/LOWVRAM.md
+++ b/doc/LOWVRAM.md
@@ -1,34 +1,25 @@
 # EveryDream 2 low VRAM users guide (<16GB)
 
-Short version, for 12GB cards, use these arguments:
-
-    --lowvram
-
-This will override various arguments for you to enable training on 12GB cards.
-
 A few key arguments will enable training with lower amounts of VRAM.
 
-The first is the most impactful.
+AMP stands for "automatic mixed precision," which allows Torch to execute operations that are numerically "safe" in FP16 precision (addition, subtraction) while keeping FP32 for unsafe ones (POW, etc). This saves some VRAM and also grants a significant performance boost.
+
+    --amp
+
+The next has a significant impact on VRAM.
 
     --gradient_checkpointing
 
-This enables gradient checkpoint This will reduce the VRAM usage MANY gigabytes. By itself, and with batch_size 1 VRAM use can be as low as 11.9GB (out of an actual 12.2GB). This is very tight on a 12GB card such as a 3060 12GB so you will need to take care on what other applications are open.
+This enables gradient checkpointing, which will reduce VRAM usage by MANY gigabytes. There is a small performance loss, but you may also be able to increase your batch size by using it.
 
 The second is batch_size in general.
 
     --batch_size 1
 
-Keeping the batch_size low reduces VRAM use. This is a more "fine dial" on VRAM use. Adjusting it up or down by 1 will increase or decrease VRAM use by about 1GB. For 12GB gpus you will need to keep batch_size 1.
+Keeping the batch_size low reduces VRAM use. This is a finer "dial" on VRAM use. Adjusting it up or down by 1 will increase or decrease VRAM use by about 0.5-1GB. For 12GB GPUs you will need to keep batch_size at 1 or 2.
 
-The third is gradient accumulation.
+The third is gradient accumulation, which does not reduce VRAM, but gives you a "virtual batch size multiplier" when you are not able to increase the batch_size directly.
 
     --grad_accum 2
 
-This will combine the loss from multiple batches before applying updates. This is like a "virtual batch size multiplier" so if you are limited to just a batch size of 1 or 2 you can increase this to gain some benefits of generalization across multiple images, similar to increasing the batch size. There is some small VRAM overhead to this, but only when incrementing it from 1 to 2. If you can run grad_accum 2, you can run 4 or 6. Your goal here should be to get batch_size times grad_accum to around 8-10. If you want to try really high values of grad_accum you can, but so far it seems massive batch sizes are not as helpful as you might think.
-
-These are the forced parameters for --lowvram:
-
-    --gradient_checkpointing
-    --batch_size 1
-    --grad_accum 1
-    --resolution 512
+This will combine the loss from multiple batches before applying updates. There is some small VRAM overhead to this, but not as much as increasing the batch size. Increasing it beyond 2 does not continue to increase VRAM; only going from 1 to 2 seems to affect VRAM use, and only by a small amount.
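
For reference, a minimal sketch of roughly what --amp and --gradient_checkpointing correspond to at the PyTorch level. This is an illustration only, not the EveryDream trainer: the TinyNet model, optimizer, and loss below are toy placeholders, and the sketch assumes the flags wrap the stock torch.cuda.amp autocast/GradScaler pair and torch.utils.checkpoint.

    # Illustrative sketch only -- not the EveryDream training code.
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    class TinyNet(nn.Module):
        """Toy stand-in for the real model; each block is checkpointed to save VRAM."""
        def __init__(self):
            super().__init__()
            self.block1 = nn.Sequential(nn.Linear(64, 256), nn.ReLU())
            self.block2 = nn.Sequential(nn.Linear(256, 64), nn.ReLU())

        def forward(self, x):
            # --gradient_checkpointing: activations inside each block are recomputed
            # during the backward pass instead of being stored, trading compute for VRAM.
            x = checkpoint(self.block1, x, use_reentrant=False)
            x = checkpoint(self.block2, x, use_reentrant=False)
            return x

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TinyNet().to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # --amp: autocast runs numerically "safe" ops in FP16 and keeps sensitive ones in
    # FP32; GradScaler rescales the loss so small FP16 gradients do not underflow.
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

    x = torch.randn(4, 64, device=device)
    target = torch.randn(4, 64, device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()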
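
Likewise, the batch_size/grad_accum relationship can be sketched as a plain accumulation loop: gradients from several small batches are accumulated (and averaged) before a single optimizer step, so the effective batch size is batch_size times grad_accum without the activation memory of a larger batch. Again a toy sketch under the same assumptions, not the project's actual training loop.

    # Illustrative sketch of gradient accumulation (what --grad_accum provides).
    import torch
    import torch.nn as nn

    model = nn.Linear(64, 64)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    batch_size = 2   # what actually fits in VRAM
    grad_accum = 4   # "virtual batch size multiplier" -> effective batch size of 8

    # Toy data: 8 micro-batches of (input, target) pairs.
    batches = [(torch.randn(batch_size, 64), torch.randn(batch_size, 64)) for _ in range(8)]

    optimizer.zero_grad(set_to_none=True)
    for step, (x, target) in enumerate(batches, start=1):
        loss = nn.functional.mse_loss(model(x), target)
        (loss / grad_accum).backward()   # divide so the accumulated gradient is an average
        if step % grad_accum == 0:
            optimizer.step()             # one weight update per grad_accum micro-batches
            optimizer.zero_grad(set_to_none=True)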