some final cleanup and readme updates

2022-11-06 19:59:37 -05:00 · 2022-11-06 19:59:37 -05:00 · 853a7500f4
parent aacbde8bc7
commit 853a7500f4
12 changed files with 175 additions and 284 deletions
--- a/README.md
+++ b/README.md
@ -1,31 +1,50 @@
 # Every Dream trainer for Stable Diffusion

-This is a bit of a divergence from other fine tuning methods out there for Stable Diffusion.  No more "DreamBooth" stuff like tokens, classes, or regularization, though I thank the DreamBooth training community for sharing information and techniques.  Yet, it is time to move on to explore more capability in fine tuning.
+This is a bit of a divergence from other fine tuning methods out there for Stable Diffusion.  This is a general purpose fine-tuning codebase meant to bridge the gap between small scale (e.g. Textual Inversion, DreamBooth) and large scale (i.e. full fine tuning on large clusters of GPUs).  It is designed to run on a local 24GB Nvidia GPU, currently the 3090, 3090 Ti, 4090, or various Quadro and datacenter cards (A5500, A100, etc).

 Please join us on Discord! https://discord.gg/uheqxU6sXN

-If you find this tool useful, please consider donating to the project on [Patreon](https://www.patreon.com/everydream).  It is a lot of work to maintain and develop.  Thank you!
+If you find this tool useful, please consider subscribing to the project on [Patreon](https://www.patreon.com/everydream) or buying me a [Ko-fi](https://ko-fi.com/everydream). The tools are open source and free, but it is a lot of work to maintain and develop, and donations will allow me to expand capabilities and spend more time on the project.
+
+## Main features
+
+* **Supervised Learning** - Captions are read from the filename of each image, as opposed to the single token/class of DreamBooth implementations.  This also means you can train multiple subjects, multiple artstyles, or whatever multiple-anything-you-want in one training session into one model, including the context around your characters, like their clothing, background, cityscapes, or the common artstyle shared across them.
+* **Multiple Aspect Ratios** - Supports everything from 1:1 (square) to 4:1 (super wide) or 1:4 (super tall) all at the same time with no fuss.
+* **Auto-Scaling** - Automatically scales the image to the aspect ratios of the model.  No need to crop or resize images.  Just throw them in and let the code do the work.
+* **6 image batches** - Supports 6 images per batch on a 24GB GPU.  Support for lower VRAM GPUs pending...
+* **Full unfrozen model** - The model is fully unfrozen for better training.
+* **Recursive load** - Loads all images in a directory and subdirectories so you can organize your data set however you like. 

 ## Onward to Every Dream
-This trainer is focused on enabling fine tuning with new training data plus weaving in original, ground truth images scraped from the web via Laion dataset or other publically available ML image sets.  Compared to DreamBooth, concepts such as regularization have been removed in favor of adding back ground truth data (ex. Laion), and token/class concepts are removed and replaced by per-image captioning for training, more or less equal to how Stable Diffusion was trained itself. This is a shift back to the original training code and methodology for fine tuning for general cases.
+This trainer is focused on enabling fine tuning with new training data plus weaving in original, ground truth images scraped from the web via Laion dataset or other publicly available ML image sets.  Compared to DreamBooth, concepts such as regularization have been removed in favor of support for adding back ground truth data (ex. Laion), and token/class concepts are removed and replaced by per-image captioning for training, more or less equal to how Stable Diffusion was trained itself. This is a shift back to the original training code and methodology for fine tuning for general cases.

 To get the most out of this trainer, you will need to curate a data set to be trained in addition to collecting ground truth images to help preserve the model integrity and character.  Luckily, there are additional tools below to help enable that, and they will grow over time.

+Check out the tools repo here: [Every Dream Tools](https://www.github.com/victorchall/everydream) for automated captioning and Laion web scraper tools.
+
+## Installation
+
+You will need Anaconda or Miniconda.
+
+1. Clone the repo:  `git clone https://www.github.com/victorchall/everydream-trainer.git`
+2. Create a new conda environment with the provided environment.yml file: `conda env create -f environment.yml`
+3. Activate the environment: `conda activate everydream`
+
+Please note other repos are using older versions of some packages like torch, torchvision, and transformers that are known to be less VRAM efficient and cause problems.  Please make a new conda environment for this repo and use the provided environment.yml file.
+
 ## Techniques

 This is a general purpose fine tuning app.  You can train large or small scale with it and everything in between.

-Check out [MICROMODELS.MD](./doc/MICROMODELS.MD) for a quickstart guide and example for quick model creation with a small data set.  It is suited for training one or two subects with 20-50 images with no preservation in 10-25 minutes.
+Check out [MICROMODELS.MD](./doc/MICROMODELS.MD) for a quickstart guide and example for quick model creation with a small data set.  It is suited for training one or two subjects with 20-50 images each with no preservation in 10-30 minutes depending on your content.

 Or [README-FF7R.MD](./doc/README-FF7R.MD) for large scale training of many characters with model preservation.

-**The trainer now is insensitive to size and aspect ratio of training images with the Multi-Aspect feature!**  Collect your images, use the [Tools](#ground-truth-data-sources-and-data-engineering) to automatically caption your images and go! 
-
-More info coming soon on even larger training.
+You can scale up or down from there.  The code is designed to be flexible by adjusting the yaml (#)

 ## Image Captioning

-This trainer is built to use the filenames of your images as "captions" on a per-image basis, *so the entire Latent Diffusion model can be trained effectively.*  **Image captioning is a big step forward.** 
+This trainer is built to use the filenames of your images as "captions" on a per-image basis, *so the entire Latent Diffusion model can be trained effectively.*  **Image captioning is a big step forward.** I strongly suggest you use the tools repo to caption your images, or write meaningful filenames for them yourself, as good captions help the model learn more effectively.

 ### Formatting

@ -36,11 +55,11 @@ The filenames are using for captioning, with a split on underscore so you can ha
    john jacob jingleheimerschmidt sitting on a bench in a park with trees in the background_(1).png
    john jacob jingleheimerschmidt sitting on a bench in a park with trees in the background_(2).png

-In the 3rd and 4th example above, the _(1) and _(2) are ignored and not considered by the trainer.  This is useful if you end up with duplicate filenames but different image contents for whatever reason. 
+In the 3rd and 4th example above, the _(1) and _(2) are ignored and not considered by the trainer.  This is useful if you end up with duplicate filenames but different image contents for whatever reason, but that is generally a rare case.  
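
The filename-to-caption rule described above can be sketched roughly as follows. This is a minimal illustration, not the trainer's actual loader code: the function name `caption_from_filename` is hypothetical, and it assumes the caption is simply the text before the first underscore with the file extension removed.

```python
import os

def caption_from_filename(pathname: str) -> str:
    """Hypothetical helper: derive a caption from an image path.

    Assumption: the caption is the filename without its directory and
    extension, cut at the first underscore.
    """
    filename = os.path.basename(pathname)
    stem, _ext = os.path.splitext(filename)  # removes only the final extension
    return stem.split("_")[0]

examples = [
    "john jacob jingleheimerschmidt sitting on a bench in a park with trees in the background_(1).png",
    "john jacob jingleheimerschmidt sitting on a bench in a park with trees in the background_(2).png",
]
captions = [caption_from_filename(p) for p in examples]
```

Both example files yield the same caption, because the `_(1)` and `_(2)` suffixes fall after the first underscore and are discarded.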

 ### Data set organization

-You can place all your images in some sort of root training folder and the traniner will recurvisely local and find them all.
+You can place all your images in some sort of "root" training folder and the trainer will recursively locate them all from any number of subfolders and add them to the queue for training.

 You may wish to organize with subfolders so you can adjust your training data mix, something like this:

@ -51,62 +70,59 @@ You may wish to organize with subfolders so you can adjust your training data mi
    /training_samples/MyProject/paintings_laion
    /training_samples/MyProject/drawings_laion

-In the above example, "/training_samples/MyProject" will be your root folder for the command line.  
+In the above example, "training_samples/MyProject" will be the "--data_root" folder for the command line.  

-As you build your data set, you may find it is easiest to organize in this way to track your balance between new training data and ground truth used to preserve the model integrity.  For instance, if you have 500 new training images in ../man you may with to use 500  in the /man_laion and another 500 in /man_nvflickr.  You can then experiment by removing different folders to see the effects on training quality and model preservation.  Adding more original ground truth data add  training time, but keep your model from "veering off course" and losing its character. 
-
-### Suggestions
-
-The more data you add from ground truth data sets such as Laion, the more training you will get away with without "damaging" the original model.  The wider variety of data in the ground truth portion of your dataset, the less likely your training images are to "bleed" into the rest of your model, losing qualities like the ability to generate images of other styles you are not training.  This is about knowledge retention in the model by refeeding it the same data it was originally trained on.
+As you build your data set, you may find it is easiest to organize in this way to track your balance between new training data and ground truth used to preserve the model integrity.  For instance, if you have 500 new training images in "training_samples/MyProject/man" you may wish to use 300 in "man_laion" and another 200 in "man_nvflickr".  You can then experiment by removing different folders to see the effects on training quality and model preservation.

+You can also organize subfolders for each character if you wish to train many characters so you can add and remove them, and easily track that you are balancing the number of images for each.
 ## Ground truth data sources and data engineering

-Visit [EveryDream Data Engineering Tools](https://github.com/victorchall/EveryDream) to find a **web scraper** that can pull down images from the Laion dataset along with an **Auto Caption** script to prepare your data.  You should consider that your first step before using this trainer.  If you already have data, you can use that, too, but I encourage you to caption your data with that tool for improved training results. 
+Visit [EveryDream Data Engineering Tools](https://github.com/victorchall/EveryDream) to find a **web scraper** that can pull down images from the Laion dataset along with an **Auto Caption** script to prepare your data.  You should consider that your first step before using this trainer if you wish to train a significant number of characters and if you wish to keep them or the general shared style of your subjects or art styles from bleeding into the rest of the model. 
+
+The more data you add from ground truth data sets such as Laion, the more training you will get away with without "damaging" the original model.  The wider variety of data in the ground truth portion of your dataset, the less likely your training images are to "bleed" into the rest of your model, losing qualities like the ability to generate images of other styles you are not training.  This is about knowledge retention in the model by refeeding it the same data it was originally trained on.  This is a big part of the reason why the original training code on Stable Diffusion was so effective.  It was able to train on a wide variety of data and managed to understand possibly millions of concepts and mix them.
+
+If you don't care to preserve the model you can skip this and train only on your new data.  For a single subject, aka "fast" or "micro" mode, you can usually get away with putting one character or artstyle in without ruining the model you create. 

 ## Starting training

 An example command to start training:

-    python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t  --actual_resume sd_v1-5_vae.ckpt -n MyProjectName --gpus 0, --data_root training_samples\MyProject
+    python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t --actual_resume sd_v1-5_vae.ckpt -n MyProjectName --data_root training_samples\MyProject

-In the above, the source training data is expected to be laid out in subfolders of training_samples\MyProject as described in above sections. It will use the first Nvidia GPU in the system, and resume from the checkpoint named "sd_v1-5_vae.ckpt".  "-n MyProjectName" is merely a name for the folder where logs will be written during training, which appear under /logs. 
+In the above, the source training data is expected to be laid out in subfolders of training_samples\MyProject as described in above sections. It will resume from the checkpoint named "sd_v1-5_vae.ckpt" but you can change this to most Stable Diffusion checkpoints (ex. 1.4, 1.5, 1.5 + new vae, WD, or others that people have shared online). Inpainting model is not yet supported.  "-n MyProjectName" is merely a name for the folder where logs will be written during training, which appear under /logs. 
+
+## Managing training runs
+
+Each project is different, but consider carefully reading below to adjust your YAML file that configures your training run.  You can make your own copies of the YAML files for different projects then use --base to change which one you use.  I will tend to update the YAMLs in future releases so making your own copy also avoids a collision when you "git pull" a new version.
 ## Testing

-I strongly recommend attempting to undertrain via the repeats and set max_epoch higher compared to typical dream booth recommendations so you will get a few different ckpts along the course of your training session.  The ckpt files will be dumped to a folder such as _\logs\MyPrject2022-10-25T20-37-40_MyProject_ date stamped to the start of training. There are also test images in the _\logs\images\train_ folder that spit out periodically based on another finetune yaml setting:
+I strongly recommend undertraining via the repeats and instead setting max_epochs higher *compared to typical DreamBooth recommendations* so you will get a few different ckpts along the course of your training session.  The ckpt files will be dumped to a folder such as "_\logs\MyPrject2022-10-25T20-37-40_MyProject_" date stamped to the start of training. There are also test images in the _\logs\images\train_ folder that are written out periodically based on another finetune yaml setting.

-      callbacks:
-        image_logger:
-        target: main.ImageLogger
-        params:
-            batch_frequency: 300
-
-The images will often not all be fully formed, and are randomly selected based on the last few training images, but it's a good idea to watch those images and learn to understand how they look compared to when you go try your new model out in a normal inference app. 
-
-To continue training on a checkpoint, grab the the desried ckpt file \logs\MyPrject2022-10-25T20-37-40_MyProject\checkpoints and move it back to your base folder and just change the --actual_resume pointer to last.ckpt such as the following:
-
-    python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t  --actual_resume last.ckpt -n MyProjectName --gpus 0, --data_root training_samples\MyProject
+The images will often not all be fully formed, and are randomly selected based on the last few training images, but it's a good idea to watch those images and learn to understand how they look compared to when you go try your new model out in a normal Stable Diffusion inference repo. 

 If you are close, consider lowering repeats!
 ## Finetune yaml adjustments

 Depending on your project, a few settings may be useful to tweak or adjust.  In [Starting training](#starting-training) I'm using __v1-finetune_everydream.yaml__, but you can make your own copies with different adjustments and save them for your projects.  It is a good idea to get familiar with this file, as tweaking it can be useful as you train.

-I'll highlight the following settings at the end of the file:
+I'll highlight the following settings at the end of the file: 

    trainer:
      benchmark: True
      max_epochs: 4
      max_steps: 99000

-max_epochs will halt training.  I suggest ending on a clean end of an epoch rather than using a steps limit, so defaults are configured as such.  3-5 epochs will give you a few copies to try. 
+"max_epochs" will halt training.  I suggest ending on a clean end of an epoch rather than using a steps limit, so defaults are configured as such.  3-5 epochs will give you a few copies to try.  If you are unsure how many epochs to run, setting a higher value and lower repeats below will give you more ckpt files to test after training concludes.  You can always [continue training](#resuming-training) if needed.

      train:
        target: ldm.data.every_dream.EveryDreamBatch
        params:
-            set: train
            repeats: 20
+            debug_level: 1

-Above, the repeats defines the number of times each training image is trained on per epoch.  For large scale training with 500+ images per subject you may find just 10-15 repeats with 3-4 epochs.  As you add more and more data you can slowly use lower repeat values.  For very small training sets, try the micro YAML that has higher repeats (50-100).
+Above, the "repeats" defines the number of times each training image is trained on per epoch.  For large scale training with 500+ images per subject you may find just 10-15 repeats with 3-4 epochs is enough.  As you add more and more data you can slowly use lower repeat values.  For very small training sets, try the micro YAML that has higher repeats (40-60) with a few epochs.
+
+debug_level: 1 will print to the console when images are dropped because they cannot be fit into a full batch for their aspect ratio.

 You are also free to move data in and out of your training_samples/MyProject folder between training sessions.  If you have multiple subjects and your number of images between them is a bit mismatched in number, say, 100 for one and only 60 for another, you can try running one epoch 25 repeats, then remove the character with 100 images and train just the one with the 60 images for another epoch at 5 repeats.  It's best to try to keep the data evenly spread, but sometimes that is difficult.  You may also find certain characters are harder to train, and need more on their own.  Again, test!  Go generate images between 

@ -117,11 +133,56 @@ You are also free to move data in and out of your training_samples/MyProject fol

 Batch size determines how many images are loaded and trained on in parallel. batch_size 6 will work on a 24GB GPU, while batch_size 1 only reduces VRAM use to about 19.5GB.  The batch size will divide the number of steps used as well, but one epoch is still "repeats" number of trainings on each image.  Higher batch sizes are desired to give better generalization as the gradient is calculated across the entire batch.  More images in a batch will also decrease training time by keeping your GPU utilization higher.

-I recommend not worrying about step count so much. Focus on epochs and repeats instead. 
+I recommend not worrying about step count so much. Focus on epochs and repeats instead.  Steps are a result of the number of training images you have.
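
As a rough illustration of why steps follow from the data rather than the other way around, the arithmetic can be sketched like this. The function names are hypothetical, and the floor division is an assumption: the real count may differ slightly, for example when multi-aspect bucketing drops a few images.

```python
def steps_per_epoch(num_images: int, repeats: int, batch_size: int) -> int:
    """Approximate optimizer steps in one epoch (assumes partial batches are dropped)."""
    return (num_images * repeats) // batch_size

def total_steps(num_images: int, repeats: int, batch_size: int, max_epochs: int) -> int:
    """Approximate steps for a whole run that ends cleanly on an epoch boundary."""
    return steps_per_epoch(num_images, repeats, batch_size) * max_epochs

# Example: 500 images, 20 repeats, batch_size 6, 4 epochs
per_epoch = steps_per_epoch(500, 20, 6)
whole_run = total_steps(500, 20, 6, 4)
```

With these example numbers the run stays far below the max_steps: 99000 ceiling, so max_epochs is what ends training.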
+
+    callbacks:
+      image_logger:
+        target: main.ImageLogger
+        params:
+          batch_frequency: 250
+
+Image logger batch frequency determines how often a test image is placed into the logs folder.  150-300 is recommended.  Lower values produce more images but slow training down a bit. 
+
+    modelcheckpoint:
+      params:
+        every_n_epochs: 1  # produce a ckpt every epoch, leave 1!
+        save_top_k: 4   # save the best N ckpts according to loss, can reduce to save disk space but suggest at LEAST 2, more if you have max_epochs below higher!
+
+
+"every_n_epochs" will make the trainer create a ckpt file at the end of every epoch.  I do not recommend changing this.  If you want checkpoints less frequently, increase your repeats instead.  "save_top_k" will save the "best" N ckpts based on a loss value the trainer is tracking.  If you are training 10 epochs and use save_top_k 4, it will only save the "best" 4, saving some disk space.  *It's possible the last few epochs may not save because they are getting worse over time according to the loss value the trainer calculates as it goes.*  If you want all the ckpts to always be saved you can set save_top_k to 99 or any value over max_epochs.
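
To make the save_top_k behaviour concrete, here is a small simulation in plain Python. It does not call the trainer or any checkpoint library; the function name `kept_checkpoints` and the loss values are made up for illustration, and it assumes a lower monitored loss counts as "better".

```python
def kept_checkpoints(epoch_losses, save_top_k):
    """Return the epoch indices whose ckpts would survive, assuming lower loss is better."""
    ranked = sorted(range(len(epoch_losses)), key=lambda epoch: epoch_losses[epoch])
    return sorted(ranked[:save_top_k])

# Hypothetical validation loss per epoch for a 10-epoch run that starts to degrade late
losses = [0.30, 0.22, 0.18, 0.17, 0.19, 0.21, 0.24, 0.26, 0.29, 0.33]

kept_top_4 = kept_checkpoints(losses, 4)    # only the four best epochs
kept_all = kept_checkpoints(losses, 99)     # a value over max_epochs keeps everything
```

In this made-up run the surviving ckpts come from the middle epochs and none of the last four epochs is kept, which is the situation the note above warns about.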
+
+    validation:
+      target: ldm.data.ed_validate.EDValidateBatch
+      params:
+        repeats: 0.4
+
+Repeats for validation adjusts how much of the training set is used for validation.  I've added support to reduce this to a decimal value.  For large training where you only use 5-15 repeats, setting this lower speeds up training but still allows the trainer to run validation, so a broken run is caught early instead of wasting further compute time.  You can generally leave this untouched.
+
+## Resuming training
+
+If you find even your best or last ckpt from a training run seems "undertrained" you can cut and paste a trained ckpt from your logs into the root folder and resume by running the trainer again, changing --actual_resume to point to your file.
+
+    python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t  --actual_resume epoch=03-step=01437.ckpt -n MyProjectName --data_root training_samples\MyProject
+
+or
+
+    python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t  --actual_resume last.ckpt -n MyProjectName --data_root training_samples\MyProject
+
+Note above the "epoch=03-step=01437.ckpt" or "last.ckpt" instead of "sd_v1-5_vae.ckpt".  The full 11GB ckpt file contains the ema weights, non-ema weights, and optimizer state so resuming will have the full trainer state.
+
+## Pruning
+
+To prune your file down from 11GB to 2GB, use:
+
+    python prune_ckpt.py --ckpt last.ckpt
+
+(where last.ckpt is whatever your trained filename is).  This will remove training state and non-ema weights and save a new file called "last-pruned.ckpt" in the root folder, leaving last.ckpt in place in case you need to resume.
+
+I do not suggest using a pruned 2GB file to resume later training.  If you want to resume training, use the full 11GB file.  You can move your 2GB file to whatever your favorite Stable Diffusion webui is, test it out, and delete all the 11GB files and your log folder once you are satisfied with the results.

 ### Additional notes

-Thanks go to the CompVis team for the original training code, Xaiver Xiao for the DreamBooth implementation and tweaking of trainer configs to stuff it into a 24GB card, and Kane Wallmann for code take image captions from the filenames.
+Thanks go to the CompVis team for the original training code, Xavier Xiao for the DreamBooth implementation and tweaking of trainer configs to stuff it into a 24GB card, and Kane Wallmann for the first implementation of image captioning from the filenames.

 References:

@ -129,9 +190,9 @@ References:

 [Xavier Xiao's DreamBooth implementation](https://github.com/XavierXiao/Dreambooth-Stable-Diffusion)

-[Kane Wallmann's captioning capability](https://github.com/kanewallmann/Dreambooth-Stable-Diffusion)
+[Kane Wallmann](https://github.com/kanewallmann/Dreambooth-Stable-Diffusion)

 # Troubleshooting

-**Cuda out of memory:**  You should have <600MB used before starting training to use batch size 6.  People have reported issues with Precision X1 running in the background and Microsoft's system tray weather app causing problems.  You can disable hardware acceleration in apps like Discord and VS Code to reduce VRAM use, and close as many Chrome tabs as you can bear. 
+**Cuda out of memory:**  You should have <600MB used before starting training to use batch size 6.  People have reported issues with Precision X1 running in the background and Microsoft's system tray weather widget causing problems.  You can disable hardware acceleration in apps like Discord and VS Code to reduce VRAM use, and close as many Chrome tabs as you can bear.  While using a batch_size of 1 only uses about 19.5GB it will have a significant impact on training speed and quality.

--- a/configs/stable-diffusion/v1-finetune_everydream.yaml
+++ b/configs/stable-diffusion/v1-finetune_everydream.yaml
@ -11,12 +11,11 @@ model:
    cond_stage_key: caption
    image_size: 64
    channels: 4
-    cond_stage_trainable: true   # Note: different from the one we trained before
+    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False
-    embedding_reg_weight: 0.0
    unfreeze_model: True
    model_lr: 1.0e-6

@ -66,15 +65,14 @@ model:
 data:
  target: main.DataModuleFromConfig
  params:
-    batch_size: 6  # ** MUST EQUAL BATCH SIZE BELOW FOR EveryDreamBatch: PARAMS: BATCH_SIZE **
+    batch_size: 6  
    num_workers: 8
    wrap: falsegit
    train:
      target: ldm.data.every_dream.EveryDreamBatch
      params:
-        repeats: 5   # rough suggestions: 5 with 5000 images, 15 for 1000 images, 50 for 500 images, 70 for <50 images
-        flip_p: 0   # use 0.5 to randomly flip images each repeat, not recommended unless very low training data < 20
-        batch_size: 6  # ** MUST EQUAL BATCH SIZE ABOVE FOR DataModuleFromConfig:  PARAMS: BATCH_SIZE  **
+        repeats: 5   # rough suggestions: 5 with 5000+ images, 15 for 1000 images, use micro yaml for <100
+        debug_level: 1   # 1 to print if images are dropped due to multiple-aspect ratio images
    validation:
      target: ldm.data.ed_validate.EDValidateBatch
      params:
@ -88,15 +86,15 @@ lightning:
  modelcheckpoint:
    params:
      every_n_epochs: 1  # produce a ckpt every epoch, leave 1!
-      save_top_k: 3   # save the best N ckpts
      #every_n_train_steps: 1400 # can only use epoch or train step checkpoints
+      save_top_k: 4   # save the best N ckpts according to loss, can reduce to save disk space but suggest at LEAST 2, more if you have max_epochs below higher!
      save_last: True
      filename: "{epoch:02d}-{step:05d}"
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
-        batch_frequency: 200
+        batch_frequency: 250
        max_images: 16
        increase_log_steps: False

--- a/configs/stable-diffusion/v1-finetune_micro.yaml
+++ b/configs/stable-diffusion/v1-finetune_micro.yaml
@ -73,33 +73,31 @@ model:
 data:
  target: main.DataModuleFromConfig
  params:
-    batch_size: 6  # ** MUST EQUAL BATCH SIZE BELOW FOR TRAIN **
+    batch_size: 6  
    num_workers: 8
    wrap: falsegit
    train:
      target: ldm.data.every_dream.EveryDreamBatch
      params:
-        repeats: 25   # try ~50-100 for micro models with 20-50 training images with 1-2 epochs
+        repeats: 50   # for micro models with 20-50 training images of one subject, try 30-60 repeats with 3-4 epochs (max_epochs below)
        flip_p: 0   # use 0.5 to randomly flip images each repeat, not recommended unless very low training data < 20
-        batch_size: 6   # ** MUST EQUAL BATCH SIZE ABOVE FOR DataModuleFromConfig **
+        debug_level: 1   # 1 to print if images are dropped due to multiple-aspect ratio images
    validation:
      target: ldm.data.ed_validate.EDValidateBatch
      params:
-        repeats: 1
-        batch_size: 1  # don't touch
+        repeats: 3
    test:
      target: ldm.data.ed_validate.EDValidateBatch
      params:
        repeats: 1
-        batch_size: 1  # don't touch

 lightning:
  modelcheckpoint:
    params:
-      every_n_epochs: 1
+      every_n_epochs: 1  # produce a ckpt every epoch, leave 1!
      #every_n_train_steps: 1400 # can only use epoch or train step checkpoints
+      save_top_k: 3  # save the best N ckpts according to loss, can reduce to save disk space but suggest at LEAST 2
      save_last: True
-      save_top_k: 3
      filename: "{epoch:02d}-{step:05d}"
  callbacks:
    image_logger:
@ -111,7 +109,7 @@ lightning:

  trainer:
    benchmark: True
-    max_epochs: 3   # epoch step count will be (total training images) / batch_size * repeats, suggest 1-4 epochs depending on dataset size and repeats
+    max_epochs: 4   # suggest 3-4+ and adjust repeats above, this will give you a few ckpts to test
    max_steps: 99000   # better to end on epochs not steps, especially with >500 images to ensure even distribution, but you can set this if you really want...
    check_val_every_n_epoch: 1
    gpus: 0,
--- a/configs/stable-diffusion/v1-finetune_test.yaml
+++ b/configs/stable-diffusion/v1-finetune_test.yaml
@ -23,10 +23,10 @@ model:
      params:
        warm_up_steps: [ 5 ]
        cycle_lengths: [ 1000 ] # incredibly large number to prevent corner cases
-        verbosity_interval: 100  # how often to print LR updates
+        verbosity_interval: 25  # how often to print LR updates
        f_start: [ 1.e-6 ]
        f_max: [ 1.e-6 ] # 1.
-        f_min: [ 1.e-7 ] # 1.
+        f_min: [ 1.e-8 ] # 1.

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
@ -53,7 +53,7 @@ model:
        ddconfig:
          double_z: true
          z_channels: 4
-          resolution: 384
+          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
@ -74,24 +74,23 @@ model:
 data:
  target: main.DataModuleFromConfig
  params:
-    batch_size: 6  # ** MUST EQUAL BATCH SIZE BELOW FOR EveryDreamBatch: PARAMS: BATCH_SIZE **
-    num_workers: 8
+    batch_size: 6  
+    num_workers: 12
    wrap: falsegit
    train:
      target: ldm.data.every_dream.EveryDreamBatch
      params:
-        repeats: 3
-        flip_p: 0   # use 0.5 to randomly flip images each repeat, not recommended unless very small training data
-        debug_level: 1   # data loader debugging, 1 = show truncated images
-        batch_size: 6   # ** MUST EQUAL BATCH SIZE ABOVE FOR DataModuleFromConfig:  PARAMS: BATCH_SIZE  **
+        repeats: 5
+        flip_p: 0   
+        debug_level: 1  
    validation:
      target: ldm.data.ed_validate.EDValidateBatch
      params:
-        repeats: 1
+        repeats: 0.1
    test:
      target: ldm.data.ed_validate.EDValidateBatch
      params:
-        repeats: 0.3
+        repeats: 0.1

 lightning:
  modelcheckpoint:
@ -111,7 +110,7 @@ lightning:

  trainer:
    benchmark: True
-    max_epochs: 3
+    max_epochs: 10
    max_steps: 99000  # better to end on epochs not steps, especially with >500 images to ensure even distribution, but you can set this if you really want...
    check_val_every_n_epoch: 1
    gpus: 0,
--- a/ldm/data/data_loader.py
+++ b/ldm/data/data_loader.py
@ -1,6 +1,5 @@
 import os
 from PIL import Image
-import gc
 import random
 from ldm.data.image_train_item import ImageTrainItem

@ -24,12 +23,10 @@ class DataLoaderMultiAspect():

        self.__recurse_data_root(self=self, recurse_root=data_root)
        random.Random(seed).shuffle(self.image_paths)
-        prepared_train_data = self.__prescan_images(debug_level, self.image_paths, flip_p)
+        prepared_train_data = self.__prescan_images(debug_level, self.image_paths, flip_p) # ImageTrainItem[]
        self.image_caption_pairs = self.__bucketize_images(prepared_train_data, batch_size=batch_size, debug_level=debug_level)
        print(f" * DLMA Example {self.image_caption_pairs[0]} images")

-        gc.collect()
-
    def get_all_images(self):
        return self.image_caption_pairs

@ -54,7 +51,7 @@ class DataLoaderMultiAspect():
            # else:
            #     identifier = parts[0]

-            identifier = parts[0]
+            identifier = parts[0].split(".")[0]
            
            image = Image.open(pathname)
            width, height = image.size
@ -64,26 +61,24 @@ class DataLoaderMultiAspect():

            image_train_item = ImageTrainItem(image=None, caption=identifier, target_wh=target_wh, pathname=pathname, flip_p=flip_p)

-            # put placeholder image in the list and return meta data
            decorated_image_train_items.append(image_train_item)
        return decorated_image_train_items

    @staticmethod
-    def __bucketize_images(prepared_train_data, batch_size=1, debug_level=0):
+    def __bucketize_images(prepared_train_data: list, batch_size=1, debug_level=0):
        # TODO: this is not terribly efficient but at least linear time
        buckets = {}

        for image_caption_pair in prepared_train_data:
-            image = image_caption_pair.image
-            width, height = image.size
+            target_wh = image_caption_pair.target_wh

-            if (width, height) not in buckets:
-                buckets[(width, height)] = []
-            buckets[(width, height)].append(image_caption_pair) # [image, identifier, target_aspect, closest_aspect_wh[w,h], pathname]
+            if (target_wh[0],target_wh[1]) not in buckets:
+                buckets[(target_wh[0],target_wh[1])] = []
+            buckets[(target_wh[0],target_wh[1])].append(image_caption_pair) 
        
        print(f" ** Number of buckets: {len(buckets)}")

-        if len(buckets) > 1: # don't bother truncating if everything is the same aspect ratio
+        if len(buckets) > 1: 
            for bucket in buckets:
                truncate_count = len(buckets[bucket]) % batch_size
                current_bucket_size = len(buckets[bucket])
@ -99,8 +94,6 @@ class DataLoaderMultiAspect():

    @staticmethod
    def __recurse_data_root(self, recurse_root):
-        i = 0
-
        for f in os.listdir(recurse_root):
            current = os.path.join(recurse_root, f)
            # get file ext
@ -108,7 +101,6 @@ class DataLoaderMultiAspect():
            if os.path.isfile(current):
                ext = os.path.splitext(f)[1]
                if ext in ['.jpg', '.jpeg', '.png', '.bmp', '.webp']:
-                    i += 1
                    self.image_paths.append(current)

        sub_dirs = []
@ -120,23 +112,3 @@ class DataLoaderMultiAspect():

        for dir in sub_dirs:
            self.__recurse_data_root(self=self, recurse_root=dir)
-
-    # @staticmethod
-    # def hydrate_image(self, image_path, target_aspect, closest_aspect_wh):
-    #     image = Image.open(example[4]) # 5 is the path
-    #     print(image)
-    #     width, height = image.size
-    #     image_aspect = width / height
-    #     target_aspect = width / height
-
-    #     if example[3][0] == example[3][1]:
-    #         pass
-    #     if target_aspect < image_aspect:
-    #         crop_width = (width - (width * example[3][0] / example[3][1])) / 2
-    #         image = image.crop((crop_width, 0, width - crop_width, height))
-    #     else:
-    #         crop_height = (height - (width * example[3][1] / example[3][0])) / 2
-    #         image = image.crop((0, crop_height, width, height - crop_height))
-
-    #     example[0] = image.resize((example[3][0], example[3][1]), Image.BICUBIC)
-    #     return example
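Reviewer note on the `__bucketize_images` change above: buckets are now keyed on each item's `target_wh` instead of the size of an already loaded image, which is what allows `ImageTrainItem` to carry a placeholder until it is hydrated. A minimal sketch of that grouping follows. `Item` is a hypothetical stand-in for `ImageTrainItem`, and dropping the remainder of each bucket is an assumption, since the lines that consume `truncate_count` fall outside this hunk.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Item(NamedTuple):
    """Hypothetical stand-in for ImageTrainItem; only the fields used here."""
    caption: str
    target_wh: Tuple[int, int]


def bucketize(items, batch_size=1):
    """Group items by target (width, height) and flatten bucket by bucket."""
    buckets = defaultdict(list)
    for item in items:
        buckets[tuple(item.target_wh)].append(item)

    # Mirrors the guard in the diff: with a single bucket nothing is truncated.
    if len(buckets) > 1:
        for key in list(buckets):
            truncate_count = len(buckets[key]) % batch_size
            if truncate_count:
                # Assumption: the remainder is dropped so each batch has one shape.
                buckets[key] = buckets[key][:-truncate_count]

    return [item for bucket in buckets.values() for item in bucket]


items = [Item(f"square {i}", (512, 512)) for i in range(5)]
items += [Item(f"wide {i}", (576, 448)) for i in range(3)]
kept = bucketize(items, batch_size=2)
print(len(kept))
```

Because the flattened list keeps each bucket contiguous and a multiple of `batch_size`, a sequential sampler would only ever draw batches whose images share one resolution.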
--- a/ldm/data/ed_validate.py
+++ b/ldm/data/ed_validate.py
@ -4,6 +4,7 @@ from torchvision import transforms
 from ldm.data.data_loader import DataLoaderMultiAspect as dlma
 import math
 import ldm.data.dl_singleton as dls
+
 class EDValidateBatch(Dataset):
    def __init__(self,
                 data_root,
@ -13,48 +14,41 @@ class EDValidateBatch(Dataset):
                 batch_size=1,
                 set='val',
                 ):
-
        self.data_root = data_root
        self.batch_size = batch_size

        if not dls.shared_dataloader:
            print("Creating new dataloader singleton")
-            dls.shared_dataloader = dlma(data_root=data_root, debug_level=debug_level, batch_size=self.batch_size)
+            dls.shared_dataloader = dlma(data_root=data_root, debug_level=debug_level, batch_size=self.batch_size, flip_p=flip_p)
            
-        self.image_caption_pairs = dls.shared_dataloader.get_all_images()
+        self.image_train_items = dls.shared_dataloader.get_all_images()
        
-        self.num_images = len(self.image_caption_pairs)
+        self.num_images = len(self.image_train_items)

        self._length = max(math.trunc(self.num_images * repeats), batch_size) - self.num_images % self.batch_size

        print()
-        print(f" ** Validation Set: {set}, num_images: {self.num_images}, length: {self._length}, repeats: {repeats}, batch_size: {self.batch_size}, ")
-        print(f" ** Validation steps: {self._length / batch_size:.0f}")
+        print(f" ** Validation Set: {set}, steps: {self._length / batch_size:.0f}, repeats: {repeats} ")
        print()

-        self.flip = transforms.RandomHorizontalFlip(p=flip_p)
-
    def __len__(self):
        return self._length

    def __getitem__(self, i):
-        idx = i % len(self.image_caption_pairs)
-        example = self.get_image(self.image_caption_pairs[idx])
+        idx = i % self.num_images
+        image_train_item = self.image_train_items[idx]
+
+        example = self.__get_image_for_trainer(image_train_item)
        return example

-    def get_image(self, image_caption_pair):
+    @staticmethod
+    def __get_image_for_trainer(image_train_item):
        example = {}

-        image = image_caption_pair[0]
+        image_train_tmp = image_train_item.hydrate()

-        if not image.mode == "RGB":
-            image = image.convert("RGB")
-
-        identifier = image_caption_pair[1]
-
-        image = self.flip(image)
-        image = np.array(image).astype(np.uint8)
-        example["image"] = (image / 127.5 - 1.0).astype(np.float32)
-        example["caption"] = identifier
+        example["image"] = image_train_tmp.image
+        example["caption"] = image_train_tmp.caption

        return example
+        
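Reviewer note on the fractional `repeats` values in the YAML (`0.1` for both validation and test): the dataset length comes from the unchanged `self._length` line above. The sketch below reproduces that arithmetic so the effect of `repeats: 0.1` can be checked by hand. The function name and the dataset sizes are hypothetical; the batch size of 6 is taken from the removed YAML line.

```python
import math


def validation_length(num_images, repeats, batch_size):
    """Same arithmetic as the `self._length` line in EDValidateBatch.__init__."""
    return max(math.trunc(num_images * repeats), batch_size) - num_images % batch_size


# Hypothetical dataset sizes; batch_size 6 as in the removed YAML line.
for num_images in (120, 30):
    length = validation_length(num_images, repeats=0.1, batch_size=6)
    print(num_images, length, length // 6)
```

With 120 images this gives a length of 12 (two validation steps), and with 30 images the `max(..., batch_size)` floor keeps one full batch. When `num_images` is not a multiple of `batch_size`, the subtraction can leave a length that is itself not a multiple of `batch_size`, which may be worth checking for single-bucket datasets where no truncation happens.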
--- a/ldm/data/every_dream.py
+++ b/ldm/data/every_dream.py
@ -4,8 +4,6 @@ from pathlib import Path
 from ldm.data.data_loader import DataLoaderMultiAspect as dlma
 import math
 import ldm.data.dl_singleton as dls
-from PIL import Image
-import gc

 class EveryDreamBatch(Dataset):
    def __init__(self,
@ -16,17 +14,14 @@ class EveryDreamBatch(Dataset):
                 batch_size=1,
                 set='train'
                 ):
-        #print(f"EveryDreamBatch batch size: {batch_size}")
        self.data_root = data_root
        self.batch_size = batch_size
-        self.flip_p = flip_p
        
        if not dls.shared_dataloader:
            print(" * Creating new dataloader singleton")
-            dls.shared_dataloader = dlma(data_root=data_root, debug_level=debug_level, batch_size=self.batch_size, flip_p=self.flip_p)
+            dls.shared_dataloader = dlma(data_root=data_root, debug_level=debug_level, batch_size=self.batch_size, flip_p=flip_p)
        
        self.image_train_items = dls.shared_dataloader.get_all_images()
-        #print(f" * EDB Example {self.image_train_items[0]}")
        
        self.num_images = len(self.image_train_items)

@ -41,29 +36,15 @@ class EveryDreamBatch(Dataset):

    def __getitem__(self, i):
        idx = i % self.num_images
-        #example = self.get_image(self.image_caption_pairs[idx])
        image_train_item = self.image_train_items[idx]
-        #print(f" *** example {example}")
-
-        hydrated_image_train_item = image_train_item.hydrate()
-
-        example = self.get_image_for_trainer(hydrated_image_train_item)
+        example = self.__get_image_for_trainer(image_train_item)
        return example

-    def unload_images_over(self, limit):
-        print(f" ********** Unloading images over limit {limit}")
-        i = 0
-        while i < len(self.image_train_items):
-            print(self.image_train_items[i])            
-            if i > limit:
-                self.image_train_items[i][0] = Image.new(mode='RGB', size=(1, 1))
-            i += 1
-        gc.collect()
-
-    def get_image_for_trainer(self, image_train_item):
+    @staticmethod
+    def __get_image_for_trainer(image_train_item):
        example = {}

-        image_train_tmp = image_train_item.as_formatted()
+        image_train_tmp = image_train_item.hydrate()

        example["image"] = image_train_tmp.image
        example["caption"] = image_train_tmp.caption
--- a/ldm/data/image_train_item.py
+++ b/ldm/data/image_train_item.py
@ -1,30 +1,28 @@

-from PIL import Image
+import PIL
 import numpy as np
 from torchvision import transforms

 class ImageTrainItem(): # [image, identifier, target_aspect, closest_aspect_wh[w,h], pathname]
-    def __init__(self, image: Image, caption: str, target_wh: list, pathname: str, flip_p=0.0):
+    def __init__(self, image: PIL.Image, caption: str, target_wh: list, pathname: str, flip_p=0.0):
        self.caption = caption
        self.target_wh = target_wh
-        #self.target_aspect = target_aspect
        self.pathname = pathname
        self.flip = transforms.RandomHorizontalFlip(p=flip_p)

        if image is None:
-            self.image = Image.new(mode='RGB',size=(1,1))
+            self.image = PIL.Image.new(mode='RGB',size=(1,1))
        else:
            self.image = image
-        #image_train_item.image = image.resize((image_train_item.closest_aspect_wh[0], image_train_item.closest_aspect_wh[1]), Image.BICUBIC)

    def hydrate(self):
-        self.image = self.image.resize(self.target_wh, Image.BICUBIC)
-        
-        if not self.image.mode == "RGB":
-            self.image = self.image.convert("RGB")
+        if type(self.image) is not np.ndarray:
+            self.image = PIL.Image.open(self.pathname).convert('RGB')

-        self.image = self.flip(self.image)
-        self.image = np.array(self.image).astype(np.uint8)
+            self.image = self.image.resize((self.target_wh), PIL.Image.BICUBIC)
+
+            self.image = self.flip(self.image)
+            self.image = np.array(self.image).astype(np.uint8)

        self.image = (self.image / 127.5 - 1.0).astype(np.float32)
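Reviewer note on `hydrate()`: the new version defers `PIL.Image.open` until an item is actually requested, so the prescan can hold a 1x1 placeholder per image. As this hunk is rendered, the final `/ 127.5 - 1.0` line sits outside the `if`, so an item hydrated a second time (which `idx = i % self.num_images` with `repeats` above 1 appears to allow) would be rescaled again. That reading depends on the indentation having survived the diff rendering. Below is a dependency-free sketch of a hydrate that is safe to call repeatedly. `LazyItem` and the injected `loader` are hypothetical, and a flat list of 0..255 values stands in for the decoded image.

```python
class LazyItem:
    """Hypothetical stand-in for ImageTrainItem with a cached, lazy hydrate."""

    def __init__(self, pathname, caption, loader):
        self.pathname = pathname
        self.caption = caption
        self._loader = loader   # assumption: returns pixel values in 0..255
        self.image = None       # nothing is decoded until hydrate() is called

    def hydrate(self):
        if self.image is None:
            pixels = self._loader(self.pathname)
            # Scale 0..255 to -1.0..1.0 exactly once, inside the guard.
            self.image = [p / 127.5 - 1.0 for p in pixels]
        return self


calls = []


def fake_loader(pathname):
    """Records each load so repeated decoding would be visible."""
    calls.append(pathname)
    return [0, 255]


item = LazyItem("example.jpg", "a caption", fake_loader)
first = list(item.hydrate().image)
second = list(item.hydrate().image)
print(first == second, len(calls))
```

Keeping the normalization inside the guard trades memory for speed, since every hydrated item stays cached as floats. Whether that is acceptable on a 24GB card with a large dataset is a separate question this sketch does not answer.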

--- a/ldm/data/test_batch.py
+++ b/ldm/data/test_batch.py
@ -1,53 +0,0 @@
-# script to test data loader by itself
-# run from training root, edit the data_root manually
-# python ldm/data/test_dl.py
-import every_dream
-import time
-
-
-s = time.perf_counter()
-
-data_root = "r:/everydream-trainer/training_samples/ff7r"
-
-batch_size = 1
-every_dream_batch = every_dream.EveryDreamBatch(data_root=data_root, flip_p=0.0, debug_level=0, batch_size=batch_size, repeats=1)
-
-print(f" *TEST*  batch type: {type(every_dream_batch)}")
-i = 0
-is_next = True
-curr_batch = []
-
-while is_next and i < 30 and i < len(every_dream_batch):
-    try:
-        example = every_dream_batch[i]
-        if example is not None:
-            #print(f"example type: {type(example)}") # dict
-            #print(f"example keys: {example.keys()}") # dict_keys(['image', 'caption'])
-            #print(f"example image type: {type(example['image'])}") # numpy.ndarray
-            if i%batch_size == 0:
-                curr_batch = example['image'].shape
-            img_in_right_batch = curr_batch == example['image'].shape
-            print(f" *TEST*  example image shape: {example['image'].shape} {i%batch_size} {img_in_right_batch}")
-            print(f" *TEST*  example caption: {example['caption']}")
-
-            if not img_in_right_batch:
-                raise Exception("Current image in wrong batch")
-            #print(f"example caption: {example['caption']}") # str
-        else:
-            is_next = False
-        i += 1
-    except IndexError:
-        is_next = False
-        print(f"IndexError: {i}")
-        pass
-    # for idx, batches in every_dream_batch:
-    # print(f"inner example type: {type(batches)}")
-    # print(type(batches))
-    # print(type(batches[0]))
-    # print(dir(batches))
-    #h, w = batches.image.size
-    #print(f"{idx:05d}-{idx%6:02d}EveryDreamBatch image caption pair: w:{w} h:{h} {batches.caption[1]}")
-print(f" *TEST* test cycles: {i}")
-print(f" *TEST* EveryDreamBatch epoch image length: {len(every_dream_batch)}")
-elapsed = time.perf_counter() - s
-print(f"{__file__} executed in {elapsed:5.2f} seconds.")
--- a/ldm/data/test_dl.py
+++ b/ldm/data/test_dl.py
@ -1,18 +0,0 @@
-# script to test data loader by itself
-# run from training root, edit the data_root manually
-# python ldm/data/test_dl.py
-import data_loader
-
-data_root = "r:/everydream-trainer/training_samples/multiaspect"
-
-data_loader = data_loader.DataLoaderMultiAspect(data_root=data_root, repeats=1, seed=555, debug_level=2)
-
-image_caption_pairs = data_loader.get_all_images()
-
-print(f"Loaded {len(image_caption_pairs)} image-caption pairs")
-
-for image_caption_pair in image_caption_pairs:
-    print(image_caption_pair)
-    print(image_caption_pair[1])
-
-print(f"**** Done loading. Loaded {len(image_caption_pairs)} images from data_root: {data_root} ****")
--- a/ldm/data/test_validate.py
+++ b/ldm/data/test_validate.py
@ -1,47 +0,0 @@
-# script to test data loader by itself
-# run from training root, edit the data_root manually
-# python ldm/data/test_dl.py
-import ed_validate
-
-data_root = "r:/everydream-trainer/training_samples/multiaspect4"
-
-batch_size = 6
-ed_val_batch = ed_validate.EDValidateBatch(data_root=data_root, flip_p=0.0, debug_level=0, batch_size=batch_size, repeats=1)
-
-print(f"batch type: {type(ed_val_batch)}")
-i = 0
-is_next = True
-curr_batch = []
-while is_next and i < 84:
-    try:
-        example = ed_val_batch[i]
-        if example is not None:
-            #print(f"example type: {type(example)}") # dict
-            #print(f"example keys: {example.keys()}") # dict_keys(['image', 'caption'])
-            #print(f"example image type: {type(example['image'])}") # numpy.ndarray
-            if i%batch_size == 0:
-                curr_batch = example['image'].shape
-            img_in_right_batch = curr_batch == example['image'].shape
-            print(f"example image shape: {example['image'].shape} {i%batch_size} {img_in_right_batch}") # (256, 256, 3)
-
-            if not img_in_right_batch:
-                raise Exception("Current image in wrong batch")
-            #print(f"example caption: {example['caption']}") # str
-        else:
-            is_next = False
-        i += 1
-    except IndexError:
-        is_next = False
-        print(f"IndexError: {i}")
-        pass
-    # for idx, batches in every_dream_batch:
-    # print(f"inner example type: {type(batches)}")
-    # print(type(batches))
-    # print(type(batches[0]))
-    # print(dir(batches))
-    #h, w = batches.image.size
-    #print(f"{idx:05d}-{idx%6:02d}EveryDreamBatch image caption pair: w:{w} h:{h} {batches.caption[1]}")
-
-ed_val_batch.image_caption_pairs = [image_caption_pair for image_caption_pair in self.image_caption_pairs if image_caption_pair[0].size == aspect_ratio]
-
-print(f"EveryDreamBatch epoch image length: {len(ed_val_batch)}")
--- a/main.py
+++ b/main.py
@ -216,17 +216,25 @@ class DataModuleFromConfig(pl.LightningDataModule):
        self.num_workers = num_workers if num_workers is not None else batch_size * 2
        self.use_worker_init_fn = use_worker_init_fn
        if train is not None:
+            train.params.batch_size = self.batch_size
+            train.params.set = 'train'
            self.dataset_configs["train"] = train
        
        self.train_dataloader = self._train_dataloader
        
        if validation is not None:
+            validation.params.batch_size = self.batch_size
+            validation.params.set = 'val'
+            print(f" ****** validation: {validation}")
            self.dataset_configs["validation"] = validation
            self.val_dataloader = partial(self._val_dataloader, shuffle=shuffle_val_dataloader)
        if test is not None:
+            test.params.batch_size = self.batch_size
+            test.params.set = 'test'
            self.dataset_configs["test"] = test
            self.test_dataloader = partial(self._test_dataloader, shuffle=shuffle_test_loader)
        if predict is not None:
+            predict.params.batch_size = self.batch_size
            self.dataset_configs["predict"] = predict
            self.predict_dataloader = self._predict_dataloader
        self.wrap = wrap
@ -300,7 +308,7 @@ class SetupCallback(Callback):

    def on_keyboard_interrupt(self, trainer, pl_module):
        if trainer.global_rank == 0:
-            print("Summoning checkpoint.")
+            print("Keyboard interrupt. Summoning checkpoint.")
            ckpt_path = os.path.join(self.ckptdir, "last.ckpt")
            trainer.save_checkpoint(ckpt_path)

@ -603,14 +611,13 @@ if __name__ == "__main__":
                "dirpath": ckptdir,
                "filename": "{epoch:03}-{global_step:05}",
                "verbose": True,
-                "save_last": True,
            }
        }

        if hasattr(model, "monitor"):
            print(f"Monitoring {model.monitor} as checkpoint metric.")
            default_modelckpt_cfg["params"]["monitor"] = model.monitor
-            #default_modelckpt_cfg["params"]["save_top_k"] = 3
+            #default_modelckpt_cfg["params"]["save_top_k"] = 3 #moved to yaml

        if "modelcheckpoint" in lightning_config:
            modelckpt_cfg = lightning_config.modelcheckpoint
@ -715,8 +722,9 @@ if __name__ == "__main__":
        def melk(*args, **kwargs):
            # run all checkpoint hooks
            if trainer.global_rank == 0:
-                print("Summoning checkpoint.")
-                ckpt_path = os.path.join(ckptdir, "last.ckpt")
+                last_ckpt_name = "last.ckpt"
+                print(f"Training halted. Summoning checkpoint as {last_ckpt_name}")
+                ckpt_path = os.path.join(ckptdir, last_ckpt_name)
                trainer.save_checkpoint(ckpt_path)


@ -760,5 +768,5 @@ if __name__ == "__main__":
            os.makedirs(os.path.split(dst)[0], exist_ok=True)
            os.rename(logdir, dst)
        if trainer.global_rank == 0:
-            print("Training complete. max_steps or max_epochs, reached or we blew up.")
+            print("Training complete. max_steps or max_epochs reached, or we blew up.")
            print(trainer.profiler.summary())
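Reviewer note on the `DataModuleFromConfig` change: the YAML no longer repeats `batch_size` under each dataset (the old line carried a "MUST EQUAL BATCH SIZE ABOVE" warning), because `main.py` now copies the module-level value into every dataset's `params`. A minimal sketch of that pattern follows, using `types.SimpleNamespace` in place of the real config objects. `inject_batch_size` and `SET_NAMES` are hypothetical names introduced only for this illustration.

```python
from types import SimpleNamespace

# Split name in the config mapped to the `set` label used in the diff.
# "predict" is left out because the diff does not assign a `set` for it.
SET_NAMES = {"train": "train", "validation": "val", "test": "test"}


def inject_batch_size(dataset_configs, batch_size):
    """Copy one batch_size (and a set label) into each dataset's params."""
    for split, cfg in dataset_configs.items():
        if cfg is None:
            continue  # split not configured, same as the `is not None` checks
        cfg.params.batch_size = batch_size
        if split in SET_NAMES:
            cfg.params.set = SET_NAMES[split]
    return dataset_configs


configs = {
    "train": SimpleNamespace(params=SimpleNamespace(repeats=5)),
    "validation": SimpleNamespace(params=SimpleNamespace(repeats=0.1)),
    "test": None,
}
inject_batch_size(configs, batch_size=6)
print(configs["validation"].params)
```

Having a single source of truth removes the failure mode the old comment warned about, where the loader bucketed for one batch size while the trainer drew batches of another.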