big readme update, cruft and dead code removed, reverting some stuff back to xavier og

Victor Hall 2022-10-25 22:45:47 -04:00
parent 09d67853f0
commit f5aba7293d
5 changed files with 81 additions and 59 deletions


@ -1,11 +1,11 @@
# Every Dream trainer for Stable Diffusion
This is a bit of a divergence from other fine tuning methods out there for Stable Diffusion. No more "DreamBooth" stuff like tokens, classes, or regularization, though I thank the DreamBooth training community for sharing information and techniques. Yet, it is time to move on.
This is a bit of a divergence from other fine tuning methods out there for Stable Diffusion. No more "DreamBooth" stuff like tokens, classes, or regularization, though I thank the DreamBooth training community for sharing information and techniques. Yet, it is time to move on to explore more capability in fine tuning.
## Onward to Every Dream
This trainer is focused on enabling fine tuning with new training data plus weaving in original, ground truth images scraped from the web via Laion dataset or other publicly available ML image sets. Compared to DreamBooth, concepts such as regularization have been removed, and token/class are no longer concepts used, as they have been replaced by per-image captioning for training, more or less equal to how Stable Diffusion was trained itself. This is a shift back to the original training code and methodology for fine tuning for general cases.
This trainer is focused on enabling fine tuning with new training data plus weaving in original, ground truth images scraped from the web via the Laion dataset or other publicly available ML image sets. Compared to DreamBooth, concepts such as regularization have been removed in favor of adding back ground truth data (e.g. Laion), and token/class concepts are removed and replaced by per-image captioning for training, more or less equal to how Stable Diffusion was trained itself. This is a shift back to the original training code and methodology for fine tuning for general cases.
To get the most out of this trainer, you will need to curate a data set to be trained in addition to ground truth images to help preserve the model integrity and character. Luckily, there are additional tools below to help enable that, and will grow over time.
To get the most out of this trainer, you will need to curate a data set to be trained, in addition to collecting ground truth images to help preserve the model integrity and character. Luckily, there are additional tools below to help enable that, and they will grow over time.
## Image Captioning
@ -35,7 +35,7 @@ While you can simply stuff everything, new training and ground truth data all in
/training_samples/MyProject/paintings_laion
/training_samples/MyProject/drawings_laion
In the above example, "/training_samples/MyProject" will be your root folder for the command line. It must be devoid of anything but the subfolders. **The subfolders again are purely for your own organizational purposes, the names of the subfolders do not matter to the trainer.** It's up to you how you want to name or organize subfolders, the only requirement is that you use a single layer of subfolders and the root folder for your project contains nothing but the subfolders. You must not put images directly into /training_samples/MyProject.
In the above example, "/training_samples/MyProject" will be your root folder for the command line. It must be devoid of anything but the subfolders. **The subfolders again are purely for your own organizational purposes, the names of the subfolders do not matter to the trainer.** It's up to you how you want to name or organize subfolders, the only requirement is that you use a single layer of subfolders and the root folder for your project contains nothing but the subfolders. You must not put images or other files directly into /training_samples/MyProject.
Also in the above example, /training_samples/MyProject/man would contain new training images you want to "teach" the model, and the man_laion and man_nvflickr sets would contain images scraped from laion or other original sources (see below for possible sources). It's up to you what you want to include.
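The layout rules above (a root folder holding only subfolders, and a single layer of subfolders) can be checked before starting a long training run. The following is a minimal sketch, not part of the trainer; `check_layout` is a hypothetical helper name introduced here for illustration.

```python
from pathlib import Path


def check_layout(data_root):
    """Return a list of layout problems under data_root (hypothetical helper).

    Flags files placed directly in the root and subfolders nested more than
    one level deep, following the rules described in the README text.
    """
    problems = []
    for entry in sorted(Path(data_root).iterdir()):
        if entry.is_file():
            # The root must contain nothing but subfolders.
            problems.append(f"file directly in root: {entry.name}")
            continue
        for child in sorted(entry.iterdir()):
            if child.is_dir():
                # Only a single layer of subfolders is expected.
                problems.append(f"nested subfolder: {entry.name}/{child.name}")
    return problems


# Example call (path taken from the README example):
# problems = check_layout("/training_samples/MyProject")
```

An empty list means the folder matches the described layout; anything else is worth fixing before training.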
@ -51,17 +51,71 @@ Visit [EveryDream Data Engineering Tools](https://github.com/victorchall/EveryDr
I suggest pulling down all the files for this set in particular: [https://huggingface.co/datasets/laion/laion2B-en-aesthetic](https://huggingface.co/datasets/laion/laion2B-en-aesthetic) to use with the web scraper.
There is information in the EveryDream Data Engineering Tools link above on how to run the web scrape. The webscrape takes zero GPU power, so you can run it locally on any PC with Python before renting GPU power if needed.
There is information in the EveryDream Data Engineering Tools link above on how to run the web scrape. The webscrape takes zero GPU power, so you can run it locally on any PC with Python before renting GPU power if needed. If you are interested in moving to larger scope projects, I recommend investing time to curate your data sets, as they can be reused.
The Nvidia Flickr set is also helpful and in a fairly "ready to use" format besides renaming the files: [https://github.com/NVlabs/ffhq-dataset](https://github.com/NVlabs/ffhq-dataset)
For this trainer, I suggest "close up photo of a person" for captioning of this dataset. If you want, you can go further and separate male/female photos and caption them "close up photo of a man" or "... a woman" as you see fit. "a close up of a person" is also acceptable, dropping "photo". You can simply select all in Windows, press F2 to rename, and type "a close up of a person_" **without the quotes but with the underscore** to format the filename captions in a way this trainer can use.
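If you are not on Windows, or want to script the rename, the same result can be sketched in Python. This is a minimal sketch under assumptions: `caption_filenames` is a hypothetical helper, and it assumes only the text before the underscore matters as the caption, so the numeric suffix after the underscore is an arbitrary choice made here to keep filenames unique.

```python
from pathlib import Path


def caption_filenames(folder, caption, extensions=(".png", ".jpg", ".jpeg")):
    """Rename every image in folder to '<caption>_<n><ext>' (hypothetical helper).

    The underscore separates the caption from a running index, matching the
    'caption followed by underscore' convention described in the README.
    """
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    renamed = []
    for index, path in enumerate(images, start=1):
        target = folder / f"{caption}_{index}{path.suffix.lower()}"
        if target != path:
            if target.exists():
                # Refuse to overwrite an existing file.
                raise FileExistsError(target)
            path.rename(target)
        renamed.append(target.name)
    return renamed


# Example call (folder name taken from the README example):
# caption_filenames("/training_samples/MyProject/man_nvflickr", "a close up of a person")
```

Files with other extensions are left untouched, so it is safe to run on a folder that also holds notes or metadata.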
Thanks to Xavier Xiao for the DreamBooth implementation and tweaking of trainer configs to stuff it into a 24GB card, and Kane Wallmann for code to take image captions from the filenames.
## Finetune yaml adjustments
Depending on your project, a few settings may be useful to tweak or adjust. In [Starting Training](#starting_training) I'm using __v1-finetune_everydream.yaml__, but you can make your own copies with different adjustments and save them for your projects. It is a good idea to get familiar with this file, as tweaking it can be useful as you train.
I'll highlight the following settings at the end of the file:
trainer:
benchmark: True
max_epochs: 2
max_steps: 99000
max_epochs will halt training. I suggest ending on a clean end of an epoch rather than using a steps limit, so defaults are configured as such. 2 epochs is not a lot but it is a good point to check before continuing as you can always continue training but you can't go back if you overdo it.
train:
target: ldm.data.every_dream.EveryDreamBatch
params:
size: 512
set: train
repeats: 20
Above, repeats defines the number of times each training image is trained on per epoch. This is mainly a control to balance against validation. For large scale training with 100+ images per subject you may find just 20 repeats with 1 epoch or 10 repeats with 2 epochs is a good place to stop and check your outputs by loading your file into an inference repo.
The only difference between 20 repeats 1 epoch and 10 repeats 2 epochs is that the latter will run validation twice (always once per epoch), which costs some extra steps and time. Once you develop a "feel" for your projects, you may increase repeats on your first training off a base model to save a bit of time on the validation steps, test, then continue. You may wish to, for example, do 1 epoch at 20 repeats, check, then do one more epoch at 5 repeats if you feel it is "close" to done.
The above settings are a good starting place, at least for humanoid subjects with 100+ images per subject, though some users may find less humanoid subjects, such as cartoons or creatures, require more training.
You are also free to move data in and out between training sessions. If you have multiple subjects and your number of images between them is a bit mismatched in number, say, 100 for one and only 60 for another, you can try running one epoch 25 repeats, then remove the character with 100 images and train just the one with the 60 images for another epoch at 5 repeats.
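To make the arithmetic in that example concrete, here is a small sketch of how many times images are presented to the trainer under that schedule. It assumes, as described above, that each image is trained on `repeats` times per epoch; it ignores ground truth images and validation steps, and the function names are hypothetical.

```python
def presentations_per_image(repeats_per_epoch):
    """Times a single image is trained on, given the repeats used in each epoch."""
    return sum(repeats_per_epoch)


def total_samples(num_images, repeats_per_epoch):
    """Total image presentations for one subject across the epochs it was present."""
    return num_images * presentations_per_image(repeats_per_epoch)


# Subject with 100 images: present only for the first epoch at 25 repeats.
big = total_samples(100, [25])
# Subject with 60 images: present for the first epoch (25) and a second epoch (5).
small = total_samples(60, [25, 5])

print(big, small)  # prints 2500 1800
```

The smaller subject ends up with 30 presentations per image against 25 for the larger one, which narrows the gap in total exposure without retraining the larger subject.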
## Starting training
An example command to start training:
python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t --actual_resume sd_v1-5_vae.ckpt -n MyProjectName --gpus 0, --data_root training_samples\MyProject
In the above, the source training data is expected to be laid out in subfolders of training_samples\MyProject as described in above sections. It will use the first Nvidia GPU in the system, and resume from the checkpoint named "sd_v1-5_vae.ckpt". "-n MyProjectName" is merely a name for the folder where logs will be written during training, which appear under /logs.
## Testing
I strongly recommend deliberately undertraining via the repeats and max_epochs settings above so you can test your model before continuing. Try one epoch at 10 repeats, then grab the ckpt file from the log folder. The ckpt will be dumped to a folder such as \logs\MyPrject2022-10-25T20-37-40_MyProject, date stamped with the start of training. There are also test images in the \logs\images\train folder that are written out periodically based on another finetune yaml setting:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 300
The images will often not all be fully formed, and are randomly selected based on the last few training images, but it's a good idea to start learning what to watch for in those images.
To continue training on a checkpoint, grab the last ckpt file from \logs\MyPrject2022-10-25T20-37-40_MyProject\checkpoints, move it back to your base folder, and change the --actual_resume pointer to last.ckpt, such as the following:
python main.py --base configs/stable-diffusion/v1-finetune_everydream.yaml -t --actual_resume last.ckpt -n MyProjectName --gpus 0, --data_root training_samples\MyProject
Again, good idea to think about adjusting your repeats before continuing!
### Additional notes
Thanks go to the CompVis team for the original training code, Xavier Xiao for the DreamBooth implementation and tweaking of trainer configs to stuff it into a 24GB card, and Kane Wallmann for code to take image captions from the filenames.
References:
[CompVis Stable Diffusion](https://github.com/CompVis/stable-diffusion)
[Xavier Xiao's DreamBooth implementation](https://github.com/XavierXiao/Dreambooth-Stable-Diffusion)
[Kane Wallmann's captioning capability](https://github.com/kanewallmann/Dreambooth-Stable-Diffusion)


@ -1,5 +1,5 @@
model:
base_learning_rate: 1.0e-06
base_learning_rate: 1.0e-07
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
reg_weight: 1.0
@ -19,16 +19,7 @@ model:
use_ema: False
embedding_reg_weight: 0.0
unfreeze_model: True
model_lr: 5.0e-7
personalization_config:
target: ldm.modules.embedding_manager.EmbeddingManager
params:
placeholder_strings: ["*"]
initializer_words: ["sculpture"]
per_image_tokens: false
num_vectors_per_token: 1
progressive_words: False
model_lr: 1.0e-7
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
@ -84,7 +75,7 @@ data:
params:
size: 512
set: train
repeats: 3
repeats: 5
validation:
target: ldm.data.personalized.PersonalizedBase
params:
@ -106,7 +97,8 @@ lightning:
trainer:
benchmark: True
max_epochs: 3
#precision: 16 # need lightning 1.6+ ??
#num_nodes: 2 # for multigpu
#check_val_every_n_epoch: 2
max_epochs: 1
max_steps: 99000 # better to end on epochs not steps
#check_val_every_n_epoch: 2 # can skip val every epoch if you want
#precision: 16 # need lightning 1.6+ ?? *WIP*
#num_nodes: 2 # for multigpu *WIP*


@ -21,17 +21,20 @@ class EveryDreamBatch(Dataset):
self.data_root = data_root
self.reg = reg
self.image_paths = []
self.image_classes = []
classes = os.listdir(self.data_root)
print(f"**** Loading data set: data_root: {data_root}, as set: {set}, classes: {classes}")
print(f"**** Loading data set: data_root: {data_root}, as set: {set}")
for cl in classes:
class_path = os.path.join(self.data_root, cl)
for file_path in os.listdir(class_path):
image_path = os.path.join(class_path, file_path)
self.image_paths.append(image_path)
self.image_classes.append(cl)
import random
# improve multi-class training by mixing order of training set, avoid training on one class N times in a row
# if trainer crashes between epochs and you resume at least it isn't heavily biasing early files in dir order
# shuffle() works in place and returns None, so do not assign its result back
random.Random(555).shuffle(self.image_paths)
# self._length = len(self.image_paths)
self.num_images = len(self.image_paths)
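A note on the seeded shuffle in this hunk: in the Python standard library, `random.Random.shuffle` reorders the list in place and returns `None`, so its return value should not be assigned to the list being shuffled. The standalone sketch below illustrates that behavior, and shows that a fixed seed gives the same order on every run; the file names are made up for illustration.

```python
import random

# Made-up file names standing in for images from two subfolders.
paths = [f"class_{c}/img_{i}.jpg" for c in ("a", "b") for i in range(3)]

first = list(paths)
result = random.Random(555).shuffle(first)  # shuffles `first` in place

second = list(paths)
random.Random(555).shuffle(second)          # same seed, same resulting order

print(result)           # None: shuffle() has no return value
print(first == second)  # True: the seeded order is reproducible
```

Because the order is reproducible, resuming after a crash sees the same mixed order rather than directory order.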


@ -435,7 +435,7 @@ class LatentDiffusion(DDPM):
def __init__(self,
first_stage_config,
cond_stage_config,
personalization_config,
#personalization_config, TI not used
num_timesteps_cond=None,
cond_stage_key="image",
cond_stage_trainable=False,
@ -1303,7 +1303,7 @@ class LatentDiffusion(DDPM):
return samples, intermediates
@torch.no_grad()
def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=50, ddim_eta=1., return_keys=None,
def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=40, ddim_eta=1., return_keys=None,
quantize_denoised=True, inpaint=False, plot_denoise_rows=False, plot_progressive_rows=False,
plot_diffusion_rows=False, **kwargs):
@ -1317,22 +1317,6 @@ class LatentDiffusion(DDPM):
bs=N)
N = min(x.shape[0], N)
n_row = min(x.shape[0], n_row)
# log["inputs"] = x
# log["reconstruction"] = xrec
# if self.model.conditioning_key is not None:
# if hasattr(self.cond_stage_model, "decode"):
# xc = self.cond_stage_model.decode(c)
# log["conditioning"] = xc
# elif self.cond_stage_key in ["caption"]:
# xc = log_txt_as_img((x.shape[2], x.shape[3]), batch["caption"])
# log["conditioning"] = xc
# elif self.cond_stage_key == 'class_label':
# xc = log_txt_as_img((x.shape[2], x.shape[3]), batch["human_label"])
# log['conditioning'] = xc
# elif isimage(xc):
# log["conditioning"] = xc
# if ismap(xc):
# log["original_conditioning"] = self.to_rgb(xc)
if plot_diffusion_rows:
# get diffusion row
@ -1372,7 +1356,7 @@ class LatentDiffusion(DDPM):
eta=ddim_eta,
unconditional_guidance_scale=5.0,
unconditional_conditioning=uc)
log["samples_subject"] = self.decode_first_stage(sample_scaled)
log["sample_scaled"] = self.decode_first_stage(sample_scaled)
if quantize_denoised and not isinstance(self.first_stage_model, AutoencoderKL) and not isinstance(
self.first_stage_model, IdentityFirstStage):

19
main.py

@ -149,13 +149,6 @@ def get_parser(**parser_kwargs):
default=True,
help="Prepend the final directory in the data_root to the output directory name")
parser.add_argument(
"--max_training_steps",
type=int,
required=False,
default=35000,
help="Number of iterations to run")
parser.add_argument("--actual_resume",
type=str,
required=True,
@ -555,9 +548,6 @@ if __name__ == "__main__":
# merge trainer cli with config
trainer_config = lightning_config.get("trainer", OmegaConf.create())
# Set the steps
trainer_config.max_steps = opt.max_training_steps
for k in nondefault_trainer_args(opt):
trainer_config[k] = getattr(opt, k)
if not "gpus" in trainer_config:
@ -611,7 +601,7 @@ if __name__ == "__main__":
"target": "pytorch_lightning.callbacks.ModelCheckpoint",
"params": {
"dirpath": ckptdir,
"filename": "{epoch:03}",
"filename": "{epoch:03}-{global_step:05}",
"verbose": True,
"save_last": True,
}
@ -679,7 +669,6 @@ if __name__ == "__main__":
del callbacks_cfg['ignore_keys_callback']
trainer_kwargs["callbacks"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
trainer_kwargs["max_steps"] = trainer_opt.max_steps
trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
trainer.logdir = logdir ###
@ -725,7 +714,7 @@ if __name__ == "__main__":
def melk(*args, **kwargs):
# run all checkpoint hooks
if trainer.global_rank == 0:
print("Here comes the checkpoint...")
print("Summoning checkpoint.")
ckpt_path = os.path.join(ckptdir, "last.ckpt")
trainer.save_checkpoint(ckpt_path)
@ -770,5 +759,5 @@ if __name__ == "__main__":
os.makedirs(os.path.split(dst)[0], exist_ok=True)
os.rename(logdir, dst)
if trainer.global_rank == 0:
print("Training complete. max_training_steps reached or we blew up.")
# print(trainer.profiler.summary())
print("Training complete. max_steps or max_epochs, reached or we blew up.")
print(trainer.profiler.summary())