update docs

commit a4cdafb63b (parent 5afd75fd98)
Victor Hall, 2023-03-19 00:59:17 -04:00
5 changed files with 12 additions and 6 deletions

BIN .github/discord_sm.png (new file, 5.2 KiB)
BIN .github/kofibutton_sm.png (new file, 2.5 KiB)
BIN .github/patreon-medium-button.png (new file, 5.8 KiB)


@@ -2,9 +2,13 @@
Welcome to v2.0 of EveryDream trainer! Now with more Diffusers, faster, and even more features!
-Please join us on Discord! https://discord.gg/uheqxU6sXN
+For the most up to date news and community discussions, please join us on Discord!
-If you find this tool useful, please consider subscribing to the project on [Patreon](https://www.patreon.com/everydream) or a one-time donation at [Ko-fi](https://ko-fi.com/everydream).
+[![Discord!](.github/discord_sm.png)](https://discord.gg/uheqxU6sXN)
+If you find this tool useful, please consider subscribing to the project on Patreon or making a one-time donation on Ko-fi. Your donations keep this project alive as a free, open-source tool with ongoing enhancements.
+[![Patreon](.github/patreon-medium-button.png)](https://www.patreon.com/everydream) or [![Kofi](.github/kofibutton_sm.png)](https://ko-fi.com/everydream).
If you're coming from Dreambooth, please [read this](doc/NOTDREAMBOOTH.md) for an explanation of why EveryDream is not Dreambooth.
@@ -22,9 +26,9 @@ Single GPU is currently supported
32GB of system RAM recommended for 50k+ training images, but you may get away with 16GB and a sufficiently large swap file
-Ampere or newer 24GB+ (3090/A5000/4090, etc) recommended for 10k+ images unless you want to wait a long time
+Ampere or newer 24GB+ (3090/A5000/4090, etc) recommended for 10k+ images
-...Or use any computer with a web browser and run on Vast/Runpod/Colab. See [Cloud](#cloud) section below.
+...Or use any computer with a web browser and run on Vast/Colab. See [Cloud](#cloud) section below.
## Video tutorials


@@ -4,6 +4,8 @@ EveryDream is a *general case fine tuner*. It does not explicitly implement the
That means there is no "class" or "token" or "regularization images". It simply trains image and caption pairs, much more like the original training of Stable Diffusion, just at a much smaller "at home" scale.
+For the sake of those experienced in machine learning, forgive me for stretching and redefining some terms; this document is voiced for the typical user coming from Dreambooth training, with the vocabulary as commonly used there.
## What is "regularization" and "preservation"?
The Dreambooth technique uses the concept of adding *generated images from the model itself* to try to keep training from "veering too off course" and "damaging" the model while fine tuning a specific subject with just a handful of images. It served the purpose of "preserving" the integrity of the model. Early on in Dreambooth's lifecycle, people would train 5-20 images of their face, and use a few hundred or maybe a thousand "regularization" images along with the 5-20 training images of their new subject. Since then, many people want to scale to larger training, but more on that later...
@@ -18,7 +20,7 @@ I instead propose you replace images generated out of SD itself with original "g
### Enter ground truth
-"Ground truth" for the sake of this document means real images not generated by AI. It's very easy to get publicly available ML data sets to serve this purpose and replace genarated "regularization" images with real photos or otherwise.
+"Ground truth" for the sake of this document means real images not generated by AI. It's very easy to get publicly available ML data sets to serve this purpose and replace generated "regularization" images with real photos and other non-AI images.
Sources include [FFHQ](https://github.com/NVlabs/ffhq-dataset), [Coco](https://cocodataset.org/#home), and [Laion](https://huggingface.co/datasets/laion/laion2B-en-aesthetic/tree/main). There is a simple scraper to search Laion parquet files in the tools repo, and the Laion dataset was used by [Compvis](https://github.com/CompVis/stable-diffusion#weights) and [Stability.ai](https://github.com/Stability-AI/stablediffusion#news) themselves to train the base model.
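
The tools-repo scraper itself is not shown here, but as a rough illustration of what searching a Laion parquet file involves, here is a minimal pandas sketch. The local filename and the `URL`/`TEXT` column names are assumptions based on the laion2B-en-aesthetic release; check `df.columns` against your actual file.

```python
# Minimal sketch (not the tools-repo scraper): filter one Laion parquet file
# for captions matching a search term. Column names URL/TEXT are assumptions
# based on the laion2B-en-aesthetic release.
import pandas as pd

df = pd.read_parquet("part-00000.parquet")  # hypothetical local shard name
hits = df[df["TEXT"].str.contains("person", case=False, na=False)]
print(f"{len(hits)} matching rows")
print(hits[["URL", "TEXT"]].head(10).to_string())
```

From there you would download the matched URLs and caption the files as described below.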
@@ -30,7 +32,7 @@ Using ground truth images for the general purpose of "presevation" will, instead
"Preservation" images and "training" images have no special distinction in EveryDream. All images are treated the same and the trainer does not know the difference. It is all in how you use them.
-Any preservation images still need a caption of some sort. Just "person" may be sufficient, afterall we're just trying to simulate Dreambooth for this example. This can be as easy as selecting all the images, F2 rename, type `person_` (with the underscore) and press enter. Windows will append (x) to every file to make sure the filenames are unique, and EveryDream interprets the underscore as the end of the caption when present in the filename, thus all the images will be read as having a caption of simply `person`, similar to how many people train Dreambooth.
+Any preservation images still need a caption of some sort. Just "person" may be sufficient; for the sake of this particular example we're just trying to *simulate* Dreambooth. This can be as easy as selecting all the images, F2 rename, type `person_` (with the underscore) and press enter. Windows will append (x) to every file to make sure the filenames are unique, and EveryDream interprets the underscore as the end of the caption when present in the filename, thus all the images will be read as having a caption of simply `person`, similar to how many people train Dreambooth.
You could also generate "person" regularization images out of any Stable Diffusion inference application or download one of the premade regularization sets, *but I find this is less than ideal*. For small training, regularization or preservation is simply not needed. For longer term training you're much better off mixing in real "ground truth" images into your data instead of generated data. "Ground truth" meaning images not generated from an AI. Training back on generated data will reinforce the errors in the model, like extra limbs, weird fingers, watermarks, etc. Using real ground truth data can actually help improve the model.
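
As a rough sketch of the filename-to-caption rule described above (not EveryDream's actual implementation), the logic might look like the following in Python; the handling of the Windows "(x)" uniqueness suffix in the no-underscore case is an assumption for illustration.

```python
# Sketch of the caption-from-filename rule: everything before the first
# underscore is the caption; otherwise the whole stem is used, minus any
# "(x)" suffix Windows appends for uniqueness (assumed behavior).
from pathlib import Path
import re

def caption_from_filename(path: str) -> str:
    stem = Path(path).stem                   # "person_ (3)" from "person_ (3).jpg"
    if "_" in stem:
        return stem.split("_", 1)[0]         # underscore ends the caption
    return re.sub(r"\s*\(\d+\)$", "", stem)  # strip a trailing " (3)" if present

print(caption_from_filename("person_ (3).jpg"))          # -> person
print(caption_from_filename("a photo of jane doe.jpg"))  # -> a photo of jane doe
```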