Created EveryDream training (markdown)

Victor Hall 2022-10-19 11:34:22 -04:00
parent 51fae0ae73
commit b78b567e9a
1 changed files with 17 additions and 0 deletions

EveryDream-training.md Normal file

@ -0,0 +1,17 @@
## Onward from DreamBooth
While DreamBooth for Stable Diffusion has led to a lot of success, its limits on training multiple concepts required new techniques. Kane Wallmann created a fork of Xavier Xiao's SD-Dreambooth code that enabled applying individual captions to training and regularization images, and from there much more power has been unlocked.
Upon its release I immediately trained several characters simultaneously from the recent video game Final Fantasy 7 Remake, with each of the four main characters captioned simply with their full name. From the first attempt it seemed to work just as well as training a single subject, but distortion in the original model was evident. These early attempts still used regularization images, paired with the concepts of "man" and "woman" for male and female characters respectively.
## Onward to fully captioning images
The next logical step was to add an individual caption to every training image to fully describe the scene. CLIP offers img2txt, though it does not understand new concepts such as the characters, and it often makes mistakes. Nevertheless, a combination of CLIP interrogation, scripted replacement of "a man" or "a woman" in the captions, and a bit of manual labor fixing duplications and errors can clean these up. Thus was born the FF7R V3 model.
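The caption cleanup described above can be sketched roughly as follows. This is only an illustration, not the script actually used for FF7R V3: the function names `personalize_caption` and `dedupe_phrases` are hypothetical, and the sketch assumes each caption is a single string whose phrases are separated by commas, as interrogator output often is.

```python
import re

# Hypothetical helpers for illustration; not taken from the original tooling.

def personalize_caption(caption: str, name: str) -> str:
    """Replace the first generic "a man" / "a woman" with a character name."""
    # Whole-phrase, case-insensitive match so words like "mankind" are left alone.
    pattern = re.compile(r"\ba (?:man|woman)\b", flags=re.IGNORECASE)
    return pattern.sub(name, caption, count=1)


def dedupe_phrases(caption: str) -> str:
    """Drop repeated comma-separated phrases, keeping the first occurrence."""
    seen = set()
    kept = []
    for phrase in (p.strip() for p in caption.split(",")):
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            kept.append(phrase)
    return ", ".join(kept)


if __name__ == "__main__":
    raw = "a man with spiky blond hair holding a sword, a sword, a sword"
    print(dedupe_phrases(personalize_caption(raw, "Cloud Strife")))
```

Only the first match is replaced, so a second "a man" in the same caption stays generic; captions with several characters would still need the manual pass mentioned above.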
## How far can this go?
Can style also be included along with multiple characters? Well, it turns out yes. Pictures of the world itself can be mixed in as well. Adding different districts, buildings, and landscapes from within a game world enables style transfer in a single training. Because of course it can. Want to draw "Gotham City in the style of Midgar City" (from Final Fantasy)? Just ask for it.
## Forward the ~~Foundation~~ EveryDream
The pending issue is preserving the original capabilities of the model. DreamBooth attempts this via regularization, which means training on a grab-bag of images created by the model itself. While this can work, ground truth images should be better.