
# Dreambooth on Stable Diffusion

This is an implementation of Google's [Dreambooth](https://arxiv.org/abs/2208.12242) with [Stable Diffusion](https://github.com/CompVis/stable-diffusion). The original Dreambooth is based on the [Imagen](https://imagen.research.google/) text-to-image model. However, neither the Imagen model nor its pre-trained weights are available. To let people fine-tune a text-to-image model with a few examples, I implemented the idea of Dreambooth on Stable Diffusion.

This code repository is based on that of [Textual Inversion](https://github.com/rinongal/textual_inversion). Note that Textual Inversion only optimizes a word embedding, while Dreambooth fine-tunes the whole diffusion model.

The implementation makes minimal changes to the official codebase of Textual Inversion. In fact, due to laziness, some components of Textual Inversion, such as the embedding manager, have not been deleted, although they are never used here.

## Usage

### Preparation
To fine-tune a Stable Diffusion model, you need to obtain the pre-trained Stable Diffusion models by following their [instructions](https://github.com/CompVis/stable-diffusion#stable-diffusion-v1). Weights can be downloaded from [HuggingFace](https://huggingface.co/CompVis). You can decide which version of the checkpoint to use, but I use ```sd-v1-4-full-ema.ckpt```.

We also need to create a set of images for regularization, as the fine-tuning algorithm of Dreambooth requires it. Details of the algorithm can be found in the paper. The text prompt can be ```a photo of a <xxx>```, where ```<xxx>``` is a word that describes the class of your object, such as ```dog```. The command is:

```
python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 8 --n_iter 1 --scale 10.0 --ddim_steps 50  --ckpt /path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt --prompt "a photo of a <xxx>" 
```

I generate 8 images for regularization. After that, save the generated images (separately, one image per ```.png``` file) in ```/root/to/regularization/images```.
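Collecting the generated samples into the regularization folder can be scripted. The following is a minimal sketch, not part of this repository: the helper name `collect_png_images` and the source directory in the usage example are assumptions, so point it at wherever `stable_txt2img.py` actually wrote its individual sample images.

```python
import shutil
from pathlib import Path


def collect_png_images(src_dir, dst_dir):
    """Copy every .png file found directly in src_dir into dst_dir.

    Hypothetical helper. Files are renamed to a zero-padded index
    (00000.png, 00001.png, ...) so that each regularization image
    ends up in its own file. Returns the list of destination paths.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    # Sort for a stable order; non-PNG files and subfolders are ignored.
    for index, image_path in enumerate(sorted(src.glob("*.png"))):
        target = dst / f"{index:05d}.png"
        shutil.copyfile(image_path, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # The source path is an assumption; the destination is the
    # placeholder path used in this README.
    images = collect_png_images("outputs/samples", "/root/to/regularization/images")
    print(f"Copied {len(images)} regularization images")
```

Only files directly inside the source folder are copied, which keeps any image stored elsewhere (for example a combined grid, if the script writes one) out of the regularization set.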

### Training
Training can be done by running the following command:

```
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml \
                -t \
                --actual_resume /path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt \
                -n <job name> \
                --gpus 0, \
                --data_root /root/to/training/images \
                --reg_data_root /root/to/regularization/images \
                --class_word <xxx>
```
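If you launch training from a script, it can help to assemble the arguments as a list, so that paths or job names containing spaces are passed through intact. The sketch below is illustrative only: `build_train_command` is a hypothetical helper, and it uses nothing beyond the flags shown in the command above.

```python
import shlex


def build_train_command(ckpt_path, job_name, data_root, reg_data_root,
                        class_word, gpus="0,"):
    """Return the training command from this README as an argument list.

    Hypothetical helper. The default gpus value keeps the trailing
    comma used in the README command ("0,").
    """
    return [
        "python", "main.py",
        "--base", "configs/stable-diffusion/v1-finetune_unfrozen.yaml",
        "-t",
        "--actual_resume", str(ckpt_path),
        "-n", job_name,
        "--gpus", gpus,
        "--data_root", str(data_root),
        "--reg_data_root", str(reg_data_root),
        "--class_word", class_word,
    ]


if __name__ == "__main__":
    command = build_train_command(
        ckpt_path="/path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt",
        job_name="my_dog_job",  # example job name, pick your own
        data_root="/root/to/training/images",
        reg_data_root="/root/to/regularization/images",
        class_word="dog",
    )
    # Print a shell-quoted version for inspection. To actually start
    # training, pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```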

### Generation