diffusers/examples/dreambooth/README.md

# DreamBooth training example

[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject.
The `train_dreambooth.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.


## Running locally 
### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install "diffusers[training]" accelerate transformers
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

### Dog toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. 

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate with your token:

```bash
huggingface-cli login
```

If you have already cloned the model repo, you won't need to go through these steps. You can simply remove the `--use_auth_token` argument from the following command.

<br>

Now let's get our dataset. Download images from [here](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ) and save them in a directory. This will be our training data.

Then launch the training with:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```
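Before launching, it can help to confirm that the instance directory really contains the handful of images you expect. The snippet below is a minimal sketch for that check. `list_instance_images` and `IMAGE_EXTENSIONS` are hypothetical names introduced here for illustration; they are not part of `train_dreambooth.py`, and the extension list is an assumption you may need to adjust.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_instance_images(instance_dir):
    """Return the sorted image files found directly inside instance_dir."""
    root = Path(instance_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Instance directory not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Demo on a throwaway directory; point this at your INSTANCE_DIR instead.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("dog1.jpg", "dog2.png", "notes.txt"):
            (Path(tmp) / name).touch()
        images = list_instance_images(tmp)
        print(f"Found {len(images)} instance images")
```

If the count is far from the 3-5 images DreamBooth is designed for, double-check the download before spending GPU time.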

### Training with prior-preservation loss

Prior preservation is used to avoid overfitting and language drift; refer to the paper to learn more about it. For prior preservation, we first generate images using the model with a class prompt, and then use those images during training alongside our own data.
According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior preservation. 200-300 images work well for most cases.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
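To make the role of `--prior_loss_weight` concrete, here is a plain-Python sketch of how the two loss terms could be combined: a reconstruction loss on the instance images plus a weighted loss on the generated class images. This is an illustration under simplifying assumptions, not the script's implementation. The real training loop works on tensors of predicted noise, and the function names `mse` and `dreambooth_loss` are made up for this example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    """Instance loss plus the weighted prior-preservation loss."""
    # Term 1: how well the model fits the subject (instance) images.
    instance_loss = mse(instance_pred, instance_target)
    # Term 2: how well the model still fits its own class images,
    # which discourages it from forgetting what a generic "dog" looks like.
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


if __name__ == "__main__":
    loss = dreambooth_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0],
                           prior_loss_weight=0.5)
    print(loss)
```

With `prior_loss_weight=0`, the second term vanishes and the objective reduces to fitting the instance images only.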

### Training on a 16GB GPU

With the help of gradient checkpointing and the 8-bit optimizer from bitsandbytes, it's possible to train DreamBooth on a 16GB GPU.

Install `bitsandbytes` with `pip install bitsandbytes`.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
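The command above raises `--gradient_accumulation_steps` to 2 while keeping `--train_batch_size=1`. As a rough sketch of what that means for each optimizer update, the helper below multiplies the two flags together with the number of processes. `effective_batch_size` is a hypothetical helper written for this README, the `num_processes` argument is an assumption about your Accelerate setup, and the class images added by prior preservation are not counted here.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps,
                         num_processes=1):
    """Instance examples that contribute to one optimizer update."""
    for name, value in (
        ("train_batch_size", train_batch_size),
        ("gradient_accumulation_steps", gradient_accumulation_steps),
        ("num_processes", num_processes),
    ):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer")
    # Gradients from several small batches are summed before one update,
    # so memory use follows train_batch_size while the update sees more data.
    return train_batch_size * gradient_accumulation_steps * num_processes


if __name__ == "__main__":
    # Values taken from the 16GB command above, on a single process.
    print(effective_batch_size(train_batch_size=1, gradient_accumulation_steps=2))
```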


## Inference

Once you have trained a model using one of the commands above, you can run inference with the `StableDiffusionPipeline`. Make sure to include the identifier (e.g. `sks` in the examples above) in your prompt.

```python

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# Path to the trained model (the directory used as OUTPUT_DIR during training)
model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# The prompt must contain the identifier from the instance prompt ("sks" here)
prompt = "A photo of sks dog in a bucket"

with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")
```
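Since a prompt without the identifier will not refer to your subject, a small guard can catch that mistake before you spend time generating images. The following is a minimal sketch; `contains_identifier` and `make_prompt` are hypothetical helpers for illustration, and the prompt template simply mirrors the `"a photo of sks dog"` pattern used during training.

```python
import re


def contains_identifier(prompt, identifier="sks"):
    """Return True if identifier appears as a whole word in prompt."""
    pattern = rf"\b{re.escape(identifier)}\b"
    return re.search(pattern, prompt) is not None


def make_prompt(class_noun, context, identifier="sks"):
    """Build a prompt that places the identifier before the class noun."""
    return f"A photo of {identifier} {class_noun} {context}".strip()


if __name__ == "__main__":
    prompt = make_prompt("dog", "in a bucket")
    if not contains_identifier(prompt):
        raise ValueError("Prompt is missing the identifier used in training")
    print(prompt)
```

The whole-word match avoids false positives from words that merely contain the identifier, such as "asks".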

Source commit: "Add training example for DreamBooth." (#554), 2022-09-27 07:01:18 -06:00. Co-authored-by: Suraj Patil <surajp815@gmail.com>, Patrick von Platen <patrick.v.platen@gmail.com>.
