
## Textual Inversion fine-tuning example

[Textual inversion](https://arxiv.org/abs/2208.01618) is a method for personalizing text-to-image models like Stable Diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
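
As a rough illustration of the idea (not the code in `textual_inversion.py`), the toy sketch below adds a placeholder token to a tiny vocabulary, copies the initializer token's embedding into the new row, and applies an update to that row only while every other embedding stays frozen. The three-word vocabulary, the 2-dimensional embeddings, the helper names and the gradients are all made up for this example.

```python
# Toy illustration only: a 3-token "vocabulary" with 2-dimensional embeddings.
vocab = {"a": 0, "toy": 1, "backpack": 2}
embeddings = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.9]]


def add_placeholder_token(vocab, embeddings, placeholder, initializer):
    """Append a new token whose embedding starts as a copy of the initializer's."""
    if placeholder in vocab:
        raise ValueError(f"{placeholder!r} is already in the vocabulary")
    vocab[placeholder] = len(embeddings)
    embeddings.append(list(embeddings[vocab[initializer]]))
    return vocab[placeholder]


def sgd_step(embeddings, grads, trainable_id, lr):
    """Update only the placeholder row; every other row is left untouched."""
    row = embeddings[trainable_id]
    for j, grad in enumerate(grads[trainable_id]):
        row[j] -= lr * grad


new_id = add_placeholder_token(vocab, embeddings, "<cat-toy>", "toy")
frozen_before = [list(row) for row in embeddings[:new_id]]

# Made-up gradients for every row; only the placeholder row should move.
fake_grads = [[1.0, 1.0] for _ in embeddings]
sgd_step(embeddings, fake_grads, new_id, lr=0.1)
```

In the real setting the embedding table belongs to the text encoder and the gradients come from the diffusion loss, but the division of labor is the same: one new row is learned and the rest of the model is left as it was.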

## Running on Colab 

Colab for training 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

Colab for inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)

## Running locally 
### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install "diffusers[training]" accelerate "transformers>=4.21.0"
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```


### Cat toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. 

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate with your token:

```bash
huggingface-cli login
```

If you have already cloned the repo, then you won't need to go through these steps. 

<br>

Now let's get our dataset. Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data.
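
Before training, it can help to confirm that the directory really contains the images you expect. The helper below is a small sketch for that purpose and is not part of the example scripts; the function name `list_training_images` and the set of accepted file extensions are assumptions made for this illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_training_images(data_dir):
    """Return the image files found directly inside data_dir, sorted by path."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Example (replace the path with your own directory):
# images = list_training_images("path-to-dir-containing-images")
# print(f"Found {len(images)} training images")
```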

Then launch the training with:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-images"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="textual_inversion_cat"
```

A full training run takes ~1 hour on one V100 GPU.
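
The command above combines `--train_batch_size=1` with `--gradient_accumulation_steps=4` and passes `--scale_lr`. The sketch below works out the resulting effective batch size and learning rate, assuming that `--scale_lr` multiplies the base learning rate by the gradient accumulation steps, the per-device batch size and the number of processes. That rule is an assumption about the script, and `scaled_learning_rate` is a hypothetical helper, so check `textual_inversion.py` in your version before relying on the numbers.

```python
def scaled_learning_rate(base_lr, gradient_accumulation_steps, train_batch_size, num_processes=1):
    """Learning rate after scaling, under the assumed --scale_lr rule."""
    return base_lr * gradient_accumulation_steps * train_batch_size * num_processes


def effective_batch_size(gradient_accumulation_steps, train_batch_size, num_processes=1):
    """Number of samples that contribute to each optimizer step."""
    return gradient_accumulation_steps * train_batch_size * num_processes


# Values from the command above, run on a single GPU (one process).
batch = effective_batch_size(gradient_accumulation_steps=4, train_batch_size=1)
lr = scaled_learning_rate(5.0e-4, gradient_accumulation_steps=4, train_batch_size=1)
print(f"effective batch size: {batch}, scaled learning rate: {lr:.1e}")
```

Under that assumption, launching the same command on more GPUs would raise both numbers proportionally.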


### Inference

Once you have trained a model using the command above, you can run inference with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python
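import torch
from diffusers import StableDiffusionPipeline

# Path to the trained model, e.g. the --output_dir used for training
model_id = "path-to-your-trained-model"
# Load the weights in half precision and move the pipeline to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")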

prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("cat-backpack.png")
```
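
Since a prompt without the placeholder token will not use the learned concept, a small check before generating a batch of images can save time. This is a minimal sketch; `check_prompts` is a hypothetical helper and not part of `diffusers`.

```python
PLACEHOLDER_TOKEN = "<cat-toy>"  # must match --placeholder_token used for training


def check_prompts(prompts, placeholder=PLACEHOLDER_TOKEN):
    """Return the prompts as a list, or raise if any of them lacks the placeholder."""
    missing = [prompt for prompt in prompts if placeholder not in prompt]
    if missing:
        raise ValueError(f"Prompts without {placeholder!r}: {missing}")
    return list(prompts)


prompts = check_prompts(["A <cat-toy> backpack", "A <cat-toy> on the beach"])
```

The validated prompts can then be passed to the pipeline one at a time, as in the example above.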