diffusers

Textual Inversion fine-tuning example

Textual inversion is a method to personalize text-to-image models such as Stable Diffusion on your own images using just 3-5 examples. The textual_inversion.py script shows how to implement the training procedure and adapt it for Stable Diffusion.
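To give a feel for the idea, here is a toy sketch of the core mechanism: the model stays frozen and only the embedding of a newly added placeholder token is optimized. Everything in it (the two-dimensional embeddings, the `FROZEN_WEIGHTS` matrix, the `train_placeholder` helper) is a hypothetical stand-in for illustration and is not taken from textual_inversion.py.

```python
# Toy illustration only: a tiny frozen linear "model" stands in for the
# diffusion model, and a 3-entry table stands in for the text embeddings.
vocab = {"toy": 0, "cat": 1, "<cat-toy>": 2}
embeddings = [
    [1.0, 0.0],  # "toy"
    [0.0, 1.0],  # "cat"
    [1.0, 0.0],  # "<cat-toy>", initialized from the "toy" embedding
]

FROZEN_WEIGHTS = [[2.0, 0.0], [0.0, 3.0]]  # never updated during training


def frozen_model(vec):
    """Apply the frozen linear map to an embedding vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in FROZEN_WEIGHTS]


def train_placeholder(target, steps=200, lr=0.05):
    """Gradient descent on the placeholder row only; all other rows stay fixed."""
    idx = vocab["<cat-toy>"]
    for _ in range(steps):
        out = frozen_model(embeddings[idx])
        err = [o - t for o, t in zip(out, target)]
        # Gradient of 0.5 * ||W e - target||^2 with respect to e is W^T err.
        grad = [
            sum(FROZEN_WEIGHTS[r][c] * err[r] for r in range(2))
            for c in range(2)
        ]
        embeddings[idx] = [e - lr * g for e, g in zip(embeddings[idx], grad)]
    return embeddings[idx]


learned = train_placeholder(target=[1.0, 1.5])
```

In the real script the loss comes from the diffusion objective on your images rather than a fixed target vector, but the pattern of updating a single new embedding is the same.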

Running on Colab

Colab for training

Colab for inference

Running locally

Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

pip install "diffusers[training]" accelerate "transformers>=4.21.0"

And initialize an 🤗Accelerate environment with:

accelerate config
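If you want to confirm the installation before training, a small check along these lines may help. It is a sketch using only the Python standard library. The `check_training_dependencies` helper is hypothetical, and the version parsing is deliberately simple: it looks only at the leading numeric parts, so pre-release suffixes are ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '4.21.0' or '4.21.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_training_dependencies():
    """Report the installed version of each package, or what is wrong with it."""
    required = (("diffusers", None), ("accelerate", None), ("transformers", "4.21.0"))
    report = {}
    for name, minimum in required:
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is None or meets_minimum(installed, minimum):
            report[name] = installed
        else:
            report[name] = f"{installed} (need >= {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_training_dependencies().items():
        print(f"{package}: {status}")
```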

Cat toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version v1-5, so you'll need to visit its card, read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to this section of the documentation.

Run the following command to authenticate your token:

huggingface-cli login

If you have already cloned the repo, then you won't need to go through these steps.

Now let's get our dataset. Download 3-4 images from here and save them in a directory. This will be our training data.
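Before launching, it can be worth confirming that the directory really contains the handful of images you expect. The following is a minimal sketch. The `list_training_images` and `check_dataset` helpers and the list of file suffixes are assumptions made for illustration, not part of the training script.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def list_training_images(data_dir):
    """Return the image files found directly inside data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_dataset(data_dir, minimum=3, maximum=5):
    """Return (image count, whether the count is within the 3-5 range)."""
    count = len(list_training_images(data_dir))
    return count, minimum <= count <= maximum
```

For example, `check_dataset("path-to-dir-containing-images")` returns the number of images found and whether that number falls in the 3-5 range mentioned above.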

Then launch the training with:

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="path-to-dir-containing-images"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="textual_inversion_cat"

A full training run takes ~1 hour on one V100 GPU.
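The interaction between `--learning_rate`, `--scale_lr`, `--gradient_accumulation_steps` and `--train_batch_size` can be worked out by hand. The sketch below assumes that `--scale_lr` multiplies the base learning rate by the number of accumulation steps, the per-device batch size and the number of processes. Check the script's source if you depend on the exact value. The helper names are hypothetical.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of images contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def scaled_learning_rate(base_lr, train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Learning rate after scaling, under the assumption stated above."""
    return base_lr * effective_batch_size(
        train_batch_size, gradient_accumulation_steps, num_processes
    )


# Values from the command above, on a single GPU.
batch = effective_batch_size(train_batch_size=1, gradient_accumulation_steps=4)
lr = scaled_learning_rate(5.0e-04, train_batch_size=1, gradient_accumulation_steps=4)
print(f"effective batch size: {batch}, scaled learning rate: {lr:.1e}")
```

Under that assumption, the command above trains with an effective batch size of 4 and a learning rate of 2e-3 on one GPU.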

Inference

Once you have trained a model using the above command, you can run inference with the StableDiffusionPipeline. Make sure to include the placeholder_token in your prompt.

import torch
from diffusers import StableDiffusionPipeline

# Path to the trained model, e.g. the --output_dir used for training
model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# The prompt must contain the placeholder token used during training
prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("cat-backpack.png")
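Because a prompt without the placeholder token will not use the learned concept, a small guard before calling the pipeline can catch that mistake early. This is a sketch, and `ensure_placeholder` is a hypothetical helper, not part of diffusers.

```python
def ensure_placeholder(prompt, placeholder_token="<cat-toy>"):
    """Return the prompt unchanged, or raise if the placeholder token is absent."""
    if placeholder_token not in prompt:
        raise ValueError(
            f"Prompt {prompt!r} does not contain the placeholder token "
            f"{placeholder_token!r}."
        )
    return prompt


prompt = ensure_placeholder("A <cat-toy> backpack")
```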