diff --git a/docs/source/en/training/text2image.mdx b/docs/source/en/training/text2image.mdx
index 81dbfba9..851be61b 100644
--- a/docs/source/en/training/text2image.mdx
+++ b/docs/source/en/training/text2image.mdx
@@ -74,25 +74,13 @@ To load a checkpoint to resume training, pass the argument `--resume_from_checkp
 
 Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this:
 
-```bash
-export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export dataset_name="lambdalabs/pokemon-blip-captions"
-
-accelerate launch train_text_to_image.py \
-  --pretrained_model_name_or_path=$MODEL_NAME \
-  --dataset_name=$dataset_name \
-  --use_ema \
-  --resolution=512 --center_crop --random_flip \
-  --train_batch_size=1 \
-  --gradient_accumulation_steps=4 \
-  --gradient_checkpointing \
-  --mixed_precision="fp16" \
-  --max_train_steps=15000 \
-  --learning_rate=1e-05 \
-  --max_grad_norm=1 \
-  --lr_scheduler="constant" --lr_warmup_steps=0 \
-  --output_dir="sd-pokemon-model"
-```
+<literalinclude>
+{"path": "../../../../examples/text_to_image/README.md",
+"language": "bash",
+"start-after": "accelerate_snippet_start",
+"end-before": "accelerate_snippet_end",
+"dedent": 0}
+</literalinclude>
 
 To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
diff --git a/examples/text_to_image/README.md b/examples/text_to_image/README.md
index 312ebdac..0c378ffd 100644
--- a/examples/text_to_image/README.md
+++ b/examples/text_to_image/README.md
@@ -52,7 +52,7 @@ If you have already cloned the repo, then you won't need to go through these ste
 With `gradient_checkpointing` and `mixed_precision` it should be possible to fine tune the model on a single 24GB GPU. For higher `batch_size` and faster training it's better to use GPUs with >30GB memory.
 
 **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
-
+<!-- accelerate_snippet_start -->
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
 export dataset_name="lambdalabs/pokemon-blip-captions"
@@ -71,6 +71,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
   --lr_scheduler="constant" --lr_warmup_steps=0 \
   --output_dir="sd-pokemon-model"
 ```
+<!-- accelerate_snippet_end -->
 
 To run on your own training files prepare the dataset according to the format required by `datasets`, you can find the instructions for how to do that in this [document](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata).
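The `start-after`/`end-before` values in the directive above refer to the HTML comment markers added to the README, so the docs page can pull the `accelerate launch` snippet straight from the example's README instead of duplicating it. As a rough sanity check of which lines the markers enclose, something like the following works from the repository root (a quick shell sketch for inspection only, not how the doc build actually extracts the snippet):

```bash
# Print the README lines bracketed by the snippet markers (marker lines included).
# Assumes it is run from the root of the diffusers repository.
sed -n '/accelerate_snippet_start/,/accelerate_snippet_end/p' examples/text_to_image/README.md
```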