From 0e2fd255ba8edf37a9b6bee31c1de291bd5e993d Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 18:39:05 +0200
Subject: [PATCH 1/8] wrong arg

---
 docs/en/training/dataset.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index 4544f35..6d1316a 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -82,11 +82,11 @@ We are also going to download the only the first JSON batch. If you want to trai
 Download the 512px folders from 0000 to 0009 (3.86GB):
 ```bash
-rsync rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
+rsync -r rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
 ```
 
 Download the first batch of metadata, posts000000000000.json (800MB):
 ``` shell
-rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
+rsync rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
 ```
 
 You should now have two folders named: 512px and metadata.

From 90a580f68432336c98a20f2b820dfd4b12ae31f5 Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 18:40:47 +0200
Subject: [PATCH 2/8] Update README.md

---
 docs/en/training/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/training/README.md b/docs/en/training/README.md
index 905f193..4a440ed 100644
--- a/docs/en/training/README.md
+++ b/docs/en/training/README.md
@@ -3,6 +3,6 @@ Training is available with waifu-diffusion. Before starting, we remind you that,
 ## Contents
 1. [Dataset](./dataset.md)
 2. [Configuration](./configuration.md)
-3. Executing
+3. [Executing](./executing.md)
 4. Recommendations
-5. FAQ
\ No newline at end of file
+5. FAQ

From 98c6e0c38b14cc0b34741a0738b7d5c032b14f75 Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 18:42:28 +0200
Subject: [PATCH 3/8] Create executing.md

---
 docs/en/training/executing.md | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 docs/en/training/executing.md

diff --git a/docs/en/training/executing.md b/docs/en/training/executing.md
new file mode 100644
index 0000000..9916ad5
--- /dev/null
+++ b/docs/en/training/executing.md
@@ -0,0 +1,8 @@
+# 1. Executing
+
+## Installation
+
+First install the dependencies:
+```bash
+pip install -r requirements.txt
+```

From af00e49aa804658b24e5eddce8b98a037fba54c3 Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 19:43:15 +0200
Subject: [PATCH 4/8] Update dataset.md

---
 docs/en/training/dataset.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index 6d1316a..2067240 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -106,8 +106,8 @@ Once the script has finished, you should have a "danbooru-aesthetic" folder, who
 Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
 ``` shell
 mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
-mv danbooru-aesthetic/*.jpg labeled_data/img
-mv danbooru-aesthetic/*.txt labeled_data/txt
+mv danbooru-aesthetic/*.jpg danbooru-aesthetic/img
+mv danbooru-aesthetic/*.txt danbooru-aesthetic/txt
 ```
 
 In order to reduce size, zip the contents of labeled_data:

From ab67872afa41f432a333a5843054c5014fd39b4f Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 19:44:08 +0200
Subject: [PATCH 5/8] Update requirements.txt

---
 requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 9637d43..090b52d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,7 +4,7 @@ opencv-python==4.1.2.30
 pudb==2019.2
 imageio==2.9.0
 imageio-ffmpeg==0.4.2
-pytorch-lightning==1.4.2
+pytorch-lightning==1.6.5
 omegaconf==2.1.1
 test-tube>=0.7.5
 streamlit>=0.73.1
@@ -14,6 +14,6 @@ transformers==4.19.2
 torchmetrics==0.6.0
 kornia==0.6
 gradio
-git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
+git+https://github.com/illeatmyhat/taming-transformers.git@master#egg=taming-transformers
 git+https://github.com/openai/CLIP.git@main#egg=clip
 git+https://github.com/hlky/k-diffusion-sd#egg=k_diffusion

From 0cfa39a37359e500cbaa699ec15ebfe8f411238f Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 19:46:52 +0200
Subject: [PATCH 6/8] Update train.sh

---
 train.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/train.sh b/train.sh
index b35545b..13144e9 100644
--- a/train.sh
+++ b/train.sh
@@ -1 +1 @@
-python3 main.py --resume model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --gpus 0,1,2,3
+python3 main.py --train --resume model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --gpus 0,1,2,3

From b664c86e899399355ac32c3dc832aa771186544c Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Thu, 15 Sep 2022 19:51:56 +0200
Subject: [PATCH 7/8] Update executing.md

---
 docs/en/training/executing.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/en/training/executing.md b/docs/en/training/executing.md
index 9916ad5..7ccb57f 100644
--- a/docs/en/training/executing.md
+++ b/docs/en/training/executing.md
@@ -6,3 +6,8 @@ First install the dependencies:
 ```bash
 pip install -r requirements.txt
 ```
+
+## Executing
+```bash
+sh train.sh
+```

From 7cf41306680514dc68ffed21374b65712360a42e Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Sat, 17 Sep 2022 08:55:56 +0200
Subject: [PATCH 8/8] make the execution work

---
 .dockerignore                 |  4 +++
 .gitignore                    |  3 ++-
 Dockerfile                    | 10 ++++++++
 docs/en/training/executing.md | 48 +++++++++++++++++++++++++++++++----
 requirements.txt              |  6 ++---
 train.sh                      | 16 +++++++++++-
 6 files changed, 77 insertions(+), 10 deletions(-)
 create mode 100644 .dockerignore
 create mode 100644 Dockerfile

diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..c60f026
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,4 @@
+./venv
+./danbooru-aesthetic
+./logs
+*.ckpt
diff --git a/.gitignore b/.gitignore
index 89d4032..680b96c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -40,6 +40,7 @@ lib64/
 parts/
 sdist/
 var/
+venv/
 wheels/
 share/python-wheels/
 *.egg-info/
@@ -54,4 +55,4 @@ MANIFEST
 /src/
 
 #Obsidian
-.obsidian/
\ No newline at end of file
+.obsidian/
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..8ad4af7
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,10 @@
+FROM pytorch/pytorch:latest
+
+RUN apt update && \
+    apt install -y git curl unzip vim && \
+    pip install git+https://github.com/derfred/lightning.git@waifu-1.6.0#egg=pytorch-lightning
+RUN mkdir /waifu
+COPY . /waifu/
+WORKDIR /waifu
+RUN grep -v pytorch-lightning requirements.txt > requirements-waifu.txt && \
+    pip install -r requirements-waifu.txt
diff --git a/docs/en/training/executing.md b/docs/en/training/executing.md
index 7ccb57f..4bfafac 100644
--- a/docs/en/training/executing.md
+++ b/docs/en/training/executing.md
@@ -1,13 +1,51 @@
-# 1. Executing
+# 3. Executing
 
-## Installation
+There are two modes of executing the training:
+1. Using the Docker image. This is the fastest way to get started.
+2. Using a system Python install. This allows more customization.
+
+Note: You will need to provide the initial checkpoint for resuming the training. This must be a checkpoint with the full EMA weights. Otherwise you will get this error:
+```
+RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
+	Missing key(s) in state_dict: "model_ema.diffusion_modeltime_embed0weight", "model_ema.diffusion_modeltime_embed0bias".... (Many lines of similar outputs)
+```
+
+## 1. Using the Docker image
+
+An image is provided at `ghcr.io/derfred/waifu-diffusion`. Execute it, adjusting the NUM_GPU variable:
+```
+docker run -it -e NUM_GPU=x ghcr.io/derfred/waifu-diffusion
+```
+
+Next you will want to download the starting checkpoint into the file `model.ckpt` and copy the training data into the directory `/waifu/danbooru-aesthetic`.
+
+Finally execute the training using:
+```
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic"
+```
+
+## 2. Using a system Python install
 
 First install the dependencies:
 ```bash
 pip install -r requirements.txt
 ```
 
-## Executing
-```bash
-sh train.sh
+Next you will want to download the starting checkpoint into the file `model.ckpt` and copy the training data into the directory `./danbooru-aesthetic`.
+
+You will also need to edit the configuration in `./configs/stable-diffusion/v1-finetune-4gpu.yaml`. In the `data` section (around line 70) change the `batch_size` and `num_workers` to the number of GPUs you are using:
 ```
+data:
+  target: main.DataModuleFromConfig
+  params:
+    batch_size: 4
+    num_workers: 4
+    wrap: false
+```
+
+Finally execute the training using the following command. You need to adjust the `--gpu` parameter according to your GPU settings.
+```bash
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic" --gpu=0,1,2,3,
+```
+
+In case you get an error stating `KeyError: 'Trying to restore optimizer state but checkpoint contains only the model. This is probably due to ModelCheckpoint.save_weights_only being set to True.'` follow these instructions: https://discord.com/channels/930499730843250783/953132470528798811/1018668937052962908
diff --git a/requirements.txt b/requirements.txt
index 090b52d..5e1c8ab 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,10 +1,10 @@
-numpy==1.19.2
+numpy==1.21.6
 albumentations==0.4.3
-opencv-python==4.1.2.30
+opencv-python
 pudb==2019.2
 imageio==2.9.0
 imageio-ffmpeg==0.4.2
-pytorch-lightning==1.6.5
+pytorch-lightning==1.6.0
 omegaconf==2.1.1
 test-tube>=0.7.5
 streamlit>=0.73.1
diff --git a/train.sh b/train.sh
index 13144e9..32314a4 100644
--- a/train.sh
+++ b/train.sh
@@ -1 +1,15 @@
-python3 main.py --train --resume model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --gpus 0,1,2,3
+#!/bin/bash
+
+ARGS=""
+if [ ! -z "$NUM_GPU" ]; then
+  ARGS="--gpu="
+  for i in $(seq 0 $((NUM_GPU-1)))
+  do
+    ARGS="$ARGS$i,"
+  done
+
+  sed -i "s/batch_size: 4/batch_size: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+  sed -i "s/num_workers: 4/num_workers: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+fi
+
+python3 main.py $ARGS "$@"
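The dataset reorganization commands fixed in PATCH 4/8 can be sanity-checked in isolation. This is a minimal sketch in a temp directory; the file names (`1.jpg`, `1.txt`, …) are hypothetical stand-ins for real Danbooru files:

```shell
cd "$(mktemp -d)"

# Recreate the layout dataset.md describes: images and caption files side by side
mkdir -p danbooru-aesthetic
touch danbooru-aesthetic/1.jpg danbooru-aesthetic/1.txt \
      danbooru-aesthetic/2.jpg danbooru-aesthetic/2.txt

# The corrected commands from the patch: split into img/ and txt/ subfolders
mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
mv danbooru-aesthetic/*.jpg danbooru-aesthetic/img
mv danbooru-aesthetic/*.txt danbooru-aesthetic/txt

# img/ now holds the images, txt/ the captions, and the top level only the two folders
```

Before the fix, the `mv` targets pointed at a non-existent `labeled_data/` directory, so the commands failed on a fresh checkout.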
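The Dockerfile in PATCH 8/8 installs a patched pytorch-lightning fork first, then strips the pinned `pytorch-lightning` line from `requirements.txt` so `pip` does not overwrite it. A sketch of that filtering step, using a made-up three-line requirements file:

```shell
cd "$(mktemp -d)"

# Hypothetical miniature requirements.txt containing the pin we want to drop
printf 'numpy==1.21.6\npytorch-lightning==1.6.0\nomegaconf==2.1.1\n' > requirements.txt

# Same trick as the Dockerfile: remove any line mentioning pytorch-lightning
grep -v pytorch-lightning requirements.txt > requirements-waifu.txt

cat requirements-waifu.txt
# numpy==1.21.6
# omegaconf==2.1.1
```

Note that `grep -v` matches substrings, so this would also drop a hypothetical `pytorch-lightning-bolts` line; for this repo's requirements file that is not an issue.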
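The `NUM_GPU` handling added to `train.sh` in PATCH 8/8 builds the `--gpu=` device list with a `seq` loop. Run in isolation, the loop behaves like this (the trailing comma is intentional and matches the `--gpu=0,1,2,3,` form used in executing.md):

```shell
# Same loop as train.sh: turn NUM_GPU=4 into --gpu=0,1,2,3,
NUM_GPU=4
ARGS="--gpu="
for i in $(seq 0 $((NUM_GPU-1)))
do
  ARGS="$ARGS$i,"
done

echo "$ARGS"
# --gpu=0,1,2,3,
```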
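The same patch also rewrites `batch_size` and `num_workers` in the config with `sed -i` (GNU sed assumed). A sketch against a cut-down stand-in for the `data:` section of `v1-finetune-4gpu.yaml`:

```shell
cd "$(mktemp -d)"
mkdir -p configs/stable-diffusion

# Minimal stand-in for the data section of v1-finetune-4gpu.yaml
cat > configs/stable-diffusion/v1-finetune-4gpu.yaml <<'EOF'
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 4
    wrap: false
EOF

# train.sh rewrites both values in place to match the GPU count
NUM_GPU=2
sed -i "s/batch_size: 4/batch_size: $NUM_GPU/g" configs/stable-diffusion/v1-finetune-4gpu.yaml
sed -i "s/num_workers: 4/num_workers: $NUM_GPU/g" configs/stable-diffusion/v1-finetune-4gpu.yaml
```

Because the substitution matches the literal text `batch_size: 4`, it only works once against the stock config; rerunning with a different `NUM_GPU` leaves the file unchanged.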