Merge pull request #17 from derfred/installfixes

docs for executing
2022-09-17 08:11:30 -07:00 · 2022-09-17 08:11:30 -07:00 · 3020278dd9
parent bbcd956746 7cf4130668
commit 3020278dd9
8 changed files with 92 additions and 12 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,4 @@
+./venv
+./danbooru-aesthetic
+./logs
+*.ckpt
--- a/.gitignore
+++ b/.gitignore
@@ -40,6 +40,7 @@ lib64/
 parts/
 sdist/
 var/
+venv/
 wheels/
 share/python-wheels/
 *.egg-info/
@@ -54,4 +55,4 @@ MANIFEST
 /src/

 #Obsidian
-.obsidian/
+.obsidian/
--- a/10
+++ b/10
@@ -0,0 +1,10 @@
+FROM pytorch/pytorch:latest
+
+RUN apt update && \
+    apt install -y git curl unzip vim && \
+    pip install git+https://github.com/derfred/lightning.git@waifu-1.6.0#egg=pytorch-lightning
+RUN mkdir /waifu
+COPY . /waifu/
+WORKDIR /waifu
+RUN grep -v pytorch-lightning requirements.txt > requirements-waifu.txt && \
+    pip install -r requirements-waifu.txt
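The image build file above installs the patched Lightning fork (`waifu-1.6.0` branch) first. It then filters the `pytorch-lightning` pin out of `requirements.txt`, so the following `pip install -r` does not replace the fork with the PyPI release. The filtering step can be checked on its own. The sketch below runs the same `grep -v` against a small sample file; the sample contents are illustrative, not the full `requirements.txt`.

```shell
#!/bin/bash
# Sketch: reproduce the image build's requirements filtering on a sample file.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit

# Illustrative subset of requirements.txt (not the real file)
cat > "$workdir/requirements.txt" <<'EOF'
numpy==1.21.6
pytorch-lightning==1.6.0
omegaconf==2.1.1
EOF

# Same command as in the build file: drop every line mentioning pytorch-lightning
grep -v pytorch-lightning "$workdir/requirements.txt" > "$workdir/requirements-waifu.txt"

# prints numpy==1.21.6 and omegaconf==2.1.1
cat "$workdir/requirements-waifu.txt"
```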
--- a/docs/en/training/README.md
+++ b/docs/en/training/README.md
@@ -3,6 +3,6 @@ Training is available with waifu-diffusion. Before starting, we remind you that,
 ## Contents
 1. [Dataset](./dataset.md)
 2. [Configuration](./configuration.md)
-3. Executing
+3. [Executing](./executing.md)
 4. Recommendations
-5. FAQ
+5. FAQ
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -82,11 +82,11 @@ We are also going to download the only the first JSON batch. If you want to trai

 Download the 512px folders from 0000 to 0009 (3.86GB):
 ```bash
-rsync rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
+rsync -r rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
 ```
 Download the first batch of metadata, posts000000000000.json (800MB):
 ``` shell
-rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
+rsync rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
 ```
 You should now have two folders named: 512px and metadata.

@@ -106,8 +106,8 @@ Once the script has finished, you should have a "danbooru-aesthetic" folder, who
 Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
 ``` shell
 mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
-mv danbooru-aesthetic/*.jpg labeled_data/img
-mv danbooru-aesthetic/*.txt labeled_data/txt
+mv danbooru-aesthetic/*.jpg danbooru-aesthetic/img
+mv danbooru-aesthetic/*.txt danbooru-aesthetic/txt
 ```

 In order to reduce size, zip the contents of labeled_data:
--- a/docs/en/training/executing.md
+++ b/docs/en/training/executing.md
@@ -0,0 +1,51 @@
+# 3. Executing
+
+There are two ways to run the training:
+1. Using the Docker image. This is the fastest way to get started.
+2. Using a system Python installation. This allows more customization.
+
+Note: You need to provide the initial checkpoint to resume training from. It must be a version that includes the full EMA weights; otherwise you will get this error:
+```
+RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
+  Missing key(s) in state_dict: "model_ema.diffusion_modeltime_embed0weight", "model_ema.diffusion_modeltime_embed0bias".... (Many lines of similar outputs)
+```
+
+## 1. Using the Docker image
+
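+An image is provided at `ghcr.io/derfred/waifu-diffusion`. Run it with `NUM_GPU` set to the number of GPUs to use (`--gpus all` exposes the host GPUs to the container and requires the NVIDIA Container Toolkit):
+```bash
+docker run -it --gpus all -e NUM_GPU=x ghcr.io/derfred/waifu-diffusion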
+```
+
+Next, inside the container, download the starting checkpoint to the file `model.ckpt` and copy the training data into the directory `/waifu/danbooru-aesthetic`.
+
+Finally, start the training with:
+```bash
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic"
+```
+
+## 2. Using a system Python installation
+
+First install the dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+Next, download the starting checkpoint to the file `model.ckpt` in the repository root and copy the training data into the directory `./danbooru-aesthetic`.
+
+You will also need to edit `./configs/stable-diffusion/v1-finetune-4gpu.yaml`: in the `data` section (around line 70), set `batch_size` and `num_workers` to the number of GPUs you are using:
+```yaml
+data:
+  target: main.DataModuleFromConfig
+  params:
+    batch_size: 4
+    num_workers: 4
+    wrap: false
+```
+
+Finally, start the training with the following command, adjusting `--gpu` to list the indices of the GPUs you want to use. Keep the trailing comma (for example, `--gpu=0,` for a single GPU).
+```bash
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic" --gpu=0,1,2,3,
+```
+
+If you get the error `KeyError: 'Trying to restore optimizer state but checkpoint contains only the model. This is probably due to ModelCheckpoint.save_weights_only being set to True.'`, follow these instructions: https://discord.com/channels/930499730843250783/953132470528798811/1018668937052962908
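Both modes expect `model.ckpt` in the working directory and the dataset under `danbooru-aesthetic` with the `img` and `txt` folders created in the dataset guide. A quick check before launching can save a failed start. The helper below is hypothetical (it is not part of the repository) and assumes each image `foo.jpg` has a caption file `foo.txt` with the same base name. Run it from the repository root, or pass `/waifu` inside the container.

```shell
#!/bin/bash
# Hypothetical pre-flight check; not part of the repository.
# Assumption: captions share the image's base name (foo.jpg <-> foo.txt).
check_inputs() {
  local root=${1:-.} missing=0 img base
  if [ ! -f "$root/model.ckpt" ]; then
    echo "missing $root/model.ckpt" >&2
    missing=1
  fi
  for img in "$root"/danbooru-aesthetic/img/*.jpg; do
    # An unmatched glob stays literal, so this also catches an empty img folder
    [ -e "$img" ] || { echo "no images in $root/danbooru-aesthetic/img" >&2; return 1; }
    base=$(basename "$img" .jpg)
    if [ ! -f "$root/danbooru-aesthetic/txt/$base.txt" ]; then
      echo "no caption for $img" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_inputs "${1:-.}"; then
  echo "inputs look complete"
else
  echo "fix the missing inputs before starting train.sh" >&2
fi
```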
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,10 +1,10 @@
-numpy==1.19.2
+numpy==1.21.6
 albumentations==0.4.3
-opencv-python==4.1.2.30
+opencv-python
 pudb==2019.2
 imageio==2.9.0
 imageio-ffmpeg==0.4.2
-pytorch-lightning==1.4.2
+pytorch-lightning==1.6.0
 omegaconf==2.1.1
 test-tube>=0.7.5
 streamlit>=0.73.1
@@ -14,6 +14,6 @@ transformers==4.19.2
 torchmetrics==0.6.0
 kornia==0.6
 gradio
-git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
+git+https://github.com/illeatmyhat/taming-transformers.git@master#egg=taming-transformers
 git+https://github.com/openai/CLIP.git@main#egg=clip
 git+https://github.com/hlky/k-diffusion-sd#egg=k_diffusion
--- a/train.sh
+++ b/train.sh
@@ -1 +1,15 @@
-python3 main.py --resume model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --gpus 0,1,2,3
+#!/bin/bash
+
+ARGS=""
+if [ ! -z "$NUM_GPU" ]; then
+  ARGS="--gpu="
+  for i in $(seq 0 $((NUM_GPU-1)))
+  do
+    ARGS="$ARGS$i,"
+  done
+
+  sed -i "s/batch_size: 4/batch_size: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+  sed -i "s/num_workers: 4/num_workers: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+fi
+
+python3 main.py $ARGS "$@"
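When `NUM_GPU` is set, `train.sh` above builds a `--gpu=` argument listing device indices `0` to `NUM_GPU-1`, each followed by a comma, and rewrites `batch_size` and `num_workers` in `v1-finetune-4gpu.yaml` with `sed`. The sketch below reproduces both steps against a scratch copy of the config's `data` section, so the effect can be inspected without launching `main.py`. The function names `gpu_arg` and `apply_num_gpu` are hypothetical helpers for illustration. `sed -i` without a suffix assumes GNU sed, as on the Linux-based image.

```shell
#!/bin/bash
# Sketch of what train.sh does when NUM_GPU is set (hypothetical helper names).
set -eu

# Build the device list the way train.sh does: 0,1,...,N-1 with a trailing comma.
gpu_arg() {
  local n=$1 args="--gpu=" i
  for i in $(seq 0 $((n - 1))); do
    args="$args$i,"
  done
  printf '%s\n' "$args"
}

# Scratch copy of the data section (illustrative fragment, not the full config).
cfg=$(mktemp)
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 4
    wrap: false
EOF

# Same substitutions as train.sh (GNU sed in-place syntax).
apply_num_gpu() {
  sed -i "s/batch_size: 4/batch_size: $1/g" "$cfg"
  sed -i "s/num_workers: 4/num_workers: $1/g" "$cfg"
}

gpu_arg 4                                 # prints --gpu=0,1,2,3,
apply_num_gpu 2
apply_num_gpu 8                           # no effect: "batch_size: 4" is no longer in the file
grep -E 'batch_size|num_workers' "$cfg"   # still shows 2 for both
```

Because the patterns match only the literal value `4`, a second run with a different `NUM_GPU` leaves an already rewritten config unchanged. Restoring the file (for example with `git checkout -- configs/stable-diffusion/v1-finetune-4gpu.yaml`) before changing the GPU count avoids this.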