make the execution work

Frederik Fix 2022-09-17 08:55:56 +02:00
parent b664c86e89
commit 7cf4130668
6 changed files with 77 additions and 10 deletions

4
.dockerignore Normal file

@ -0,0 +1,4 @@
+./venv
+./danbooru-aesthetic
+./logs
+*.ckpt

3
.gitignore vendored

@ -40,6 +40,7 @@ lib64/
 parts/
 sdist/
 var/
+venv/
 wheels/
 share/python-wheels/
 *.egg-info/
@ -54,4 +55,4 @@ MANIFEST
 /src/
 #Obsidian
 .obsidian/

10
Dockerfile Normal file

@ -0,0 +1,10 @@
+FROM pytorch/pytorch:latest
+RUN apt update && \
+    apt install -y git curl unzip vim && \
+    pip install git+https://github.com/derfred/lightning.git@waifu-1.6.0#egg=pytorch-lightning
+RUN mkdir /waifu
+COPY . /waifu/
+WORKDIR /waifu
+RUN grep -v pytorch-lightning requirements.txt > requirements-waifu.txt && \
+    pip install -r requirements-waifu.txt

View File

@ -1,13 +1,51 @@
-# 1. Executing
-## Installation
+# 3. Executing
+There are two modes of executing the training:
+1. Using docker image. This is the fastest way to get started.
+2. Using system python install. Allows more customization.
+Note: You will need to provide the initial checkpoint for resuming the training. This must be a version with the full EMA. Otherwise you will get this error:
+```
+RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
+Missing key(s) in state_dict: "model_ema.diffusion_modeltime_embed0weight", "model_ema.diffusion_modeltime_embed0bias".... (Many lines of similar outputs)
+```
+## 1. Using docker image
+An image is provided at `ghcr.io/derfred/waifu-diffusion`. Run it, adjusting the NUM_GPU variable:
+```
+docker run -it -e NUM_GPU=x ghcr.io/derfred/waifu-diffusion
+```
+Next you will want to download the starting checkpoint into the file `model.ckpt` and copy the training data into the directory `/waifu/danbooru-aesthetic`.
+Finally execute the training using:
+```
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic"
+```
+## 2. system python install
 First install the dependencies:
 ```bash
 pip install -r requirements.txt
 ```
-## Executing
-```bash
-sh train.sh
+Next you will want to download the starting checkpoint into the file `model.ckpt` and copy the training data into the directory `/waifu/danbooru-aesthetic`.
+
+Also you will need to edit the configuration in `./configs/stable-diffusion/v1-finetune-4gpu.yaml`. In the `data` section (around line 70) change the `batch_size` and `num_workers` to the number of GPUs you are using:
 ```
+data:
+  target: main.DataModuleFromConfig
+  params:
+    batch_size: 4
+    num_workers: 4
+    wrap: false
+```
+Finally execute the training using the following command. You need to adjust the `--gpu` parameter according to your GPU settings.
+```bash
+sh train.sh -t -n "aesthetic" --resume_from_checkpoint model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --data_root "./danbooru-aesthetic" --gpu=0,1,2,3,
+```
+In case you get an error stating `KeyError: 'Trying to restore optimizer state but checkpoint contains only the model. This is probably due to ModelCheckpoint.save_weights_only being set to True.'` follow these instructions: https://discord.com/channels/930499730843250783/953132470528798811/1018668937052962908
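
Both modes described in the README text need a checkpoint file and a training data directory to exist before `train.sh` is started. A minimal sketch of a pre-flight check follows. The `preflight` function is hypothetical and not part of this commit, and it only checks that the paths exist; it cannot tell whether the checkpoint carries the full EMA weights.

```shell
#!/bin/sh
# Hypothetical pre-flight check (an illustration, not part of the commit).
# $1: path to the starting checkpoint, $2: training data directory.
preflight() {
  ckpt="$1"
  data_root="$2"
  if [ ! -f "$ckpt" ]; then
    echo "missing checkpoint: $ckpt" >&2
    return 1
  fi
  if [ ! -d "$data_root" ]; then
    echo "missing data directory: $data_root" >&2
    return 1
  fi
  return 0
}

# Example use: run the check first and launch the training command
# from the README only if it succeeds.
# preflight model.ckpt ./danbooru-aesthetic && echo "inputs present"
```

Failing early like this avoids waiting for the Python process to start only to find that a path was mistyped.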

View File

@ -1,10 +1,10 @@
-numpy==1.19.2
+numpy==1.21.6
 albumentations==0.4.3
-opencv-python==4.1.2.30
+opencv-python
 pudb==2019.2
 imageio==2.9.0
 imageio-ffmpeg==0.4.2
-pytorch-lightning==1.6.5
+pytorch-lightning==1.6.0
 omegaconf==2.1.1
 test-tube>=0.7.5
 streamlit>=0.73.1

View File

@ -1 +1,15 @@
-python3 main.py --train --resume model.ckpt --base ./configs/stable-diffusion/v1-finetune-4gpu.yaml --no-test --seed 25 --scale_lr False --gpus 0,1,2,3
+#!/bin/bash
+ARGS=""
+if [ ! -z "$NUM_GPU" ]; then
+  ARGS="--gpu="
+  for i in $(seq 0 $((NUM_GPU-1)))
+  do
+    ARGS="$ARGS$i,"
+  done
+  sed -i "s/batch_size: 4/batch_size: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+  sed -i "s/num_workers: 4/num_workers: $NUM_GPU/g" ./configs/stable-diffusion/v1-finetune-4gpu.yaml
+fi
+python3 main.py $ARGS "$@"
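
To make the effect of `NUM_GPU` easier to see, the two steps of the new script can be pulled out into small functions and tried in isolation. This is a sketch only: the names `build_gpu_arg` and `patch_config` are hypothetical, and `patch_config` writes to stdout instead of editing the file in place as the script's `sed -i` does.

```shell
#!/bin/sh
# Hypothetical helper: builds the same "--gpu=0,1,...," string as the
# loop in the script, for a given GPU count.
build_gpu_arg() {
  num_gpu="$1"
  arg="--gpu="
  i=0
  while [ "$i" -lt "$num_gpu" ]; do
    arg="$arg$i,"
    i=$((i + 1))
  done
  printf '%s\n' "$arg"
}

# Hypothetical helper: applies the script's two substitutions to the
# config file given as $2 and prints the result (no in-place edit).
patch_config() {
  sed -e "s/batch_size: 4/batch_size: $1/g" \
      -e "s/num_workers: 4/num_workers: $1/g" "$2"
}

build_gpu_arg 2   # prints: --gpu=0,1,
```

One thing this makes visible: the substitutions match only the literal value `4`. Once the script has rewritten the config for one `NUM_GPU` value, a later run with a different value leaves `batch_size` and `num_workers` unchanged unless the file is restored first.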