# Starting a training session
Here are some example commands to get you started. You can copy and paste them into your command line and press enter.
Make sure the last line does not have ^ but all other lines do.
**First, open a command line, then make sure to activate the environment:**

    activate_venv.bat

You should see your command line show ```(venv)``` at the beginning of the line. If you don't, something went wrong with setup.
## Running from a json config file
You can edit the example `train.json` file to your liking, then run the following command:

    python train.py --config train.json

Be careful when editing the json file, as any syntax error will cause the program to crash. You might want to use a json validator to check your file before running it. You can use an online validator such as https://jsonlint.com/ or look at it in VS Code.
One particular note: if your path to `data_root` or `resume_ckpt` contains backslashes, write each one as a double backslash (`\\`) or replace it with a single forward slash (`/`). There is an example train.json in the repo root.
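
As a local alternative to an online validator, the check can be sketched with Python's standard `json` module. The helper name `check_config_text` is hypothetical and not part of the trainer; the two keys are the ones mentioned above.

```python
import json

def check_config_text(text):
    """Return (config, None) if text is valid JSON, else (None, message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"

# Doubled backslashes parse to a normal Windows path.
good = r'{"data_root": "x:\\mydata", "resume_ckpt": "sd_v1-5_vae"}'
# A single backslash before "m" is not a valid JSON escape.
bad = r'{"data_root": "x:\mydata"}'

good_config, good_error = check_config_text(good)
bad_config, bad_error = check_config_text(bad)
```

Note that some single-backslash sequences, such as `\t` or `\n`, are valid JSON escapes, so a path like `x:\temp` would parse without error but silently turn into a different string. Doubling the backslashes or using forward slashes avoids both problems.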
## Running from the command line with arguments
I recommend you copy one of the examples below and keep it in a text file for future reference. Your settings are logged in the logs folder, but you'll need to make a command to start training.
Training examples:

Resuming from a checkpoint, 50 epochs, 6 batch size, 3e-6 learning rate, constant scheduler, generate samples every 200 steps, 10 minute checkpoint interval, adam8bit, and using the default "input" folder for training data:

    python train.py --resume_ckpt "sd_v1-5_vae" ^
    --max_epochs 50 ^
    --data_root "input" ^
    --lr_scheduler constant ^
    --project_name myproj ^
    --batch_size 6 ^
    --sample_steps 200 ^
    --lr 3e-6 ^
    --ckpt_every_n_minutes 10 ^
    --useadam8bit
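
The same settings could also be kept in a json config file instead of a long command. The sketch below generates the json text from a Python dict. It assumes each json key is the command-line flag name without the leading `--` (and that `--useadam8bit` becomes a boolean), so compare the result with the example train.json in the repo root before relying on it. The helper name `to_config_text` is hypothetical.

```python
import json

# Settings from the command above. Assumption: json keys mirror the flag
# names without "--"; verify against the example train.json in the repo.
settings = {
    "resume_ckpt": "sd_v1-5_vae",
    "max_epochs": 50,
    "data_root": "input",
    "lr_scheduler": "constant",
    "project_name": "myproj",
    "batch_size": 6,
    "sample_steps": 200,
    "lr": 3e-6,
    "ckpt_every_n_minutes": 10,
    "useadam8bit": True,
}

def to_config_text(values):
    """Serialize settings as indented json; json.dumps handles all escaping."""
    return json.dumps(values, indent=4)

config_text = to_config_text(settings)
print(config_text)
```

Generating the file this way sidesteps hand-editing mistakes such as trailing commas or unescaped backslashes, because `json.dumps` always emits valid json.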

Training from the SD2 512 base model, 18 epochs, 4 batch size, 1.2e-6 learning rate, constant LR, generate samples every 100 steps, gradient norm clipping at 1, 30 minute checkpoint interval, adam8bit, using images in the x:\mydata folder, training at resolution class of 640:

    python train.py --resume_ckpt "512-base-ema" ^
    --data_root "x:\mydata" ^
    --max_epochs 18 ^
    --lr_scheduler constant ^
    --project_name myproj ^
    --batch_size 4 ^
    --sample_steps 100 ^
    --lr 1.2e-6 ^
    --resolution 640 ^
    --clip_grad_norm 1 ^
    --ckpt_every_n_minutes 30 ^
    --useadam8bit

Training from the "SD21" model on the "jets" dataset on another drive, for 25 epochs, 6 batch size, 1.5e-6 learning rate, cosine scheduler that will decay in 1500 steps with 20 warmup steps, generate samples every 100 steps, save a checkpoint every 20 epochs, and use AdamW 8bit optimizer:

    python train.py --resume_ckpt "SD21" ^
    --data_root "R:\everydream-trainer\training_samples\mega\gt\objects\jets" ^
    --max_epochs 25 ^
    --lr_scheduler cosine ^
    --lr_decay_steps 1500 ^
    --lr_warmup_steps 20 ^
    --project_name myproj ^
    --batch_size 6 ^
    --sample_steps 100 ^
    --lr 1.5e-6 ^
    --save_every_n_epochs 20 ^
    --useadam8bit
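
To get a feel for what the cosine settings above mean, here is a generic sketch of linear warmup followed by cosine decay, using the values from the command. It is not taken from the trainer's code: the function name `lr_at_step` is hypothetical, and whether the 1500 decay steps include the 20 warmup steps is an assumption made here.

```python
import math

def lr_at_step(step, base_lr=1.5e-6, warmup_steps=20, decay_steps=1500):
    """Generic linear warmup followed by cosine decay to zero.

    Assumption: decay_steps counts from step 0 and includes the warmup.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped at 1.0.
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps - warmup_steps))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 10, 20, 760, 1500):
    print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at the end of warmup, is at half its peak midway through the decay, and reaches zero at step 1500, so steps beyond that point would train with no effective learning rate.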
Copy and paste the above into your command line and press enter.
Make sure the last line does not have ^ but all other lines do. If you prefer, you can put the whole command on one line and omit the ^ carets.
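
If you keep your commands in a text file, the caret rule can be checked automatically. The following is a minimal sketch of a hypothetical helper, `join_caret_lines`, which verifies that every line except the last ends with ^ and returns the equivalent one-line command.

```python
def join_caret_lines(command):
    """Join a caret-continued command into a single line, checking the carets."""
    lines = [line.strip() for line in command.splitlines() if line.strip()]
    for line in lines[:-1]:
        if not line.endswith("^"):
            raise ValueError(f"missing ^ at end of line: {line!r}")
    if lines and lines[-1].endswith("^"):
        raise ValueError("last line must not end with ^")
    # Drop the trailing carets and join the pieces with single spaces.
    return " ".join(line.rstrip("^").strip() for line in lines)

multi = '''python train.py --resume_ckpt "sd_v1-5_vae" ^
--max_epochs 50 ^
--useadam8bit'''

one_line = join_caret_lines(multi)
print(one_line)
```

The one-line form is also handy on shells that do not treat ^ as a line continuation character.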
## How to resume
Point `--resume_ckpt` to the checkpoint folder in logs, like so:
```--resume_ckpt "R:\everydream2trainer\logs\myproj20221213-161620\ckpts\myproj-ep22-gs01099" ^```
Or use relative pathing:
```--resume_ckpt "logs\myproj20221213-161620\ckpts\myproj-ep22-gs01099" ^```
If possible, resume by pointing to the checkpoint folder in logs as shown above, rather than converting a pruned 2.0GB or 2.5GB file back.
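
When several checkpoint folders have piled up, it can help to read the epoch and step out of a folder name before choosing one to resume from. The sketch below assumes the naming pattern seen in the example path above (`<project>-ep<epoch>-gs<step>`), and that `ep` and `gs` stand for epoch and global step; the helper `parse_ckpt_path` is hypothetical and not part of the trainer.

```python
import re
from pathlib import PureWindowsPath

# Assumption: folder names look like "myproj-ep22-gs01099", as in the example.
CKPT_NAME = re.compile(r"^(?P<project>.+)-ep(?P<epoch>\d+)-gs(?P<step>\d+)$")

def parse_ckpt_path(path):
    """Return (project, epoch, step) from a checkpoint folder path, or None."""
    name = PureWindowsPath(path).name  # last path component, Windows-style
    match = CKPT_NAME.match(name)
    if match is None:
        return None
    return match["project"], int(match["epoch"]), int(match["step"])

info = parse_ckpt_path(r"logs\myproj20221213-161620\ckpts\myproj-ep22-gs01099")
print(info)
```

`PureWindowsPath` is used so that backslash-separated paths are split correctly regardless of the operating system the script runs on.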