Advanced fine tuning tools for vision models

Victor Hall 25cc3fdb3b update gpu req readme for auto-caption		2022-10-30 22:01:38 -04:00

EveryDream toolkit for fine tuning

This repo contains data engineering tools for people interested in taking their fine tuning beyond the initial DreamBooth paper or XavierXiao's original DreamBooth implementation for Stable Diffusion. The tools may also be useful for other projects.

For instance, by mixing ground truth Laion data into the training data to replace "regularization" images, together with CLIP-interrogated captions or the original TEXT caption from Laion, the final few concepts left over from the original DreamBooth paper are removed. This is a significant step towards full fine tuning capabilities.

Captioned training together with regularization has enabled multi-subject and multi-style training at the same time, and can scale to larger training efforts.
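As a rough illustration of the mixing idea described above, the sketch below builds one shuffled training list from a folder of subject images and a folder of ground truth images. Everything here is an assumption for illustration only: the function names, the `ground_truth_ratio` parameter, and the convention that each caption is stored in the file name with underscores standing in for spaces. It is not taken from the EveryDream scripts.

```python
import random
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def captioned_images(folder):
    """Return (path, caption) pairs for every image file in folder."""
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_EXTS:
            # Assumption: the caption is the file name stem, underscores as spaces.
            pairs.append((path, path.stem.replace("_", " ")))
    return pairs


def mix_datasets(subject_dir, ground_truth_dir, ground_truth_ratio=0.5, seed=0):
    """Combine subject images with a sample of ground truth images."""
    if not 0 <= ground_truth_ratio < 1:
        raise ValueError("ground_truth_ratio must be in [0, 1)")
    subject = captioned_images(subject_dir)
    ground_truth = captioned_images(ground_truth_dir)
    rng = random.Random(seed)
    # How many ground truth images give them the requested share of the mix.
    wanted = round(len(subject) * ground_truth_ratio / (1 - ground_truth_ratio))
    sample = rng.sample(ground_truth, min(wanted, len(ground_truth)))
    mixed = subject + sample
    rng.shuffle(mixed)
    return mixed
```

A call such as `mix_datasets("training_images", "laion_images", ground_truth_ratio=0.5)` (both folder names are hypothetical) would return a list in which about half of the entries are ground truth images, each paired with its caption.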

For example, you can download a large scale model for Final Fantasy 7 Remake here: https://huggingface.co/panopstor/ff7r-stable-diffusion. Be sure to also follow the gist link at the bottom of that page for more information, along with links to example output of a multi-model fine tuning.

Since DreamBooth is now fading away in favor of improved techniques, I will call the technique of using fully captioned training together with ground truth data "EveryDream" to avoid confusion.

If you are interested in caption training with Stable Diffusion and general purpose fine tuning, and have a 24GB Nvidia GPU, you can try my trainer fork: https://github.com/victorchall/EveryDream-trainer (currently a bit beta but working)

Join the EveryDream discord here: https://discord.gg/uheqxU6sXN

Tools

Download scrapes using Laion - Scrapes images off the web using Laion data files.

Auto Captioning - Uses BLIP interrogation to caption images for training.
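To give a feel for what searching Laion data involves, here is a minimal sketch that filters metadata rows by caption text before any download happens. It assumes each row is a dict with `URL` and `TEXT` keys, which is an assumption about the layout of the Laion data files; the function name, the `limit` parameter, and the sample rows are hypothetical and do not describe the actual script in this repo.

```python
def find_matches(rows, search_term, limit=None):
    """Return (url, caption) pairs whose caption contains search_term."""
    term = search_term.lower()
    matches = []
    for row in rows:
        # Assumption: rows carry the image address in "URL" and the caption in "TEXT".
        caption = row.get("TEXT") or ""
        url = row.get("URL")
        if url and term in caption.lower():
            matches.append((url, caption))
            if limit is not None and len(matches) >= limit:
                break
    return matches


# Made-up sample rows standing in for real Laion metadata.
sample_rows = [
    {"URL": "https://example.com/1.jpg", "TEXT": "A man riding a bicycle"},
    {"URL": "https://example.com/2.jpg", "TEXT": "Bowl of fruit on a table"},
    {"URL": None, "TEXT": "a man with no image address"},
    {"URL": "https://example.com/4.jpg", "TEXT": "Portrait of a MAN in a hat"},
]

for url, caption in find_matches(sample_rows, "a man"):
    print(url, "|", caption)
```

Keeping the original caption next to each address means the caption can travel with the image once it is downloaded, which is what captioned training needs.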

Install

You can use conda or venv. This was developed on Python 3.10.5 but may work on older or newer versions.
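If you are unsure which interpreter you are running, a small check like the following compares it against the 3.10 series mentioned above. This is only a convenience sketch; the `describe_python` helper is hypothetical and not part of the repo.

```python
import sys

# The README states development happened on Python 3.10.5.
DEVELOPED_ON = (3, 10)


def describe_python(version_info=None):
    """Return a short note comparing a Python version to the 3.10 series."""
    if version_info is None:
        version_info = sys.version_info
    current = (version_info[0], version_info[1])
    label = f"{current[0]}.{current[1]}"
    if current == DEVELOPED_ON:
        return f"Python {label} matches the version series this was developed on."
    relation = "older" if current < DEVELOPED_ON else "newer"
    return f"Python {label} is {relation} than 3.10; it may still work but is untested here."


print(describe_python())
```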

One step venv setup:

create_venv.bat

Don't forget to activate every time you open the command prompt later.

activate_venv.bat

To use conda:

conda env create -f environment.yaml

conda activate everydream

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

git clone https://github.com/salesforce/BLIP scripts/BLIP

Or, if you wish to configure your own venv, container/WSL, or Linux:

pip install -r requirements.txt

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

git clone https://github.com/salesforce/BLIP scripts/BLIP
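After any of the install routes above, a quick sanity check along these lines can confirm that PyTorch imports, that it can see a CUDA device, and that the BLIP repo was cloned to `scripts/BLIP`. The `check_install` helper is a hypothetical sketch, not something shipped with the repo; run it from the repo root inside the activated environment.

```python
import importlib.util
from pathlib import Path


def check_install(blip_dir="scripts/BLIP"):
    """Report whether torch is importable, sees CUDA, and whether BLIP is cloned."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "blip_cloned": Path(blip_dir).is_dir(),
    }
    # Look the package up first so a missing install does not raise.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_installed"] = True
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


for name, value in check_install().items():
    print(f"{name}: {value}")
```

If `cuda_available` comes back as False on a machine with an Nvidia GPU, the usual suspects are a CPU-only torch build or an outdated driver.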

Thanks to the Salesforce team for the BLIP tool. It uses its own vision-language model to produce sane sentences like you would expect to see in alt-text.