Automatic captioning uses Salesforce's BLIP to automatically create a clean sentence structure for captioning input images before training.
I suggest using [Birme](https://www.birme.net/?target_width=512&target_height=512&auto_focal=false&image_format=webp&quality_jpeg=95&quality_webp=99) to crop and resize first, though there are various other tools for this. Crop carefully for training, and crop to square before captioning: you do not want captions describing content that is later removed by cropping.
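If you prefer to script the crop step instead of using a web tool, the geometry is simple. This is a minimal, tool-agnostic sketch of the center-crop calculation (Birme does the equivalent for you; the function name is just for illustration):

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# A 1920x1080 photo crops to a centered 1080x1080 square,
# which you would then resize down to 512x512.
print(center_crop_box(1920, 1080))  # (420, 0, 1500, 1080)
```

The resulting box can be fed to any image library's crop call before resizing to 512x512.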
Auto-caption is fast and not very resource intensive, but it does use the GPU. An Nvidia GPU with 2GB of VRAM is enough to run it; something like a 1050 Ti works fine.
Make sure the CUDA builds of torch and torchvision are installed by activating your environment and running this command:
```
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```
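To confirm the install worked, you can run a quick sanity check from the same environment. This sketch simply reports what it finds, whether or not torch is present:

```python
# Sanity check: is a CUDA-enabled torch build active in this environment?
try:
    import torch
    print("torch", torch.__version__)            # a CUDA build ends in a +cuXXX tag
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If it prints `CUDA available: False`, reinstall with the pip command above.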
## Execute
Place your input files into the /input folder.
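Staging the files can be scripted as well. A small sketch, assuming your prepared images live in a folder of your choosing (`my_prepared_images` is a placeholder path, and `.webp` matches the Birme settings above):

```python
# Hedged example: copy prepared images into the input folder the tool reads from.
import shutil
from pathlib import Path

src = Path("my_prepared_images")   # placeholder: wherever your cropped files live
dst = Path("input")
dst.mkdir(exist_ok=True)

if not src.is_dir():
    print(f"{src} not found; adjust the path to your image folder")
else:
    for img in src.glob("*.webp"):
        shutil.copy2(img, dst / img.name)
        print("copied", img.name)
```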