diff --git a/AutoCaption.ipynb b/AutoCaption.ipynb index 4cb0059..0497da7 100644 --- a/AutoCaption.ipynb +++ b/AutoCaption.ipynb @@ -1 +1 @@ -{"cells":[{"cell_type":"markdown","metadata":{},"source":["## Please read the documentation here:\n","[Auto Captioning](doc/AUTO_CAPTION.md)\n","\n","This notebook requires a GPU instance, any will do, you don't need anything power. 4GB is fine."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":929,"status":"ok","timestamp":1667184580032,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"lWGx2LuU8Q_I","outputId":"d0eb4d03-f16d-460b-981d-d5f88447e85e"},"outputs":[{"name":"stdout","output_type":"stream","text":["Cloning into 'EveryDream'...\n","remote: Enumerating objects: 90, done.\u001b[K\n","remote: Counting objects: 100% (90/90), done.\u001b[K\n","remote: Compressing objects: 100% (59/59), done.\u001b[K\n","remote: Total 90 (delta 30), reused 76 (delta 18), pack-reused 0\u001b[K\n","Unpacking objects: 100% (90/90), done.\n"]}],"source":["#download repo\n","!git clone https://github.com/victorchall/EveryDream.git\n","%cd EveryDream"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":4944,"status":"ok","timestamp":1667184754992,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"RJxfSai-8pkD","outputId":"0ac1b805-62a0-48aa-e0da-ee19503bb3f1"},"outputs":[],"source":["# install requirements\n","!pip install torch=='1.12.1+cu113' 'torchvision==0.13.1+cu113' --extra-index-url https://download.pytorch.org/whl/cu113\n","!pip install pandas>='1.3.5'\n","!git clone https://github.com/salesforce/BLIP scripts/BLIP\n","!pip install timm\n","!pip install fairscale=='0.4.4'\n","!pip install transformers=='4.19.2'\n","!pip install timm"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":383,"status":"ok","timestamp":1667185773878,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"ruRaJ7Cx9vhw","outputId":"f0701d3e-bfa9-45a9-a742-c3615466aad7"},"outputs":[{"name":"stdout","output_type":"stream","text":["mkdir: cannot create directory ‘EveryDream/input’: File exists\n","mkdir: cannot create directory ‘EveryDream/output’: File exists\n"]}],"source":["# make folders for input and output\n","!mkdir input\n","!mkdir output\n","!mkdir .cache"]},{"cell_type":"markdown","metadata":{"id":"sbeUIVXJ-EVf"},"source":["# Upload your input images into the EveryDream/input folder\n","\n","![Beam vs Nucleus](demo/upload_images_caption.png)"]},{"cell_type":"markdown","metadata":{},"source":["## Please read the documentation here:\n","[Auto Captioning](doc/AUTO_CAPTION.md)\n","\n","You cannot have commented lines between options below. 
If you uncomment a line below, move it above any other commented lines.\n","\n","!python must remain the first line."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":18221,"status":"ok","timestamp":1667185808005,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"4TAICahl-RPn","outputId":"da7fa1a8-0855-403a-c295-4da31658d1f6"},"outputs":[],"source":["!python scripts/auto_caption.py \\\n","--img_dir EveryDream/input \\\n","--out_dir EveryDream/output \\\n","#--fomat mrwho \\ # for joepenna format\n","#--min_length 34 \\ # optional longer prompts\n","#--q_factor 1.3 \\ # optional tweak for longer prompts\n","#--nucleus \\ # alternative algorithm for short captions"]},{"cell_type":"markdown","metadata":{"id":"HBrWnu1C_lN9"},"source":["Download your captioned images from /content/EveryDream/output"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from google.colab import drive\n","drive.mount('/content/drive')\n","\n","!mkdir /content/drive/MyDrive/AutoCaption\n","!cp output/*.* /content/drive/MyDrive/AutoCaption"]}],"metadata":{"colab":{"authorship_tag":"ABX9TyN9ZSr0RyOQKdfeVsl2uOiE","collapsed_sections":[],"provenance":[{"file_id":"16QrivRfoDFvE7fAa7eLeVlxj78Q573E0","timestamp":1667185879409}]},"kernelspec":{"display_name":"Python 3.10.5 ('.venv': venv)","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.10.5"},"vscode":{"interpreter":{"hash":"faf4a6abb601e3a9195ce3e9620411ceec233a951446de834cdf28542d2d93b4"}}},"nbformat":4,"nbformat_minor":0} +{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Please read the documentation here before you start.\n","[Auto Captioning](doc/AUTO_CAPTION.md)\n","\n","This notebook requires a GPU instance; any will do, and you don't need anything powerful. 4GB is fine.\n","\n","Only Colab has automatic file transfers at this time. 
If you are using another platform, you will need to download your output files manually."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":929,"status":"ok","timestamp":1667184580032,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"lWGx2LuU8Q_I","outputId":"d0eb4d03-f16d-460b-981d-d5f88447e85e"},"outputs":[],"source":["# download repo\n","!git clone https://github.com/victorchall/EveryDream.git\n","%cd EveryDream"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":4944,"status":"ok","timestamp":1667184754992,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"RJxfSai-8pkD","outputId":"0ac1b805-62a0-48aa-e0da-ee19503bb3f1"},"outputs":[],"source":["# install requirements\n","!pip install torch=='1.12.1+cu113' 'torchvision==0.13.1+cu113' --extra-index-url https://download.pytorch.org/whl/cu113\n","!pip install 'pandas>=1.3.5'\n","!git clone https://github.com/salesforce/BLIP scripts/BLIP\n","!pip install timm\n","!pip install fairscale=='0.4.4'\n","!pip install transformers=='4.19.2'"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":383,"status":"ok","timestamp":1667185773878,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"ruRaJ7Cx9vhw","outputId":"f0701d3e-bfa9-45a9-a742-c3615466aad7"},"outputs":[],"source":["# make folders for input and output\n","!mkdir -p input\n","!mkdir -p output\n","!mkdir -p .cache"]},{"cell_type":"markdown","metadata":{"id":"sbeUIVXJ-EVf"},"source":["# Upload your input images into the EveryDream/input folder\n","\n","![Upload images](demo/upload_images_caption.png)"]},{"cell_type":"markdown","metadata":{},"source":["## Please read the documentation here for information on the parameters\n","\n","[Auto Captioning](doc/AUTO_CAPTION.md)\n","\n","*You cannot have commented lines between uncommented lines. 
If you uncomment a line below, move it above any other commented lines.*\n","\n","*!python must remain the first line.*"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":18221,"status":"ok","timestamp":1667185808005,"user":{"displayName":"Victor Hall","userId":"00029068894644207946"},"user_tz":240},"id":"4TAICahl-RPn","outputId":"da7fa1a8-0855-403a-c295-4da31658d1f6"},"outputs":[],"source":["!python scripts/auto_caption.py \\\n","--img_dir EveryDream/input \\\n","--out_dir EveryDream/output \\\n","#--format mrwho \\\n","#--min_length 34 \\\n","#--q_factor 1.3 \\\n","#--nucleus \\"]},{"cell_type":"markdown","metadata":{"id":"HBrWnu1C_lN9"},"source":["## Download your captioned images from EveryDream/output\n","\n","If you're on Colab, you can use the cell below to copy them to your Google Drive."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from google.colab import drive\n","drive.mount('/content/drive')\n","\n","!mkdir -p /content/drive/MyDrive/AutoCaption\n","!cp output/*.* /content/drive/MyDrive/AutoCaption"]},{"cell_type":"markdown","metadata":{},"source":["## You can use the following to zip up your files for extraction"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# zip the captioned output for manual download\n","!mkdir -p output/zip\n","\n","!zip -r output/zip/output.zip output"]}],"metadata":{"colab":{"authorship_tag":"ABX9TyN9ZSr0RyOQKdfeVsl2uOiE","collapsed_sections":[],"provenance":[{"file_id":"16QrivRfoDFvE7fAa7eLeVlxj78Q573E0","timestamp":1667185879409}]},"kernelspec":{"display_name":"Python 3.10.5 ('.venv': venv)","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.10.5"},"vscode":{"interpreter":{"hash":"faf4a6abb601e3a9195ce3e9620411ceec233a951446de834cdf28542d2d93b4"}}},"nbformat":4,"nbformat_minor":0} diff --git a/README.MD b/README.MD index b076887..b04ac2a 100644 --- a/README.MD +++ b/README.MD @@ -10,16 +10,15 @@ For example, you can download a large scale model for Final Fantasy 7 Remake her Since DreamBooth is now fading away in favor of improved techniques, I will call the technique of using fully captioned training together with ground truth data "EveryDream" to avoid confusion. -If you are interested in caption training with stable diffusion and general purpose fine tuning, and have a 24GB Nvidia GPU, you can try my trainer fork: -https://github.com/victorchall/EveryDream-trainer (currently a bit beta but working) - Join the EveryDream discord here: https://discord.gg/uheqxU6sXN ## Tools -[Download scrapes using Laion](./doc/LAION_SCRAPE.md) - Web scrapes images off the web using Laion data files. +[Download scrapes using Laion](./doc/LAION_SCRAPE.md) - Scrapes images from the web using Laion data files (runs on CPU). -[Auto Captioning](./doc/AUTO_CAPTION.md) - Uses BLIP interrogation to caption images for training. +[Auto Captioning](./doc/AUTO_CAPTION.md) - Uses BLIP interrogation to caption images for training (includes a Colab notebook; needs only a minimal GPU). + +[Training](https://github.com/victorchall/EveryDream-trainer) - Fine-tuning with captioned training images and ground truth data (needs a 24GB GPU). ## Install diff --git a/doc/AUTO_CAPTION.md b/doc/AUTO_CAPTION.md index 0e1c1a5..af10eea 100644 --- a/doc/AUTO_CAPTION.md +++ b/doc/AUTO_CAPTION.md @@ -14,6 +14,18 @@ Place input files into the /input folder Files will be **copied**, renamed so the caption becomes the file name, and placed into /output. 
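+A minimal example run, as a sketch: it assumes you launch from the EveryDream repo root and have already placed images in /input (both flags are described under the additional args below). + +    python scripts/auto_caption.py --img_dir input --out_dir output +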
+## Colab notebook + +This will run quite well on a T4 instance on Google Colab. Don't waste credits on more powerful GPUs. + +https://colab.research.google.com/github/victorchall/EveryDream/blob/main/AutoCaption.ipynb + +It should also work on other providers' low-power Nvidia GPU instances, but you will need to upload and download your files manually. + ## Additional command line args: ### --img_dir diff --git a/input/.gitkeep b/input/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/output/.gitkeep b/output/.gitkeep new file mode 100644 index 0000000..e69de29