Compare commits

...

3 Commits

Author SHA1 Message Date
Victor Hall 45ecb11402 update cog doc with colab link 2024-03-24 10:02:38 -04:00
Victor Hall 87bfe652ce caption cog notebook2 2024-03-24 10:00:00 -04:00
Victor Hall 93659f3eb4 caption cog notebook 2024-03-24 09:30:23 -04:00
2 changed files with 48 additions and 17 deletions

CaptionCog.ipynb

@@ -6,14 +6,16 @@
"metadata": {},
"source": [
"# Cog Captioning\n",
"<a href=\"https://colab.research.google.com/github/nawnie/EveryDream2trainer/blob/main/CaptionCog.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"This notebook is an implementation of [CogVLM](https://github.com/THUDM/CogVLM) for image captioning. \n",
"\n",
"This may require HIGH RAM shape on Google Colab, but T4 16gb is enough (even if slow).\n",
"Read [Docs](doc/CAPTION_COG.md) for basic usage guide. \n",
"\n",
"1. Read [Docs](doc/CAPTION_COG.md) for basic usage guide. \n",
"2. Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** Runpod/Vast using the EveryDream2trainer docker container/template and open this notebook.\n",
"3. Run the cells below to install dependencies.\n",
"4. Place your images in \"input\" folder or change the data_root to point to a Gdrive folder."
"Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** use Runpod/Vast/Whatever using the EveryDream2trainer docker container/template and open this notebook.\n",
"\n",
"### Requirements\n",
"Ampere or newer GPU with 16GB+ VRAM (ex. A100, 3090, 4060 Ti 16GB, etc). \n",
"The 4bit quantization loading requires bfloat16 datatype support, which is not supported on older Turing GPUs like the T4 16GB, which rules out using free tier Google Colab.\n"
]
},
{
@@ -23,10 +25,14 @@
"outputs": [],
"source": [
"# install dependencies\n",
"!pip install huggingface-hub\n",
"!pip install transformers\n",
"!pip install pynvml\n",
"!pip install colorama"
"!pip install huggingface-hub -q\n",
"!pip install transformers -q\n",
"!pip install pynvml -q\n",
"!pip install colorama -q\n",
"!pip install peft -q\n",
"!pip install bitsandbytes -q\n",
"!pip install einops -q\n",
"!pip install xformers -q"
]
},
{
@@ -38,7 +44,8 @@
"# Colab only setup (do NOT run for docker/runpod/vast)\n",
"!git clone https://github.com/victorchall/EveryDream2trainer\n",
"%cd EveryDream2trainer\n",
"%mkdir -p /content/EveryDream2trainer/input"
"%mkdir -p /content/EveryDream2trainer/input\n",
"%cd /content/EveryDream2trainer"
]
},
{
@@ -71,6 +78,17 @@
" tar_ref.extractall(input_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Connect Gdrive (Optional, will popup a warning)\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -78,9 +96,8 @@
"source": [
"## Run captions.\n",
"\n",
"Place your images in \"input\" folder, or you can change the data_root to point to a Gdrive folder.\n",
"\n",
"Run either the 24GB or 16GB model or adjust settings on your own."
"Place your images in \"input\" folder, or you can change the image_dir to point to a Gdrive folder.\n",
"Note: Colab may complain that you are running out of disk space, but it should still work.\n"
]
},
{
@@ -92,7 +109,7 @@
"# 16GB GPU, must not use more than 1 beam\n",
"# 24GB GPU, can use 3 beams\n",
"%cd /content/EveryDream2trainer\n",
"%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\""
"%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\" --no_overwrite"
]
},
{
@@ -103,8 +120,20 @@
"source": [
"# This is a fancier version of above with more options set\n",
"%cd /content/EveryDream2trainer\n",
"%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\" --starts_with \"An image of\" --remove_starts_with --temp 0.9 --top_p 0.9 --top_k 40 --bad_words \"depicts,showcases,appears,suggests\""
"%run caption_cog.py --image_dir \"input\" --num_beams 1 \\\n",
" --prompt \"Write a description.\" \\\n",
" --starts_with \"An image of\" --remove_starts_with \\\n",
" --temp 0.9 --top_p 0.9 --top_k 40 \\\n",
" --bad_words \"depicts,showcases,appears,suggests\" \\\n",
" --no_overwrite "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

doc/CAPTION_COG.md

@@ -1,11 +1,13 @@
# CogVLM captioning
CogVLM [code](https://github.com/THUDM/CogVLM) [model](https://huggingface.co/THUDM/cogvlm-chat-hf) is, so far (Q1 2024), the best model for automatically generating captions.
CogVLM ([code](https://github.com/THUDM/CogVLM)) ([model](https://huggingface.co/THUDM/cogvlm-chat-hf)) is, so far (Q1 2024), the best model for automatically generating captions.
The model uses about 13.5GB of VRAM due to 4bit inference with the default setting of 1 beam, and up to 4 or 5 beams is possible with a 24GB GPU, meaning it is very capable on consumer hardware. It is slow, ~6-10 seconds on an RTX 3090, but the quality is worth it over other models.
The model uses about 13.5GB of VRAM due to 4bit inference with the default setting of 1 beam, and up to 4 or 5 beams is possible with a 24GB GPU, meaning it is very capable on consumer hardware. It is slow, ~6-10+ seconds on an RTX 3090, but the quality is worth it over other models.
It is capable of naming and identifying things with proper nouns and has a large vocabulary. It can also readily read text, even in hard-to-read fonts, from oblique angles, or from curved surfaces.
<a href="https://colab.research.google.com/github/nawnie/EveryDream2trainer/blob/main/CaptionCog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
## Basics
Run `python caption_cog.py --help` to get a list of options.
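As an example, a typical first run mirrors the Colab notebook cells above (all flags shown here come from the notebook; adjust `--image_dir` and the beam count to your setup):

```
python caption_cog.py --image_dir "input" --num_beams 1 --prompt "Write a description." --no_overwrite
```

On a 24GB GPU, `--num_beams 3` is a reasonable starting point, per the notes in the notebook.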