EveryDream2trainer/CaptionCog.ipynb

{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cog Captioning\n",
"<a href=\"https://colab.research.google.com/github/nawnie/EveryDream2trainer/blob/main/CaptionCog.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"This notebook is an implementation of [CogVLM](https://github.com/THUDM/CogVLM) for image captioning. \n",
"\n",
"Read [Docs](doc/CAPTION_COG.md) for basic usage guide. \n",
"\n",
"Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** use Runpod/Vast/Whatever using the EveryDream2trainer docker container/template and open this notebook.\n",
"\n",
"### Requirements\n",
"Ampere or newer GPU with 16GB+ VRAM (ex. A100, 3090, 4060 Ti 16GB, etc). \n",
"The 4bit quantization loading requires bfloat16 datatype support, which is not supported on older Turing GPUs like the T4 16GB, which rules out using free tier Google Colab.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# install dependencies\n",
"!pip install huggingface-hub -q\n",
"!pip install transformers -q\n",
"!pip install pynvml -q\n",
"!pip install colorama -q\n",
"!pip install peft -q\n",
"!pip install bitsandbytes -q\n",
"!pip install einops -q\n",
"!pip install xformers -q"
]
},
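{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional GPU check: a minimal sanity-check sketch for the requirements listed above\n",
"# (bfloat16 support, i.e. Ampere or newer, and roughly 16GB+ VRAM). It assumes torch is\n",
"# available in the runtime, which is the case on Colab and in the EveryDream2trainer docker image.\n",
"import torch\n",
"\n",
"if not torch.cuda.is_available():\n",
"    print(\"No CUDA GPU detected -- captioning will not run on this machine.\")\n",
"else:\n",
"    props = torch.cuda.get_device_properties(0)\n",
"    vram_gb = props.total_memory / 1024**3\n",
"    print(f\"GPU: {props.name}, {vram_gb:.1f} GiB VRAM, compute capability {props.major}.{props.minor}\")\n",
"    if not torch.cuda.is_bf16_supported():\n",
"        print(\"Warning: no bfloat16 support (e.g. Turing T4), 4bit quantized loading will not work.\")\n",
"    if vram_gb < 15:\n",
"        print(\"Warning: less than 16GB VRAM, captioning may run out of memory even at 1 beam.\")"
]
},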
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Colab only setup (do NOT run for docker/runpod/vast)\n",
"!git clone https://github.com/victorchall/EveryDream2trainer\n",
"%cd EveryDream2trainer\n",
"%mkdir -p /content/EveryDream2trainer/input\n",
"%cd /content/EveryDream2trainer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd /content/EveryDream2trainer\n",
"#@markdown Optional: Extract all TAR and ZIP files in the input folder (so you can just upload a large TAR/ZIP)\n",
"import os\n",
"import zipfile\n",
"import tarfile\n",
"\n",
"# Directory containing the input files\n",
"input_folder = \"input\"\n",
"\n",
"# Extract ZIP files\n",
"for file in os.listdir(input_folder):\n",
" if file.endswith(\".zip\"):\n",
" file_path = os.path.join(input_folder, file)\n",
" with zipfile.ZipFile(file_path, 'r') as zip_ref:\n",
" zip_ref.extractall(input_folder)\n",
"\n",
"# Extract TAR files\n",
"for file in os.listdir(input_folder):\n",
" if file.endswith(\".tar\"):\n",
" file_path = os.path.join(input_folder, file)\n",
" with tarfile.open(file_path, 'r') as tar_ref:\n",
" tar_ref.extractall(input_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Connect Gdrive (Optional, will popup a warning)\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
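{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: copy images from a mounted Gdrive folder into the local \"input\" folder.\n",
"# The drive_folder path below is only a placeholder -- change it to wherever your images live.\n",
"# (Alternatively, skip this cell and point --image_dir at the Drive folder directly in the cells below.)\n",
"%cd /content/EveryDream2trainer\n",
"import os\n",
"import shutil\n",
"\n",
"drive_folder = \"/content/drive/MyDrive/my_images\"  # placeholder path, adjust to your Drive layout\n",
"if os.path.isdir(drive_folder):\n",
"    shutil.copytree(drive_folder, \"input\", dirs_exist_ok=True)\n",
"    print(f\"Copied {drive_folder} into input/\")\n",
"else:\n",
"    print(f\"{drive_folder} not found -- edit drive_folder to match your Drive\")"
]
},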
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run captions.\n",
"\n",
"Place your images in \"input\" folder, or you can change the image_dir to point to a Gdrive folder.\n",
"Note: Colab may complain that you are running out of disk space, but it should still work.\n"
]
},
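{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: quick check of how many images are in the input folder before captioning.\n",
"# A small sketch; the extension list below is an assumption covering common image types.\n",
"%cd /content/EveryDream2trainer\n",
"import os\n",
"\n",
"image_exts = (\".jpg\", \".jpeg\", \".png\", \".webp\", \".bmp\")\n",
"count = sum(\n",
"    1\n",
"    for root, _, files in os.walk(\"input\")\n",
"    for f in files\n",
"    if f.lower().endswith(image_exts)\n",
")\n",
"print(f\"Found {count} images under input/\")"
]
},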
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 16GB GPU, must not use more than 1 beam\n",
"# 24GB GPU, can use 3 beams\n",
"%cd /content/EveryDream2trainer\n",
"%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\" --no_overwrite"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is a fancier version of above with more options set\n",
"%cd /content/EveryDream2trainer\n",
"%run caption_cog.py --image_dir \"input\" --num_beams 1 \\\n",
" --prompt \"Write a description.\" \\\n",
" --starts_with \"An image of\" --remove_starts_with \\\n",
" --temp 0.9 --top_p 0.9 --top_k 40 \\\n",
" --bad_words \"depicts,showcases,appears,suggests\" \\\n",
" --no_overwrite "
]
},
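{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: preview a few of the generated captions.\n",
"# A sketch that assumes captions were written as .txt sidecar files next to each image,\n",
"# which is the caption format EveryDream2trainer consumes.\n",
"%cd /content/EveryDream2trainer\n",
"import os\n",
"\n",
"shown = 0\n",
"for root, _, files in os.walk(\"input\"):\n",
"    for f in sorted(files):\n",
"        if f.lower().endswith(\".txt\") and shown < 5:\n",
"            with open(os.path.join(root, f), encoding=\"utf-8\") as fh:\n",
"                print(f\"{f}: {fh.read().strip()}\")\n",
"            shown += 1"
]
},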
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"machine_shape": "hm",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 0
}