EveryDream2trainer/CaptionCog.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cog Captioning\n",
    "This notebook is an implementation of [CogVLM](https://github.com/THUDM/CogVLM) for image captioning. \n",
    "\n",
    "This may require HIGH RAM shape on Google Colab, but T4 16gb is enough (even if slow).\n",
    "\n",
    "1.  Read [Docs](doc/CAPTION_COG.md) for basic usage guide. \n",
    "2.  Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** Runpod/Vast using the EveryDream2trainer docker container/template and open this notebook.\n",
    "3.  Run the cells below to install dependencies.\n",
    "4.  Place your images in \"input\" folder or change the data_root to point to a Gdrive folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install dependencies\n",
    "!pip install huggingface-hub\n",
    "!pip install transformers\n",
    "!pip install pynvml\n",
    "!pip install colorama"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Colab only setup (do NOT run for docker/runpod/vast)\n",
    "!git clone https://github.com/victorchall/EveryDream2trainer\n",
    "%cd EveryDream2trainer\n",
    "%mkdir -p /content/EveryDream2trainer/input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%cd /content/EveryDream2trainer\n",
    "#@markdown Optional:  Extract all TAR and ZIP files in the input folder (so you can just upload a large TAR/ZIP)\n",
    "import os\n",
    "import zipfile\n",
    "import tarfile\n",
    "\n",
    "# Directory containing the input files\n",
    "input_folder = \"input\"\n",
    "\n",
    "# Extract ZIP files\n",
    "for file in os.listdir(input_folder):\n",
    "    if file.endswith(\".zip\"):\n",
    "        file_path = os.path.join(input_folder, file)\n",
    "        with zipfile.ZipFile(file_path, 'r') as zip_ref:\n",
    "            zip_ref.extractall(input_folder)\n",
    "\n",
    "# Extract TAR files\n",
    "for file in os.listdir(input_folder):\n",
    "    if file.endswith(\".tar\"):\n",
    "        file_path = os.path.join(input_folder, file)\n",
    "        with tarfile.open(file_path, 'r') as tar_ref:\n",
    "            tar_ref.extractall(input_folder)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run captions.\n",
    "\n",
    "Place your images in \"input\" folder, or you can change the data_root to point to a Gdrive folder.\n",
    "\n",
    "Run either the 24GB or 16GB model or adjust settings on your own."
   ]
  },
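  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: the beam guidance in this notebook (1 beam on a 16GB GPU, 3 beams on a 24GB GPU) can be sketched as a small helper. This is a minimal sketch and not part of `caption_cog.py`. The function name `suggest_num_beams` is hypothetical, and the 22 GiB threshold is an assumption, chosen because cards sold as 24GB often report slightly less usable memory. The GPU query assumes `torch` is installed with CUDA support; treat the printed value as a starting point only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not part of caption_cog.py): map GPU memory to a --num_beams value.\n",
    "def suggest_num_beams(vram_gib: float, threshold_gib: float = 22.0) -> int:\n",
    "    # Assumption from this notebook's guidance: 24GB-class GPUs can use 3 beams,\n",
    "    # smaller GPUs (e.g. a 16GB T4) must stay at 1 beam.\n",
    "    return 3 if vram_gib >= threshold_gib else 1\n",
    "\n",
    "try:\n",
    "    import torch\n",
    "    if torch.cuda.is_available():\n",
    "        # total_memory is reported in bytes for the first CUDA device\n",
    "        total_bytes = torch.cuda.get_device_properties(0).total_memory\n",
    "        vram_gib = total_bytes / 1024**3\n",
    "        print(f'{vram_gib:.1f} GiB VRAM detected, suggested --num_beams {suggest_num_beams(vram_gib)}')\n",
    "    else:\n",
    "        print('No CUDA GPU detected.')\n",
    "except ImportError:\n",
    "    print('torch is not installed; skipping the GPU check.')"
   ]
  },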
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 16GB GPU, must not use more than 1 beam\n",
    "# 24GB GPU, can use 3 beams\n",
    "%cd /content/EveryDream2trainer\n",
    "%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is a fancier version of above with more options set\n",
    "%cd /content/EveryDream2trainer\n",
    "%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\" --starts_with \"An image of\" --remove_starts_with --temp 0.9 --top_p 0.9 --top_k 40 --bad_words \"depicts,showcases,appears,suggests\""
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "machine_shape": "hm",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 0
}