# Cog Captioning
<a href="https://colab.research.google.com/github/nawnie/EveryDream2trainer/blob/main/CaptionCog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
This notebook uses [CogVLM](https://github.com/THUDM/CogVLM) to caption images.

Read the [docs](doc/CAPTION_COG.md) for a basic usage guide.

Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** use RunPod, Vast, or another GPU host with the EveryDream2trainer Docker container/template, and open this notebook there.

### Requirements
Ampere or newer GPU with 16GB+ VRAM (e.g. A100, 3090, 4060 Ti 16GB).
Loading the model with 4-bit quantization requires bfloat16 support, which older Turing GPUs such as the 16GB T4 lack. This rules out the free tier of Google Colab.
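The requirements above can be checked before loading the model. The sketch below is a hypothetical helper, not part of EveryDream2trainer. It assumes that "Ampere or newer" corresponds to CUDA compute capability 8.0 or higher (where bfloat16 support also begins), and it uses an assumed 15 GiB threshold because 16GB cards may report slightly less than 16 GiB of usable memory.

```python
# Hypothetical pre-flight check for the GPU requirements (not part of EveryDream2trainer).
# Assumptions: "Ampere or newer" == CUDA compute capability 8.0+, and a 15 GiB
# threshold leaves slack for 16GB cards that report slightly under 16 GiB.
MIN_COMPUTE_MAJOR = 8
MIN_VRAM_GIB = 15.0


def meets_requirements(capability, total_memory_bytes,
                       min_major=MIN_COMPUTE_MAJOR, min_vram_gib=MIN_VRAM_GIB):
    """Return True if a GPU with this (major, minor) capability and memory qualifies."""
    major, _minor = capability
    vram_gib = total_memory_bytes / 1024**3
    return major >= min_major and vram_gib >= min_vram_gib


def check_current_gpu():
    """Query GPU 0 with PyTorch, if available, and report whether it qualifies."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU.")
        return False
    if not torch.cuda.is_available():
        print("No CUDA GPU detected.")
        return False
    props = torch.cuda.get_device_properties(0)
    capability = torch.cuda.get_device_capability(0)
    ok = meets_requirements(capability, props.total_memory)
    print(f"{props.name}: compute capability {capability[0]}.{capability[1]}, "
          f"{props.total_memory / 1024**3:.1f} GiB, "
          f"bf16 supported: {torch.cuda.is_bf16_supported()}, "
          f"meets requirements: {ok}")
    return ok


check_current_gpu()
```

On a free-tier Colab T4 (compute capability 7.5), this check reports that the requirements are not met.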


In [None]:
# install dependencies
!pip install huggingface-hub -q
!pip install transformers -q
!pip install pynvml -q
!pip install colorama -q
!pip install peft -q
!pip install bitsandbytes -q
!pip install einops -q
!pip install xformers -q

In [None]:
# Colab-only setup (do NOT run on Docker/RunPod/Vast)
!git clone https://github.com/victorchall/EveryDream2trainer
%cd EveryDream2trainer
%mkdir -p /content/EveryDream2trainer/input
%cd /content/EveryDream2trainer

In [None]:
%cd /content/EveryDream2trainer
#@markdown Optional: Extract all TAR and ZIP files in the input folder (so you can upload a single large TAR/ZIP)
import os
import zipfile
import tarfile

# Directory containing the input files
input_folder = "input"

# Extract ZIP files (case-insensitive extension match)
for file in os.listdir(input_folder):
    if file.lower().endswith(".zip"):
        file_path = os.path.join(input_folder, file)
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(input_folder)

# Extract TAR files; tarfile's 'r' mode also opens gzip/bz2/xz-compressed archives
for file in os.listdir(input_folder):
    if file.lower().endswith((".tar", ".tar.gz", ".tgz")):
        file_path = os.path.join(input_folder, file)
        with tarfile.open(file_path, 'r') as tar_ref:
            tar_ref.extractall(input_folder)

In [None]:
# Connect Google Drive (optional; Colab will show an authorization popup)
from google.colab import drive
drive.mount('/content/drive')

## Run captioning

Place your images in the `input` folder, or change `--image_dir` to point to a Google Drive folder (once mounted, Drive contents appear under `/content/drive/MyDrive/`).
Note: Colab may warn that you are running low on disk space, but captioning should still complete.
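Before running the captioning cells, it can help to confirm what is actually in the input folder. The sketch below is a hypothetical helper, not part of caption_cog.py. It assumes a common set of image extensions (the exact set caption_cog.py accepts is not stated here), and it assumes captions are written as `.txt` files with the same base name next to each image, a caption format EveryDream2trainer reads for training. Under those assumptions, the listed images are the ones the `--no_overwrite` flag used below would still caption.

```python
import os

# Assumed set of image extensions; caption_cog.py may accept a different set.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_uncaptioned_images(image_dir):
    """Return image paths under image_dir that have no .txt caption beside them.

    Assumes captions are stored as <image name>.txt in the same folder as the image.
    """
    pending = []
    for root, _dirs, files in os.walk(image_dir):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() not in IMAGE_EXTENSIONS:
                continue  # skip archives, captions, and other non-image files
            caption_path = os.path.join(root, stem + ".txt")
            if not os.path.exists(caption_path):
                pending.append(os.path.join(root, name))
    return pending


if os.path.isdir("input"):
    pending = find_uncaptioned_images("input")
    print(f"{len(pending)} image(s) without a .txt caption in 'input'")
```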


In [None]:
# 16GB GPU, must not use more than 1 beam
# 24GB GPU, can use 3 beams
%cd /content/EveryDream2trainer
%run caption_cog.py --image_dir "input" --num_beams 1 --prompt "Write a description." --no_overwrite
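The beam guidance in the comments above (16GB: 1 beam, 24GB: 3 beams) can be captured in a small helper. This is a hypothetical sketch, not part of caption_cog.py: the 23 GiB cutoff is an assumption (24GB cards report slightly under 24 GiB), and any VRAM size in between conservatively gets 1 beam. It only rebuilds the same flags used in the cell above.

```python
import shlex

# Hypothetical helper based on the comments above: 16GB -> 1 beam, 24GB -> 3 beams.
# The 23 GiB cutoff is an assumption; anything smaller falls back to 1 beam.
def suggested_num_beams(vram_gib):
    return 3 if vram_gib >= 23 else 1


def build_caption_args(image_dir, vram_gib, prompt="Write a description.", no_overwrite=True):
    """Build the caption_cog.py argument list used in the cell above."""
    args = ["--image_dir", image_dir,
            "--num_beams", str(suggested_num_beams(vram_gib)),
            "--prompt", prompt]
    if no_overwrite:
        args.append("--no_overwrite")
    return args


args = build_caption_args("input", vram_gib=15.6)
print(shlex.join(["python", "caption_cog.py", *args]))
```

On Docker/RunPod/Vast, the printed command can be run from a terminal inside the EveryDream2trainer folder, or the list can be passed to `subprocess.run(["python", "caption_cog.py", *args], check=True)`.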

In [None]:
# Same as above, with additional generation and filtering options set
%cd /content/EveryDream2trainer
%run caption_cog.py --image_dir "input" --num_beams 1 \
    --prompt "Write a description." \
    --starts_with "An image of" --remove_starts_with \
    --temp 0.9 --top_p 0.9 --top_k 40 \
    --bad_words "depicts,showcases,appears,suggests" \
    --no_overwrite 
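After a run with the options above, a quick spot check can confirm that the filters behaved as expected. The sketch below is hypothetical and again assumes `.txt` sidecar captions next to each image. It flags captions that still contain any of the `--bad_words` entries (whole-word, case-insensitive match) or that still begin with the `--starts_with` phrase, which `--remove_starts_with` presumably strips from the saved caption.

```python
import os
import re

# Values copied from the cell above; adjust them if you change the flags.
BAD_WORDS = "depicts,showcases,appears,suggests".split(",")
STARTS_WITH = "An image of"


def caption_issues(caption, bad_words=BAD_WORDS, starts_with=STARTS_WITH):
    """Return a list of human-readable problems found in one caption."""
    issues = []
    for word in bad_words:
        # Whole-word, case-insensitive match so "appears" does not match "disappears".
        if re.search(rf"\b{re.escape(word)}\b", caption, flags=re.IGNORECASE):
            issues.append(f"contains '{word}'")
    if starts_with and caption.lower().startswith(starts_with.lower()):
        issues.append(f"starts with '{starts_with}'")
    return issues


def review_captions(image_dir):
    """Print each .txt caption under image_dir that has issues (assumes sidecar .txt files)."""
    for root, _dirs, files in os.walk(image_dir):
        for name in sorted(files):
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                caption = f.read().strip()
            issues = caption_issues(caption)
            if issues:
                print(f"{name}: {', '.join(issues)}")


if os.path.isdir("input"):
    review_captions("input")
```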