# OpenFlamingo Captioning
This notebook uses [OpenFlamingo](https://github.com/mlfoundations/open_flamingo) to caption images.

On Google Colab this requires the High-RAM runtime shape. A T4 (16 GB) is enough to run the 3B model; the 9B model requires a 24 GB GPU or better.

1.  Read the [docs](doc/CAPTION.md) for a basic usage guide.
2.  Open this notebook in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionFL.ipynb), **or** on Runpod/Vast using the EveryDream2trainer Docker container/template.
3.  Run the cells below to install dependencies.
4.  Place your images in the `input` folder, or change `--data_root` to point to a Google Drive folder.
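
Before running the captioning cells, it can help to confirm that the folder you plan to pass as `--data_root` actually contains images. The sketch below is only an illustration: `find_images` is a hypothetical helper, and the extension list is an assumption, not necessarily the set of formats `caption_fl.py` accepts.

```python
from pathlib import Path

# Assumed set of image extensions; caption_fl.py may accept a different set.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(data_root):
    """Return a sorted list of image paths found recursively under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"data_root does not exist or is not a folder: {root}")
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    try:
        images = find_images("input")
        print(f"Found {len(images)} image(s) under 'input'")
    except FileNotFoundError as err:
        print(err)
```

On Colab, a Google Drive folder becomes available under `/content/drive/MyDrive` after running `from google.colab import drive` and `drive.mount('/content/drive')`; the same check can then be pointed at that path.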

In [None]:
# install dependencies
!pip install open-flamingo==2.0.0
!pip install huggingface-hub==0.15.1
!pip install transformers==4.30.2
!pip install pynvml
!pip install colorama

In [None]:
# Colab only setup (do NOT run for docker/runpod/vast)
!git clone https://github.com/victorchall/EveryDream2trainer
%cd EveryDream2trainer
%mkdir -p /content/EveryDream2trainer/input

In [None]:
%cd /content/EveryDream2trainer
#@markdown Optional:  Extract all TAR and ZIP files in the input folder (so you can just upload a large TAR/ZIP)
import os
import zipfile
import tarfile

# Directory containing the input files
input_folder = "input"

# Extract ZIP files
for file in os.listdir(input_folder):
    if file.endswith(".zip"):
        file_path = os.path.join(input_folder, file)
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(input_folder)

# Extract TAR files
for file in os.listdir(input_folder):
    if file.endswith(".tar"):
        file_path = os.path.join(input_folder, file)
        with tarfile.open(file_path, 'r') as tar_ref:
            tar_ref.extractall(input_folder)

## Run captions

Place your images in the `input` folder, or change `--data_root` to point to a Google Drive folder.

Run either the 24 GB (9B model) cell or the 16 GB (3B model) cell below, or adjust the settings yourself.
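
One way to choose between the two cells is to check how much memory the GPU reports. The sketch below reads total VRAM on GPU 0 with `pynvml` (one of the dependencies installed earlier) and maps it onto the two model names used in this notebook. `pick_model`, `total_vram_gib`, and the thresholds are assumptions for illustration; the thresholds sit a little below the nominal 24 GB and 16 GB because cards usually report slightly less than their advertised size.

```python
# Model names as they appear in the cells of this notebook.
MODEL_9B = "openflamingo/OpenFlamingo-9B-vitl-mpt7b"
MODEL_3B = "openflamingo/OpenFlamingo-3B-vitl-mpt1b"


def pick_model(vram_gib):
    """Map total GPU memory (GiB) to a model name, or None if it looks too small.

    Thresholds are assumed values, not taken from caption_fl.py.
    """
    if vram_gib >= 22.0:   # roughly a "24 GB" card
        return MODEL_9B
    if vram_gib >= 14.0:   # roughly a "16 GB" card such as a T4
        return MODEL_3B
    return None


def total_vram_gib(device_index=0):
    """Return total memory of the given GPU in GiB, or None if it cannot be read."""
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        # pynvml missing, or no NVIDIA driver/GPU available.
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
    except Exception:
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    vram = total_vram_gib()
    if vram is None:
        print("Could not read GPU memory (no NVIDIA GPU or pynvml unavailable)")
    else:
        print(f"GPU 0 reports {vram:.1f} GiB; suggested model: {pick_model(vram)}")
```

This only looks at total memory, not at what other processes are already using, so treat the suggestion as a starting point.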

In [None]:
# 24 GB GPU, 9B model
%cd /content/EveryDream2trainer
%run caption_fl.py --data_root "input" --min_new_tokens 20 --max_new_tokens 30 --num_beams 3 --model "openflamingo/OpenFlamingo-9B-vitl-mpt7b"

In [None]:
# 16 GB GPU, 3B model
%cd /content/EveryDream2trainer
%run caption_fl.py --data_root "input" --min_new_tokens 20 --max_new_tokens 30 --num_beams 8 --model "openflamingo/OpenFlamingo-3B-vitl-mpt1b"