{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cog Captioning\n",
    "<a href=\"https://colab.research.google.com/github/nawnie/EveryDream2trainer/blob/main/CaptionCog.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "This notebook uses [CogVLM](https://github.com/THUDM/CogVLM) for image captioning.\n",
    "\n",
    "Read the [docs](doc/CAPTION_COG.md) for a basic usage guide.\n",
    "\n",
    "Open in [Google Colab](https://colab.research.google.com/github/victorchall/EveryDream2trainer/blob/main/CaptionCog.ipynb) **OR** use Runpod, Vast, or another GPU host with the EveryDream2trainer Docker container/template and open this notebook there.\n",
    "\n",
    "### Requirements\n",
    "Ampere or newer GPU with 16GB+ VRAM (e.g., A100, RTX 3090, RTX 4060 Ti 16GB).\n",
    "4-bit quantized loading requires bfloat16 support, which older Turing GPUs such as the 16GB T4 lack. This rules out the free tier of Google Colab.\n"
   ]
  },
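  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: check your GPU against the requirements above before downloading the model. The next cell is a minimal sketch, not part of EveryDream2trainer itself; it assumes PyTorch is already installed (as on Colab) and uses compute capability 8.0 (Ampere) and roughly 16GB of VRAM as thresholds.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: compare this GPU with the Requirements above (assumes PyTorch is installed)\n",
    "import torch\n",
    "\n",
    "MIN_CAPABILITY = (8, 0)  # Ampere or newer\n",
    "MIN_VRAM_GIB = 15  # assumption: \"16GB\" cards usually report slightly under 16 GiB\n",
    "\n",
    "if not torch.cuda.is_available():\n",
    "    print('No CUDA GPU detected; CogVLM captioning needs one.')\n",
    "else:\n",
    "    name = torch.cuda.get_device_name(0)\n",
    "    capability = torch.cuda.get_device_capability(0)  # (major, minor)\n",
    "    vram_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3\n",
    "    print(f'{name}: compute capability {capability[0]}.{capability[1]}, {vram_gib:.1f} GiB VRAM')\n",
    "    if capability < MIN_CAPABILITY:\n",
    "        print('Older than Ampere: no native bfloat16, so 4-bit loading is not expected to work.')\n",
    "    if vram_gib < MIN_VRAM_GIB:\n",
    "        print('Less than ~16GB of VRAM: the model will likely not fit.')\n",
    "    if capability >= MIN_CAPABILITY and vram_gib >= MIN_VRAM_GIB:\n",
    "        print('GPU looks suitable.')"
   ]
  },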
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install dependencies\n",
    "!pip install huggingface-hub -q\n",
    "!pip install transformers -q\n",
    "!pip install pynvml -q\n",
    "!pip install colorama -q\n",
    "!pip install peft -q\n",
    "!pip install bitsandbytes -q\n",
    "!pip install einops -q\n",
    "!pip install xformers -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Colab-only setup (do NOT run on Docker/Runpod/Vast)\n",
    "!git clone https://github.com/victorchall/EveryDream2trainer\n",
    "%cd EveryDream2trainer\n",
    "%mkdir -p /content/EveryDream2trainer/input\n",
    "%cd /content/EveryDream2trainer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%cd /content/EveryDream2trainer\n",
    "#@markdown Optional: Extract all TAR and ZIP files in the input folder (so you can just upload a large TAR/ZIP)\n",
    "import os\n",
    "import zipfile\n",
    "import tarfile\n",
    "\n",
    "# Directory containing the input files\n",
    "input_folder = \"input\"\n",
    "\n",
    "# Extract ZIP files\n",
    "for file in os.listdir(input_folder):\n",
    "    if file.endswith(\".zip\"):\n",
    "        file_path = os.path.join(input_folder, file)\n",
    "        with zipfile.ZipFile(file_path, 'r') as zip_ref:\n",
    "            zip_ref.extractall(input_folder)\n",
    "\n",
    "# Extract TAR files (mode 'r' also auto-detects gzip/bz2/xz compression)\n",
    "for file in os.listdir(input_folder):\n",
    "    if file.endswith((\".tar\", \".tar.gz\", \".tgz\")):\n",
    "        file_path = os.path.join(input_folder, file)\n",
    "        with tarfile.open(file_path, 'r') as tar_ref:\n",
    "            tar_ref.extractall(input_folder)"
   ]
  },
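  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: confirm what ended up in the input folder. The next cell is a minimal sketch that counts image files per subfolder, since archives may unpack into nested folders. The extension list is an assumption, not taken from `caption_cog.py`; check that script for the formats it accepts and whether it searches subfolders. It also assumes the working directory is the repository root, as set by the `%cd` above.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: count images under the input folder (extension list is an assumption)\n",
    "import os\n",
    "from collections import Counter\n",
    "\n",
    "input_folder = 'input'  # relative to the repository root\n",
    "IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.bmp'}  # assumed, not read from caption_cog.py\n",
    "\n",
    "counts = Counter()\n",
    "for root, _dirs, files in os.walk(input_folder):\n",
    "    for name in files:\n",
    "        if os.path.splitext(name)[1].lower() in IMAGE_EXTS:\n",
    "            # Key by subfolder so images from nested archive folders are listed separately\n",
    "            counts[os.path.relpath(root, input_folder)] += 1\n",
    "\n",
    "for folder, n in sorted(counts.items()):\n",
    "    print(f'{folder}: {n} image(s)')\n",
    "print(f'Total: {sum(counts.values())} image(s)')"
   ]
  },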
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional, Colab only: connect Google Drive (Colab will show a permission prompt)\n",
    "from google.colab import drive\n",
    "drive.mount('/content/drive')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run captions\n",
    "\n",
    "Place your images in the \"input\" folder, or change `--image_dir` to point to a Google Drive folder (mounted under `/content/drive` by the cell above).\n",
    "\n",
    "Note: Colab may warn that you are running out of disk space, but captioning should still work.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 16GB GPU: use no more than 1 beam\n",
    "# 24GB GPU: up to 3 beams\n",
    "%cd /content/EveryDream2trainer\n",
    "%run caption_cog.py --image_dir \"input\" --num_beams 1 --prompt \"Write a description.\" --no_overwrite"
   ]
  },
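  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: the next cell is a minimal sketch that turns the VRAM guidance in the cell above (16GB: 1 beam, 24GB: 3 beams) into a suggested `--num_beams` value. `suggest_num_beams` is a hypothetical helper, and its cutoff is approximate because a card sold as 24GB usually reports slightly less than 24 GiB; treat the result as a starting point.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: suggest --num_beams from total VRAM, following the guidance above\n",
    "import torch\n",
    "\n",
    "def suggest_num_beams(vram_gib):\n",
    "    # Hypothetical helper; assumed cutoff: 24GB cards typically report ~23.x GiB\n",
    "    if vram_gib >= 23:\n",
    "        return 3\n",
    "    return 1  # 16GB-class cards: stay at a single beam\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    vram_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3\n",
    "    print(f'{vram_gib:.1f} GiB VRAM -> try --num_beams {suggest_num_beams(vram_gib)}')\n",
    "else:\n",
    "    print('No CUDA GPU detected.')"
   ]
  },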
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A more advanced version of the cell above, with additional options set\n",
    "%cd /content/EveryDream2trainer\n",
    "%run caption_cog.py --image_dir \"input\" --num_beams 1 \\\n",
    "    --prompt \"Write a description.\" \\\n",
    "    --starts_with \"An image of\" --remove_starts_with \\\n",
    "    --temp 0.9 --top_p 0.9 --top_k 40 \\\n",
    "    --bad_words \"depicts,showcases,appears,suggests\" \\\n",
    "    --no_overwrite"
   ]
  },
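  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: review a few results. The next cell is a minimal sketch that assumes `caption_cog.py` saves each caption as a `.txt` file next to its image; if your version stores captions differently, adjust the file filter. It prints the first few captions and counts how many still contain a word from the `--bad_words` list used above (a simple substring match, so it can over-count).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: review generated captions (assumes one .txt sidecar file per image)\n",
    "import os\n",
    "\n",
    "input_folder = 'input'\n",
    "bad_words = 'depicts,showcases,appears,suggests'.split(',')  # same list as the cell above\n",
    "MAX_SHOWN = 5\n",
    "\n",
    "shown, flagged = 0, 0\n",
    "for root, _dirs, files in os.walk(input_folder):\n",
    "    for name in sorted(files):\n",
    "        if not name.lower().endswith('.txt'):\n",
    "            continue\n",
    "        with open(os.path.join(root, name), encoding='utf-8') as f:\n",
    "            caption = f.read().strip()\n",
    "        # Simple substring match, e.g. 'appears' also matches 'disappears'\n",
    "        if any(word in caption.lower() for word in bad_words):\n",
    "            flagged += 1\n",
    "        if shown < MAX_SHOWN:\n",
    "            print(f'{name}: {caption}')\n",
    "            shown += 1\n",
    "print(f'{flagged} caption(s) still contain a --bad_words term')"
   ]
  },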
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "A100",
   "machine_shape": "hm",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 0
}