From 158dea22da5fb191fc46c922099af857539339a4 Mon Sep 17 00:00:00 2001
From: Augusto de la Torre
Date: Fri, 13 Jan 2023 00:28:11 +0100
Subject: [PATCH 1/5] Add Notebook for RunPod

---
 .gitignore         |   3 +-
 Train_Runpod.ipynb | 367 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 369 insertions(+), 1 deletion(-)
 create mode 100644 Train_Runpod.ipynb

diff --git a/.gitignore b/.gitignore
index 3f3f5f8..9a569ef 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,4 +6,5 @@
 /ckpt_cache/**
 **/.pytest_cache
 /logs/**
-/tmp/**
\ No newline at end of file
+/tmp/**
+.ipynb_checkpoints
diff --git a/Train_Runpod.ipynb b/Train_Runpod.ipynb
new file mode 100644
index 0000000..06c3f36
--- /dev/null
+++ b/Train_Runpod.ipynb
@@ -0,0 +1,367 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2c831b5b-3025-4177-bef5-25aaec89573a",
+   "metadata": {},
+   "source": [
+    "## Every Dream v2 RunPod Setup\n",
+    "\n",
+    "[General Instructions](https://github.com/victorchall/EveryDream2trainer/blob/main/README.md)\n",
+    "\n",
+    "You can sign up for RunPod here (shameless referral link): [Runpod](https://runpod.io/?ref=oko38cd0)\n",
+    "\n",
+    "If you are confused by the wall of text, join the discord here: [EveryDream Discord](https://discord.gg/uheqxU6sXN)\n",
+    "\n",
+    "### Requirements\n",
+    "Select the `RunPod Stable Diffusion v2.1` template. The `RunPod PyTorch` template does not work due to an old version of Python.\n",
+    "\n",
+    "#### Storage\n",
+    "Remember, on RunPod time is more expensive than storage.\n",
+    "\n",
+    "That is good, because running a lot of experiments generates a lot of data, and not having the right save points to recover quickly from inevitable mistakes will cost you a lot of time.\n",
+    "\n",
+    "When in doubt, give yourself ~125GB of RunPod **Volume** storage.\n",
+    "\n",
+    "#### Preparation\n",
+    "You will want to have your data prepared before starting, and to have a rough training plan in mind. Don't waste rental fees if you're not fully prepared to start training.\n",
+    "\n",
+    "Your files should be captioned before you start, with the caption either as the filename or in a .txt file alongside each image file. See [DATA.md](https://github.com/victorchall/EveryDream2trainer/blob/main/doc/DATA.md) for more details. Tools are available to automatically caption your files.\n",
+    "\n",
+    "The optional cell below sketches a quick check that every image has a caption."
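+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "caption-check-sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: a minimal sketch (not part of the official workflow) that flags images\n",
+    "# with no obvious caption, i.e. no sidecar .txt file and a single-word filename.\n",
+    "# The \"input\" folder and the multi-word-filename heuristic are assumptions; adjust to taste.\n",
+    "import os\n",
+    "\n",
+    "data_root = \"input\"  # assumed upload location used in the steps below\n",
+    "image_exts = {\".jpg\", \".jpeg\", \".png\", \".webp\"}\n",
+    "for root, _, files in os.walk(data_root):\n",
+    "    for fname in files:\n",
+    "        name, ext = os.path.splitext(fname)\n",
+    "        if ext.lower() in image_exts:\n",
+    "            has_txt = os.path.exists(os.path.join(root, name + \".txt\"))\n",
+    "            has_filename_caption = \" \" in name  # crude: multi-word filenames look like captions\n",
+    "            if not (has_txt or has_filename_caption):\n",
+    "                print(f\"possibly uncaptioned: {os.path.join(root, fname)}\")"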
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0902e735",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# When running on a pod designed for Automatic1111,\n",
+    "# we need to kill the webui process to free up memory for training.\n",
+    "!ps x | grep -E \"(relauncher|webui)\" | grep -v \"grep\" | awk '{print $1}' | xargs kill\n",
+    "\n",
+    "# Check system resources; make sure your GPU actually has 24GB.\n",
+    "# You should see something like \"0 MB / 24576 MB\" in the middle of the printout.\n",
+    "# If you see something like \"0 MB / 22000 MB\", pick a beefier instance...\n",
+    "!grep Swap /proc/meminfo\n",
+    "!swapon -s\n",
+    "!nvidia-smi"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bb6d14b7-3c37-4ec4-8559-16b4e9b8dd18",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "%cd /workspace\n",
+    "if not os.path.exists(\"EveryDream2trainer\"):\n",
+    "    !git clone https://github.com/victorchall/EveryDream2trainer\n",
+    "\n",
+    "%cd EveryDream2trainer\n",
+    "!python utils/get_yamls.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0bf1e8cd",
+   "metadata": {},
+   "source": [
+    "# Upload training files\n",
+    "\n",
+    "Use the navigation on the left to open **\"workspace / EveryDream2trainer / input\"** and upload your training files using the **up arrow button** above the file explorer, or by dragging and dropping the files from your local machine onto the file explorer.\n",
+    "\n",
+    "If you have many training files, or nested folders of training data, create a zip archive of your training data, upload this file to the input folder, then right-click on the zip file and select \"Extract Archive\".\n",
+    "\n",
+    "**While your training data is uploading, feel free to install the dependencies below.**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "589bfca0",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Install dependencies\n",
+    "\n",
+    "**This will take a couple of minutes. Wait until it says \"DONE\" to move on.** \n",
+    "You can ignore \"warnings\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9649a02c-fb2b-44f1-842d-d1662fa5c7cd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python -m pip install --upgrade pip\n",
+    "\n",
+    "!pip install requests==2.25.1\n",
+    "!pip install -U -I torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url \"https://download.pytorch.org/whl/cu116\"\n",
+    "!pip install transformers==4.25.1\n",
+    "!pip install -U diffusers[torch]==0.10.2\n",
+    "\n",
+    "!pip install pynvml==11.4.1\n",
+    "!pip install bitsandbytes==0.35.0\n",
+    "!pip install ftfy==6.1.1\n",
+    "!pip install aiohttp==3.8.3\n",
+    "!pip install \"tensorboard>=2.11.0\"\n",
+    "!pip install protobuf==3.20.2\n",
+    "!pip install wandb==0.13.6\n",
+    "!pip install colorama==0.4.6\n",
+    "!pip install -U triton\n",
+    "\n",
+    "!pip install --pre -U xformers\n",
+    "\n",
+    "print(\"DONE\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36105dbc-5a33-431b-b88e-b87d479d1ed7",
+   "metadata": {},
+   "source": [
+    "\n",
+    "# Optional - Configure sample prompts\n",
+    "You can set your own sample prompts by adding them, one line at a time, to sample_prompts.txt.\n",
+    "\n",
+    "Keep in mind that a longer list of prompts will take longer to generate. You may also want to adjust `sample_steps` in the training cell below to a higher value so samples are generated less often. This is probably a good idea when training a larger dataset that you know will take longer, where frequent samples will not help you.\n",
+    "\n",
+    "The optional cell below sketches one way to write sample_prompts.txt."
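+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sample-prompts-sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A minimal sketch of writing sample_prompts.txt, one prompt per line.\n",
+    "# The prompts below are placeholders; replace \"my_subject\" with terms from your own captions.\n",
+    "sample_prompts = [\n",
+    "    \"a photo of my_subject standing in a field\",\n",
+    "    \"a portrait of my_subject, studio lighting\",\n",
+    "]\n",
+    "with open(\"sample_prompts.txt\", \"w\") as f:\n",
+    "    f.write(\"\\n\".join(sample_prompts) + \"\\n\")\n",
+    "print(open(\"sample_prompts.txt\").read())"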
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c230d91a",
+   "metadata": {},
+   "source": [
+    "## Now that dependencies are installed, you are ready to move on!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "176af7b7-ebfe-4d25-a4a2-5c03489590ab",
+   "metadata": {},
+   "source": [
+    "## Log into huggingface\n",
+    "Run the cell below and paste your token into the prompt. You can get your token from your [huggingface account page](https://huggingface.co/settings/tokens).\n",
+    "\n",
+    "The token will not show on the screen; just press enter after you paste it.\n",
+    "\n",
+    "Then run the cell after that to download the base checkpoint (this may take a minute)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2a1d04ff-8a2c-46c6-a5de-baea1b3e5a2b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import notebook_login, hf_hub_download\n",
+    "import os\n",
+    "notebook_login()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "86b66fe4-c2ca-46fa-813c-8fe390813add",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "repo = \"panopstor/EveryDream\"\n",
+    "ckpt_file = \"sd_v1-5_vae.ckpt\"\n",
+    "\n",
+    "print(f\"Downloading {ckpt_file} from {repo}\")\n",
+    "downloaded_model_path = hf_hub_download(repo, ckpt_file, cache_dir=\"/workspace/hfcache\")\n",
+    "ckpt_name = os.path.splitext(os.path.basename(downloaded_model_path))[0]\n",
+    "print(f\"Downloaded {ckpt_name} to {downloaded_model_path}\")\n",
+    "\n",
+    "if not os.path.exists(f\"ckpt_cache/{ckpt_name}\"):\n",
+    "    print(f\"Converting {ckpt_name} to Diffusers format\")\n",
+    "    !python utils/convert_original_stable_diffusion_to_diffusers.py --scheduler_type ddim \\\n",
+    "    --original_config_file v1-inference.yaml \\\n",
+    "    --image_size 512 \\\n",
+    "    --checkpoint_path \"{downloaded_model_path}\" \\\n",
+    "    --prediction_type epsilon \\\n",
+    "    --upcast_attn False \\\n",
+    "    --dump_path \"ckpt_cache/{ckpt_name}\"\n",
+    "\n",
+    "print(\"DONE\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f15fcd56-0418-4be1-a5c3-38aa679b1aaf",
+   "metadata": {},
+   "source": [
+    "# Start Training\n",
+    "Naming your project will help you track what the heck you're doing when you're floating in checkpoint files later.\n",
+    "\n",
+    "Consider adding \"sd1\" or \"sd2v\" or similar to the name so you remember what the base model was, as you'll also have to tell your inference app what you trained on; it's difficult for programs to know which inference YAML to use automatically. For instance, the Automatic1111 webui requires you to copy the v2 inference YAML and rename it to match your checkpoint name so it knows how to load the file, though otherwise it assumes SD 1.x compatibility. Something to keep in mind if you start training on SD2.1.\n",
+    "\n",
+    "`max_epochs`, `sample_steps`, and `save_every_n_epochs` should be tuned to your dataset. I like to generate one or two sets of samples per save, and aim for 5 (give or take 2) saved checkpoints. The optional cell below sketches one way to estimate these numbers.\n",
+    "\n",
+    "The training cell after it runs training. This will take a while depending on your number of images, repeats, and max_epochs.\n",
+    "\n",
+    "You can watch for test images in the logs folder."
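+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "schedule-sizing-sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: a rough, assumption-laden estimate, not an official formula.\n",
+    "# It ignores repeats and flip augmentation, so treat the output as a ballpark only.\n",
+    "num_images = 500   # replace with your actual image count\n",
+    "batch_size = 4\n",
+    "max_epochs = 30\n",
+    "\n",
+    "steps_per_epoch = -(-num_images // batch_size)  # ceiling division\n",
+    "save_every_n_epochs = max(1, max_epochs // 5)   # aim for ~5 saved checkpoints\n",
+    "print(f\"~{steps_per_epoch} steps/epoch, ~{steps_per_epoch * max_epochs} total steps\")\n",
+    "print(f\"suggested save_every_n_epochs: {save_every_n_epochs}\")"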
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f73fb86-ebef-41e2-9382-4aa11be84be6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "project_name = \"ft_v1\"\n",
+    "\n",
+    "!python train.py --project_name \"ft_v1_512_1e6\" \\\n",
+    "--resume_ckpt \"{ckpt_name}\" \\\n",
+    "--data_root \"input\" \\\n",
+    "--resolution 512 \\\n",
+    "--batch_size 4 \\\n",
+    "--max_epochs 30 \\\n",
+    "--save_every_n_epochs 5 \\\n",
+    "--lr 1e-6 \\\n",
+    "--lr_scheduler constant \\\n",
+    "--sample_steps 50 \\\n",
+    "--useadam8bit \\\n",
+    "--save_full_precision \\\n",
+    "--shuffle_tags \\\n",
+    "--notebook \\\n",
+    "--amp \\\n",
+    "--write_schedule"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca170883-2d18-4f34-adbb-aca5824a351b",
+   "metadata": {},
+   "source": [
+    "## You can chain together a tuning schedule using `--resume_ckpt findlast`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f39291ef-bea9-471e-bbc7-7b681fa87eb3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "!python train.py --project_name \"{project_name}_512_7e7\" \\\n",
+    "--resume_ckpt findlast \\\n",
+    "--data_root \"input\" \\\n",
+    "--resolution 512 \\\n",
+    "--batch_size 4 \\\n",
+    "--max_epochs 10 \\\n",
+    "--save_every_n_epochs 3 \\\n",
+    "--lr 0.7e-6 \\\n",
+    "--lr_scheduler constant \\\n",
+    "--sample_steps 50 \\\n",
+    "--useadam8bit \\\n",
+    "--save_full_precision \\\n",
+    "--shuffle_tags \\\n",
+    "--notebook \\\n",
+    "--amp \\\n",
+    "--write_schedule\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f24eee3d-f5df-45f3-9acc-ee0206cfe6b1",
+   "metadata": {},
+   "source": [
+    "# HuggingFace upload\n",
+    "Use the cells below to upload your checkpoint to your personal HuggingFace account instead of downloading it manually. You should already be authorized to HuggingFace by token if you used the download/token cells above; the optional cell right below double-checks that.\n",
+    "\n",
+    "Make sure to fill in the three fields at the top of the upload cell. It only uploads one file at a time, so you will need to run it once for each file you want to upload.\n",
+    "\n",
+    "* You can get your account name from your [HuggingFace account page](https://huggingface.co/settings/account). Look for your \"username\" field and paste it below.\n",
+    "\n",
+    "* You only need to set up a repository once. You can create it here: [Create New HF Model](https://huggingface.co/new). Make sure you write down the repo name for future use.\n",
+    "\n",
+    "* You need to type the names of the ckpts one at a time in the cell below; find them in the left file drawer of your RunPod/Vast/Colab."
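+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "hf-auth-check-sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sketch: confirm you are still authenticated to HuggingFace before uploading.\n",
+    "# whoami() raises if no valid token is stored; rerun the notebook_login() cell in that case.\n",
+    "from huggingface_hub import HfApi\n",
+    "\n",
+    "try:\n",
+    "    print(f\"Logged in as: {HfApi().whoami()['name']}\")\n",
+    "except Exception as err:\n",
+    "    print(f\"Not logged in ({err}); run the notebook_login() cell above first.\")"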
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9d5e8bd2-3e9f-4196-ad7a-ae6dcfc84b92",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# List ckpts in root that are ready for download\n",
+    "import glob\n",
+    "\n",
+    "for f in glob.glob(\"*.ckpt\"):\n",
+    "    print(f)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9df5e1a-3c68-41c0-a4ed-ea0abcd19858",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# fill in these three fields:\n",
+    "hfusername = \"MyHfUser\"\n",
+    "reponame = \"MyRepo\"\n",
+    "ckpt_name = \"f_v1-ep10-gs02500.ckpt\"\n",
+    "\n",
+    "import os\n",
+    "from huggingface_hub import HfApi\n",
+    "\n",
+    "# strip characters that are awkward in HF repo paths\n",
+    "target_name = ckpt_name.replace('-', '').replace('=', '')\n",
+    "os.rename(ckpt_name, target_name)\n",
+    "repo_id = f\"{hfusername}/{reponame}\"\n",
+    "print(f\"Uploading to HF: {repo_id}/{target_name}\")\n",
+    "print(\"This may take a while...\")\n",
+    "\n",
+    "api = HfApi()\n",
+    "response = api.upload_file(\n",
+    "    path_or_fileobj=target_name,\n",
+    "    path_in_repo=target_name,\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type=None,\n",
+    "    create_pr=True,\n",
+    ")\n",
+    "print(response)\n",
+    "print(\"Go to your repo and accept the PR this created to see your file.\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2e677f113ff5b533036843965d6e18980b635d0aedc1c5cebd058006c5afc92a"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From 9bb213efafe6c392cc00f44d0e59e8c40c913e41 Mon Sep 17 00:00:00 2001
From: Augusto de la Torre
Date: Tue, 24 Jan 2023 01:12:56 +0100
Subject: [PATCH 2/5] Simplify example training

---
 Train_Runpod.ipynb | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Train_Runpod.ipynb b/Train_Runpod.ipynb
index 06c3f36..70ac476 100644
--- a/Train_Runpod.ipynb
+++ b/Train_Runpod.ipynb
@@ -217,9 +217,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "project_name = \"ft_v1\"\n",
-    "\n",
-    "!python train.py --project_name \"ft_v1_512_1e6\" \\\n",
+    "!python train.py --project_name \"ft_v1a_512_1e6\" \\\n",
     "--resume_ckpt \"{ckpt_name}\" \\\n",
     "--data_root \"input\" \\\n",
     "--resolution 512 \\\n",
@@ -253,7 +251,7 @@
    "outputs": [],
    "source": [
     "\n",
-    "!python train.py --project_name \"{project_name}_512_7e7\" \\\n",
+    "!python train.py --project_name \"ft_v1b_512_7e7\" \\\n",
     "--resume_ckpt findlast \\\n",
     "--data_root \"input\" \\\n",
     "--resolution 512 \\\n",

From 533353c104412d724bec9622628fe14e00b26f4c Mon Sep 17 00:00:00 2001
From: kayos
Date: Tue, 24 Jan 2023 18:49:35 -0800
Subject: [PATCH 3/5] Update DATA.md

Fix broken link
---
 doc/DATA.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/DATA.md b/doc/DATA.md
index 85f9cc9..e1f4513 100644
--- a/doc/DATA.md
+++ b/doc/DATA.md
@@ -47,4 +47,4 @@
 Also consider some basic mention of pose. ex. "clouds strife sitting on a blue couch"
 
 ### Further reading
-The [Data Balancing](doc/BALANCING.md) guide has some more information on how to balance your data and what to consider for model preservation and mixing in ground truth data.
\ No newline at end of file
+The [Data Balancing](BALANCING.md) guide has some more information on how to balance your data and what to consider for model preservation and mixing in ground truth data.

From ff10929645b48d0f858ab509ba4f25aea0290f3b Mon Sep 17 00:00:00 2001
From: Augusto de la Torre
Date: Thu, 26 Jan 2023 12:17:09 +0100
Subject: [PATCH 4/5] Remove empty line

---
 .gitignore | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 6197923..2dd5511 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,4 +10,3 @@
 **/.ipynb_checkpoints
 /wandb/**
 /mycfgs/**
-