"1. Prepare your training data before you begin (see below)\n",
"2. Spin the `RunPod Stable Diffusion v2.1` template. The `RunPod PyTorch` template does not work due to an old version of Python. \n",
"3. Open this notebook with `File > Open from URL...` pointing to `https://raw.githubusercontent.com/victorchall/EveryDream2trainer/main/Train_Runpod.ipynb`\n",
"4. Run each cell below once, noting any instructions above the cell (the first step requires a pod restart)\n",
"5. Figure out how you want to tweak the process next\n",
"Remember, on RunPod time is more expensive than storage. \n",
"\n",
"Which is good, because running a lot of experiments can generate a lot of data. Not having the right save points to recover quickly from inevitable mistakes will cost you a lot of time.\n",
"\n",
"When in doubt, give yourself ~125GB of Runpod **Volume** storage.\n",
"You will want to have your data prepared before starting, and have a rough training plan in mind. Don't waste rental fees if you're not fully prepared to start training. \n",
"**If this is your first time trying a full fine-tune, start small!** \n",
"Pick a single concept and 30-100 images, and see what happens. Training a small dataset like this is fast, and will give you a feel for how quickly your model (over-)trains depending on your training schedule.\n",
"Your files should be captioned before you start with either the caption as the filename or in text files for each image alongside the image files. See [DATA.md](https://github.com/victorchall/EveryDream2trainer/blob/main/doc/DATA.md) for more details. Tools are available to automatically caption your files."
"Here we ensure that EveryDream2trainer is installed, and we disable the Automatic 1111 web-ui. But the vram consumed by the web-ui will not be fully freed until the pod restarts. This is especially important if you are training with large batch sizes."
"Ues the navigation on the left to open the ** \"workspace / EveryDream2trainer / input\"** and upload your training files using the **up arrow button** above the file explorer, or by dragging and dropping the files from your local machine onto the file explorer.\n",
"\n",
"If you have many training files, or nested folders of training data, create a zip archive of your training data, upload this file to the input folder, then right click on the zip file and select \"Extract Archive\".\n",
"You can set your own sample prompts by adding them, one line at a time, to sample_prompts.txt.\n",
"\n",
"Keep in mind a longer list of prompts will take longer to generate. You may also want to adjust you sample_steps in the training notebook to a different value to get samples left often. This is probably a good idea when training a larger dataset that you know will take longer to train, where more frequent samples will not help you.\n",
"\n",
"While your training data is uploading, go ahead to install the dependencies below\n",
"## Now that dependencies are installed, ready to move on!"
]
},
{
"cell_type": "markdown",
"id": "176af7b7-ebfe-4d25-a4a2-5c03489590ab",
"metadata": {},
"source": [
"## Log into huggingface\n",
"Run the cell below and paste your token into the prompt. You can get your token from your [huggingface account page](https://huggingface.co/settings/tokens).\n",
"\n",
"The token will not show on the screen, just press enter after you paste it.\n",
"\n",
"Then run the following cell to download the base checkpoint (may take a minute)."
"Naming your project will help you track what the heck you're doing when you're floating in checkpoint files later.\n",
"\n",
"You may wish to consider adding \"sd1\" or \"sd2v\" or similar to remember what the base was, as you'll also have to tell your inference app what you were using, as its difficult for programs to know what inference YAML to use automatically. For instance, Automatic1111 webui requires you to copy the v2 inference YAML and rename it to match your checkpoint name so it knows how to load the file, tough it assumes SD 1.x compatible. Something to keep in mind if you start training on SD2.1.\n",
"\n",
"`max_epochs`, `sample_steps`, and `save_every_n_epochs` should be tuned to your dataset. I like to generate one or two sets of samples per save, and aim for 5 (give or take 2) saved checkpoints.\n",
"\n",
"Next cell runs training. This will take a while depending on your number of images, repeats, and max_epochs.\n",
"\n",
"You can watch for test images in the logs folder."
"Use the cell below to upload one or more checkpoints to your personal HuggingFace account, if you want, instead of manually downloading. You should already be authorized to Huggingface by token if you used the download/token cells above.\n",
"* You can get your account name from your [HuggingFace account page](https://huggingface.co/settings/account). Look for your \"username\" field and paste it below.\n",
"* You only need to setup a repository one time. You can create it here: [Create New HF Model](https://huggingface.co/new) Make sure you write down the repo name you make for future use. You can reuse it later."