Merge branch 'harubaru:main' into patch-2

2022-09-10 19:40:10 -05:00 · a532563a71
parent 7f901bf252 bd6986fcc6
8 changed files with 198 additions and 192 deletions
--- a/README.md
+++ b/README.md
@@ -9,7 +9,12 @@ Waifu Diffusion is the name for this project of finetuning Stable Diffusion on D
 <sub>Prompt: touhou 1girl komeiji_koishi portrait</sub>

 ## Documentation
-[Training Guide](https://github.com/harubaru/waifu-diffusion/blob/main/docs/en/training/README.md)
+
+[Index](./docs/en/README.md)
+
+[Weights](./docs/en/weights/README.md)
+
+[Training Guide](./docs/en/training/README.md)

 All thanks goes to CompVis and Stability AI for releasing this codebase!

@@ -22,188 +27,6 @@ Model Link: https://huggingface.co/hakurei/waifu-diffusion
 # Stable Diffusion
 *Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*

-[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
-[Robin Rombach](https://github.com/rromb)\*,
-[Andreas Blattmann](https://github.com/ablattmann)\*,
-[Dominik Lorenz](https://github.com/qp-qp)\,
-[Patrick Esser](https://github.com/pesser),
-[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
-
-**CVPR '22 Oral**
-
-which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).
-
-![txt2img-stable2](assets/stable-samples/txt2img/merged-0006.png)
-[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
-model.
-Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. 
-Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), 
-this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
-With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
-See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
-
-  
-## Requirements
-A suitable [conda](https://conda.io/) environment named `ldm` can be created
-and activated with:
-
-```
-conda env create -f environment.yaml
-conda activate ldm
-```
-
-You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
-
-```
-conda install pytorch torchvision -c pytorch
-pip install transformers==4.19.2
-pip install -e .
-``` 
-
-
-## Stable Diffusion v1
-
-Stable Diffusion v1 refers to a specific configuration of the model
-architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
-and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and 
-then finetuned on 512x512 images.
-
-*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
-in its training data. 
-Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
-Research into the safe deployment of general text-to-image models is an ongoing effort. To prevent misuse and harm, we currently provide access to the checkpoints only for [academic research purposes upon request](https://stability.ai/academia-access-form).
-**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.***
-
-[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form) 
-
-### Weights
-
-We currently provide three checkpoints, `sd-v1-1.ckpt`, `sd-v1-2.ckpt` and `sd-v1-3.ckpt`,
-which were trained as follows,
-
- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
-  194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
-  515k steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
-filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-improved-aesthetics" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
-
-Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
-5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
-steps show the relative improvements of the checkpoints:
-![sd evaluation results](assets/v1-variants-scores.jpg)
-
-
-
-### Text-to-Image with Stable Diffusion
-![txt2img-stable2](assets/stable-samples/txt2img/merged-0005.png)
-![txt2img-stable2](assets/stable-samples/txt2img/merged-0007.png)
-
-Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
-
-
-#### Sampling Script
-
-After [obtaining the weights](#weights), link them
-```
-mkdir -p models/ldm/stable-diffusion-v1/
-ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 
-```
-and sample with
-```
-python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 
-```
-
-By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler, 
-and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).
-
-```commandline
-usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS]
-                  [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED] [--precision {full,autocast}]
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --prompt [PROMPT]     the prompt to render
-  --outdir [OUTDIR]     dir to write results to
-  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
-  --skip_save           do not save individual samples. For speed measurements.
-  --ddim_steps DDIM_STEPS
-                        number of ddim sampling steps
-  --plms                use plms sampling
-  --laion400m           uses the LAION400M model
-  --fixed_code          if enabled, uses the same starting code across samples
-  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
-  --n_iter N_ITER       sample this often
-  --H H                 image height, in pixel space
-  --W W                 image width, in pixel space
-  --C C                 latent channels
-  --f F                 downsampling factor
-  --n_samples N_SAMPLES
-                        how many samples to produce for each given prompt. A.k.a. batch size
-  --n_rows N_ROWS       rows in the grid (default: n_samples)
-  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
-  --from-file FROM_FILE
-                        if specified, load prompts from this file
-  --config CONFIG       path to config which constructs model
-  --ckpt CKPT           path to checkpoint of model
-  --seed SEED           the seed (for reproducible sampling)
-  --precision {full,autocast}
-                        evaluate at this precision
-
-```
-Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints. 
-For this reason `use_ema=False` is set in the configuration, otherwise the code will try to switch from
-non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
-which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
-
-
-#### Diffusers Integration
-
-Another way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers)
-```py
-# make sure you're logged in with `huggingface-cli login`
-from torch import autocast
-from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
-
-pipe = StableDiffusionPipeline.from_pretrained(
-	"CompVis/stable-diffusion-v1-3-diffusers", 
-	use_auth_token=True
-)
-
-prompt = "a photo of an astronaut riding a horse on mars"
-with autocast("cuda"):
-    image = pipe(prompt)["sample"][0]  
-    
-image.save("astronaut_rides_horse.png")
-```
-
-
-
-### Image Modification with Stable Diffusion
-
-By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different 
-tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script, 
-we provide a script to perform image modification with Stable Diffusion.  
-
-The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
-```
-python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
-```
-Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image. 
-Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
-
-**Input**
-
-![sketch-in](assets/stable-samples/img2img/sketch-mountains-input.jpg)
-
-**Outputs**
-
-![out3](assets/stable-samples/img2img/mountains-3.png)
-![out2](assets/stable-samples/img2img/mountains-2.png)
-
-This procedure can, for example, also be used to upscale samples from the base model.
-
-
 ## Comments 

 - Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
--- a/danbooru_data/download.py
+++ b/danbooru_data/download.py
@@ -0,0 +1,80 @@
+import os
+import json
+import argparse
+import multiprocessing
+
+import requests
+import tqdm
+
+# Downloads the URLs listed in a JSON file ({url: tags}, as written by scrape.py),
+# saving each image to <out_dir>/img/ and its tags to <out_dir>/text/<name>.txt.
+
+
+class DownloadManager():
+    def __init__(self, max_threads=32):
+        self.failed_downloads = []
+        self.max_threads = max_threads
+
+    # args = (link, metadata, out_img_dir, out_text_dir)
+    # Runs in a worker process, so a failure is returned to the parent instead
+    # of being appended to self.failed_downloads (each worker only has a copy).
+    # Returns None on success.
+    def download(self, args):
+        link, metadata, img_dir, text_dir = args
+        file_name = link.split('/')[-1]
+        try:
+            with requests.get(link, stream=True, timeout=60) as r:
+                r.raise_for_status()  # don't save HTTP error pages as images
+                with open(os.path.join(img_dir, file_name), 'wb') as f:
+                    for chunk in r.iter_content(1024):
+                        f.write(chunk)
+            stem = os.path.splitext(file_name)[0]
+            with open(os.path.join(text_dir, stem + '.txt'), 'w') as f:
+                f.write(metadata)
+        except Exception:
+            return (link, metadata)
+        return None
+
+    def download_urls(self, file_path, out_dir):
+        with open(file_path) as f:
+            data = json.load(f)
+
+        # create the output folders (no-op if they already exist)
+        img_dir = os.path.join(out_dir, 'img')
+        text_dir = os.path.join(out_dir, 'text')
+        os.makedirs(img_dir, exist_ok=True)
+        os.makedirs(text_dir, exist_ok=True)
+
+        print(f'Loading {file_path} for download on {self.max_threads} processes...')
+        thread_args = [(k, v, img_dir, text_dir) for k, v in data.items()]
+
+        print(f'Downloading {len(thread_args)} images...')
+
+        # one pool serves every URL; the progress bar advances as downloads finish
+        with multiprocessing.Pool(self.max_threads) as p:
+            results = list(tqdm.tqdm(p.imap_unordered(self.download, thread_args),
+                                     total=len(thread_args)))
+        self.failed_downloads = [r for r in results if r is not None]
+
+        if self.failed_downloads:
+            print("Failed downloads:")
+            for link, _ in self.failed_downloads:
+                print(link)
+            print("\n")
+
+        # To retry failures once, sequentially in this process, uncomment:
+        # self.failed_downloads = [f for f in self.failed_downloads
+        #                          if self.download((*f, img_dir, text_dir))]
+
+
+if __name__ == '__main__':
+    # Parse arguments only when run as a script, so importing this module
+    # (including the re-import done by spawned worker processes) has no side effects.
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--file', '-f', type=str, required=True)
+    parser.add_argument('--out_dir', '-o', type=str, required=True)
+    parser.add_argument('--threads', '-p', type=int, required=False, default=32)
+    args = parser.parse_args()
+
+    dm = DownloadManager(max_threads=args.threads)
+    dm.download_urls(args.file, args.out_dir)
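Each entry of the input JSON maps one image URL to its tag string, and download.py derives both output file names from the last segment of that URL. The sketch below reproduces that naming in a standalone helper so the expected output layout can be checked without downloading anything. `output_paths` and the example.com URL are hypothetical, and Danbooru file names are assumed to carry a single extension (e.g. `<md5>.jpg`).

```python
import os


def output_paths(url, out_dir):
    """Hypothetical helper: (image_path, text_path) for one JSON entry.

    The image keeps the last path segment of its URL; the tag file reuses
    that name with a .txt extension (assumes a single extension, e.g. md5.jpg).
    """
    file_name = url.split('/')[-1]
    stem = os.path.splitext(file_name)[0]
    return (os.path.join(out_dir, 'img', file_name),
            os.path.join(out_dir, 'text', stem + '.txt'))


# Hypothetical input in the {url: tags} format produced by scrape.py.
links = {
    "https://example.com/data/0123abcd.jpg": "original 1girl solo smile",
}
for url, tags in links.items():
    img_path, text_path = output_paths(url, "dataset")
    print(img_path, "<-", url)
    print(text_path, "<-", repr(tags))
```

Keeping the image and its tags under the same base name is what lets later steps pair them by file name alone.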
--- a/danbooru_data/scrape.py
+++ b/danbooru_data/scrape.py
@@ -0,0 +1,50 @@
+import json
+import argparse
+
+from pybooru import Danbooru
+from tqdm import tqdm
+
+
+class DanbooruScraper():
+    def __init__(self, username, key):
+        self.username = username
+        self.key = key
+        self.dbclient = Danbooru('danbooru', username=self.username, api_key=self.key)
+
+    # Collects Danbooru file URLs and their tags into a {url: tags} dict,
+    # then writes it to `file` as JSON (the input format of download.py).
+    def get_urls(self, tags, num_posts, batch_size, file="data_urls.json"):
+        url_tags = {}
+        if num_posts % batch_size != 0:
+            print("Error: num_posts must be divisible by batch_size")
+            return
+        for i in tqdm(range(num_posts // batch_size)):
+            # Danbooru result pages are 1-indexed
+            posts = self.dbclient.post_list(tags=tags, limit=batch_size, random=False, page=i + 1)
+            if not posts:
+                print(f'Empty results at page {i + 1}')
+                break
+            for post in posts:
+                if 'file_url' not in post:
+                    print("Error: file_url not found")
+                    continue
+                if post['file_url'] not in url_tags:
+                    tag_fields = ('tag_string_copyright', 'tag_string_character',
+                                  'tag_string_general', 'tag_string_artist')
+                    # skip empty categories so tags are separated by single spaces
+                    url_tags[post['file_url']] = ' '.join(post[k] for k in tag_fields if post[k])
+        with open(file, 'w') as f:
+            json.dump(url_tags, f)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--danbooru_username', '-user', type=str, required=False)
+    parser.add_argument('--danbooru_key', '-key', type=str, required=False)
+    parser.add_argument('--tags', '-t', required=False, default="solo -comic -animated -touhou -rating:general order:score age:<1month")
+    parser.add_argument('--posts', '-p', type=int, required=False, default=10000)
+    parser.add_argument('--output', '-o', required=False, default='links.json')
+    args = parser.parse_args()
+
+    ds = DanbooruScraper(args.danbooru_username, args.danbooru_key)
+    ds.get_urls(args.tags, args.posts, 100, file=args.output)
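The tag string that scrape.py stores for each URL concatenates four Danbooru tag categories in a fixed order (copyright, character, general, artist), and the number of API pages it requests is `num_posts // batch_size`. A minimal sketch of both calculations follows. `build_tag_string`, `pages_needed` and the trimmed post record are hypothetical; only the field names are taken from scrape.py, and empty categories are skipped so tags stay separated by single spaces.

```python
def build_tag_string(post):
    """Hypothetical helper: join a post's tag categories in scrape.py's order."""
    fields = ('tag_string_copyright', 'tag_string_character',
              'tag_string_general', 'tag_string_artist')
    # Skip empty categories so the result never contains double spaces.
    return ' '.join(post[f] for f in fields if post[f])


def pages_needed(num_posts, batch_size=100):
    """Hypothetical helper: pages of `batch_size` posts needed for `num_posts`."""
    if num_posts % batch_size != 0:
        raise ValueError("num_posts must be divisible by batch_size")
    return num_posts // batch_size


# A trimmed, hypothetical post record with only the fields scrape.py reads.
post = {
    'file_url': 'https://example.com/data/0123abcd.jpg',
    'tag_string_copyright': 'original',
    'tag_string_character': '',
    'tag_string_general': '1girl solo smile',
    'tag_string_artist': 'some_artist',
}
print(build_tag_string(post))  # original 1girl solo smile some_artist
print(pages_needed(10000))     # 100
```

With the script's defaults (10000 posts, batch size 100) this means up to 100 API requests, fewer if a page comes back empty.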
--- a/docs/en/README.md
+++ b/docs/en/README.md
@@ -2,4 +2,6 @@

 Waifu Diffusion is a project based off CompVis/Stable-Diffusion.

-For guidance on how to start training, see [training](https://github.com/harubaru/waifu-diffusion/tree/main/docs/en/training).
+For guidance on how to start training, see [training](./training/README.md).
+
+For a list of trained weights, see [weights](./weights/README.md).
--- a/docs/en/training/README.md
+++ b/docs/en/training/README.md
@@ -1,8 +1,8 @@
 # Training documentation
 Training is available with waifu-diffusion. Before starting, we remind you that, at this moment at least 30GB of VRAM is needed, along with at least 30gb of storage if you don't mind cleaning up every so often.
 ## Contents
-1. [Dataset](https://github.com/harubaru/waifu-diffusion/blob/main/docs/en/training/dataset.md)
-2. [Configuration](https://github.com/harubaru/waifu-diffusion/blob/main/docs/en/training/configuration.md)
+1. [Dataset](./dataset.md)
+2. [Configuration](./configuration.md)
 3. Executing
 4. Recommendations
-5. FAQ
+5. FAQ
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -9,9 +9,13 @@ In this guide we are going to use the Danbooru2021 dataset by Gwern.net. You are
 4. Packaging the dataset

 ## Dataset requirements
+
 The dataset needs to be in the following format
+
 /dataset/ : Root dataset folder, can be any name
+
 /dataset/img/ : Folder for images
+
 /dataset/txt/ : Folder for text files

 It is recommended to have the images in 512x512 resolution and in JPG format. While the text files need to have the same name as the images it refers to.
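A quick way to confirm that every image has a caption file with the same name is to compare base names across the two folders. This is a sketch assuming the `img/` and `txt/` layout listed above and `.txt` captions; `unmatched_files` is a hypothetical helper, not part of the repository. Note that download.py in this commit writes captions to `text/`, so the folder name may need adjusting.

```python
import os
import tempfile


def unmatched_files(dataset_dir, txt_name='txt'):
    """Hypothetical helper: (images without a caption, captions without an image).

    Assumes <dataset>/img/ holds the images and <dataset>/<txt_name>/ holds
    one .txt file per image with the same base name.
    """
    img_stems = {os.path.splitext(n)[0]
                 for n in os.listdir(os.path.join(dataset_dir, 'img'))}
    txt_stems = {os.path.splitext(n)[0]
                 for n in os.listdir(os.path.join(dataset_dir, txt_name))
                 if n.endswith('.txt')}
    return sorted(img_stems - txt_stems), sorted(txt_stems - img_stems)


# Build a tiny throwaway dataset to show the check.
with tempfile.TemporaryDirectory() as root:
    for sub in ('img', 'txt'):
        os.makedirs(os.path.join(root, sub))
    for name in ('0001.jpg', '0002.jpg'):
        open(os.path.join(root, 'img', name), 'wb').close()
    with open(os.path.join(root, 'txt', '0001.txt'), 'w') as f:
        f.write('1girl solo')
    print(unmatched_files(root))  # (['0002'], [])
```

Both lists should be empty before the dataset is packaged.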
@@ -38,23 +42,35 @@ apt install rsync
 ````
 #### Windows
 On Windows, you are going to need to install Cygwin, a posix runtime for Windows which allows the usage of many linux-only programs inside windows.
+
+[Cygwin installer for x86_64 (64-bit Windows)](https://www.cygwin.com/setup-x86_64.exe)
+
 On the installer, select mirrors.kernel.org for Download Site:
-![[cygwin-mirrors.png]]
+
+![cygwin-mirrors.png](./res/cygwin-mirrors.png)
+
 Next, search for "rsync" on the search bar, change "View: Pending" to "View: Full", and select on the "New" tab the latest version. Do the same for "zip".
-![[cygwin-packages.png]]
+
+![cygwin-packages.png](./res/cygwin-packages.png)
+
 GIF explaining the entire process:
-![[cygwin-gif.gif]]
+
+![cygwin-gif.gif](./res/cygwin-gif.gif)
+
+Once the installation is finished, you should see "Cygwin64 Terminal" in your Start Menu. Launch it and you should be greeted by the following window:
-![[cygwin-idle.png]]
+
+![cygwin-idle.png](./res/cygwin-idle.png)
+
 You may now follow the intructions

 ### Downloading the dataset
 Remember that instructions here apply universally, both on Linux and Windows (If you are using Cygwin that is).

 The entire dataset weights about 5TB. You are not going to download everything, instead, you are only going to download two kinds of files:
+
 1. The images
 2. The JSON files (metadata)
+
 If you want to see the entire file list, you can refer to the [Danbooru2021 information site](https://www.gwern.net/Danbooru2021).

 We are going to extract the images from the 512px folder for convinience, since this folder already has the images resized to 512x512 resolution in JPG format. It only has safe rated images, for NSFW refer to [gwern.net](https://www.gwern.net/Danbooru2021#samples). 
@@ -85,7 +101,8 @@ Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
 This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "upoaded on Danbooru".

 Once the script has finished, you should have a "labeled_data" folder, whose insides look like this:
-![[labeled_data-insides.png]]
+
+![labeled_data-insides.png](./res/labeled_data-insides.png)
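The tag rewriting described above can be sketched as follows. The actual script is not part of this diff, so the regular expression, the supported counts (one to six) and the exact wording of the appended source phrase are assumptions. `humanize_tag` and `build_caption` are hypothetical names.

```python
import re

# Hypothetical word list; the real script may cover other counts.
NUMBER_WORDS = {1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six'}
# Matches count tags such as "1girl", "2boys" (assumed pattern).
COUNT_TAG = re.compile(r'^(\d+)(girls?|boys?)$')


def humanize_tag(tag):
    """Rewrite count tags: '1girl' -> 'one girl', '2boys' -> 'two boys'."""
    m = COUNT_TAG.match(tag)
    if m and int(m.group(1)) in NUMBER_WORDS:
        return f'{NUMBER_WORDS[int(m.group(1))]} {m.group(2)}'
    return tag  # every other tag is left unchanged


def build_caption(tags):
    """Convert each space-separated tag, then append the (assumed) source phrase."""
    return ' '.join(humanize_tag(t) for t in tags.split()) + ' uploaded on Danbooru'


print(build_caption('1girl 2boys solo smile'))
# one girl two boys solo smile uploaded on Danbooru
```

Tags outside the pattern, such as `6+girls`, pass through untouched in this sketch.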

 ## Packaging the dataset
 In order to reduce size, zip the contents of labeled_data:
--- a/docs/en/weights/README.md
+++ b/docs/en/weights/README.md
@@ -0,0 +1,15 @@
+# Weights
+
+The following is a small list of available weights released by the Waifu Diffusion project:
+
+- Waifu Diffusion v1.2
+
+Release Date: 7 September 2022
+
+Steps/Epochs/Images: 5 Epochs, 56,000 Images
+
+Download: [Mirrors](./danbooru-7-09-2022/README.md)
+
+License: None
+
+Authors: Haru (haru#1367@discord)
--- a/docs/en/weights/danbooru-7-09-2022/README.md
+++ b/docs/en/weights/danbooru-7-09-2022/README.md
@@ -0,0 +1,19 @@
+# Waifu Diffusion v1.2
+
+Release Date: 7 September 2022
+
+Steps/Epochs/Images: 5 Epochs, 56,000 Images
+
+License: None
+
+Authors: Haru (haru#1367@discord)
+
+Mirrors:
+
+- Google Drive (rate-limited): https://drive.google.com/file/d/1XeoFCILTcc9kn_5uS-G0uqWS5XVANpha
+
+- Magnet link: magnet:?xt=urn:btih:INEYUMLLBBMZF22IIP4AEXLUK6XQKCSD&dn=wd-v1-2-full-ema.ckpt&xl=7703810927&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
+
+- HTTPS mirror (fastest): https://thisanimedoesnotexist.ai/downloads/wd-v1-2-full-ema.ckpt
+
+- HTTP mirror: http://wd.links.sd:8880/wd-v1-2-full-ema.ckpt
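The magnet link advertises the checkpoint's file name (`dn`) and exact size in bytes (`xl`, 7,703,810,927 bytes, roughly 7.7 GB). Assuming every mirror serves the same `wd-v1-2-full-ema.ckpt`, comparing a finished download against that size is a cheap way to catch truncated transfers. It is only a size check, not a checksum. `expected_size` and `looks_complete` are hypothetical helpers.

```python
import os
from urllib.parse import urlsplit, parse_qs

# Magnet link copied from the mirror list.
MAGNET = ("magnet:?xt=urn:btih:INEYUMLLBBMZF22IIP4AEXLUK6XQKCSD"
          "&dn=wd-v1-2-full-ema.ckpt&xl=7703810927"
          "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce")


def expected_size(magnet):
    """Hypothetical helper: read the file name (dn) and byte length (xl) from a magnet URI."""
    params = parse_qs(urlsplit(magnet).query)
    return params['dn'][0], int(params['xl'][0])


def looks_complete(path, magnet=MAGNET):
    """Hypothetical helper: True if the file at `path` has the advertised byte length.

    Assumes all mirrors serve the same file as the torrent; this does not verify content.
    """
    _, size = expected_size(magnet)
    return os.path.getsize(path) == size


name, size = expected_size(MAGNET)
print(name, size)  # wd-v1-2-full-ema.ckpt 7703810927
```

For example, `looks_complete("wd-v1-2-full-ema.ckpt")` after downloading from the HTTPS mirror returns False if the transfer stopped early.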