# Diffusers

## Definitions

**Models**: A single neural network that models p_θ(x_{t-1}|x_t) and is trained to “denoise” noisy inputs step by step into images.

*Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet*

![model_diff_1_50](https://user-images.githubusercontent.com/23423619/171610307-dab0cd8b-75da-4d4e-9f5a-5922072e2bb5.png)
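
In code, such a model is just a network that maps a noisy sample and a timestep to a predicted noise residual. A minimal, purely illustrative sketch of that interface (the `TinyDenoiser` class is hypothetical and only mirrors the `model(image, t)` call used in the denoising loop further below):

```python
import torch

# Hypothetical toy model; a real UNet would also condition on the timestep t
# via time embeddings. This only illustrates the
# (noisy sample, timestep) -> noise residual interface.
class TinyDenoiser(torch.nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.conv = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, t: int) -> torch.Tensor:
        return self.conv(x_t)  # predicted noise residual, same shape as x_t

model = TinyDenoiser()
x_t = torch.randn(1, 3, 32, 32)   # noisy image at timestep t
noise_residual = model(x_t, 10)   # epsilon_theta(x_t, t)
```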

**Samplers**: Algorithm used to *train* and *sample* from a **Model**. Defines the alpha and beta schedules, timesteps, etc.

*Examples: Vanilla DDPM, DDIM, PLMS, DEIS*

![sampling](https://user-images.githubusercontent.com/23423619/171608981-3ad05953-a684-4c82-89f8-62a459147a07.png)

![training](https://user-images.githubusercontent.com/23423619/171608964-b3260cce-e6b4-4841-959d-7d8ba4b8d1b2.png)
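
To make “defines the alpha and beta schedule” concrete, here is a minimal sketch of the quantities such a sampler exposes, assuming the linear beta schedule from the DDPM paper (the actual `GaussianDDPMScheduler` implementation may differ):

```python
import torch

# Minimal sketch of the schedule a DDPM-style sampler has to provide.
# Values follow the linear schedule from the DDPM paper.
num_timesteps = 1000
betas = torch.linspace(1e-4, 0.02, num_timesteps)  # beta_t
alphas = 1.0 - betas                               # alpha_t
alphas_prod = torch.cumprod(alphas, dim=0)         # alpha_bar_t = prod_{s<=t} alpha_s

# roughly what scheduler.get_beta(t) / scheduler.get_alpha_prod(t) would return
t = 500
beta_t, alpha_prod_t = betas[t], alphas_prod[t]
```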

**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models and possibly text encoders such as CLIP.

*Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E*

![imagen](https://user-images.githubusercontent.com/23423619/171609001-c3f2c1c9-f597-4a16-9843-749bf3f9431c.png)
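
Conceptually, a pipeline is little more than a thin wrapper bundling these components; a purely illustrative sketch (all names here are hypothetical, not a proposed API):

```python
# Purely illustrative: a text-to-image pipeline bundles a text encoder,
# a diffusion model and a sampler behind a single __call__.
class TextToImagePipeline:
    def __init__(self, text_encoder, unet, scheduler):
        self.text_encoder = text_encoder  # e.g. a CLIP text model
        self.unet = unet                  # diffusion model (noise predictor)
        self.scheduler = scheduler        # sampler driving the denoising loop

    def __call__(self, prompt: str):
        # 1. encode the prompt, 2. sample noise, 3. run the denoising loop,
        # 4. post-process to an image (omitted in this sketch)
        raise NotImplementedError
```
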
## 1. `diffusers` as a central modular diffusion and sampler library

`diffusers` should be more modularized than `transformers` so that parts of it can easily be used in other libraries.

It could become a central place for all kinds of models, schedulers, training utilities and processors required when using diffusion models in audio, vision, ...

One should be able to save both models and samplers to the Hub as well as load them from it.

Example:

```python
import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image
import numpy as np

generator = torch.Generator()
generator = generator.manual_seed(6694729458485568)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
for t in reversed(range(len(scheduler))):
    # i) define coefficients for time step t
    clip_image_coeff = 1 / torch.sqrt(scheduler.get_alpha_prod(t))
    clip_noise_coeff = torch.sqrt(1 / scheduler.get_alpha_prod(t) - 1)
    image_coeff = (1 - scheduler.get_alpha_prod(t - 1)) * torch.sqrt(scheduler.get_alpha(t)) / (1 - scheduler.get_alpha_prod(t))
    clip_coeff = torch.sqrt(scheduler.get_alpha_prod(t - 1)) * scheduler.get_beta(t) / (1 - scheduler.get_alpha_prod(t))

    # ii) predict noise residual
    with torch.no_grad():
        noise_residual = model(image, t)

    # iii) compute predicted image from residual
    # See 2nd formula at https://github.com/hojonathanho/diffusion/issues/5#issue-896554416 for comparison
    pred_mean = clip_image_coeff * image - clip_noise_coeff * noise_residual
    pred_mean = torch.clamp(pred_mean, -1, 1)
    prev_image = clip_coeff * pred_mean + image_coeff * image

    # iv) sample variance
    prev_variance = scheduler.sample_variance(t, prev_image.shape, device=torch_device, generator=generator)

    # v) sample x_{t-1} ~ N(prev_image, prev_variance)
    sampled_prev_image = prev_image + prev_variance
    image = sampled_prev_image

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
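
For reference, the coefficients computed in step i) implement the standard DDPM posterior mean from Ho et al. (2020), with ᾱ_t corresponding to `scheduler.get_alpha_prod(t)` and β_t to `scheduler.get_beta(t)`:

```latex
% predicted clean image from the noise residual (clip_image_coeff, clip_noise_coeff)
\hat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\, x_t \;-\; \sqrt{\frac{1}{\bar{\alpha}_t} - 1}\;\epsilon_\theta(x_t, t)

% posterior mean of q(x_{t-1} \mid x_t, x_0) (clip_coeff, image_coeff)
\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\,\hat{x}_0
  \;+\; \frac{\sqrt{\alpha_t}\,\bigl(1 - \bar{\alpha}_{t-1}\bigr)}{1 - \bar{\alpha}_t}\, x_t
```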

## 2. `diffusers` as a collection of the most important diffusion models (GLIDE, DALL-E, ...)

The `models` directory in the repository hosts complete diffusion training code & pipelines. They are easily loadable from and saveable to the Hub, and will also be usable directly from the pip-installed `diffusers` version:

Example:

```python
from modeling_ddpm import DDPM
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DDPM.from_pretrained("fusing/ddpm-lsun-bedroom-pipe")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
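
Since these pipelines should be just as easily saveable to the Hub, the reverse direction could look roughly like this (a sketch assuming `transformers`-style `save_pretrained`/`push_to_hub` methods, not a fixed API):

```python
# Hypothetical sketch, mirroring the transformers-style Hub workflow.
# Save the whole pipeline (model weights + scheduler config) locally ...
ddpm.save_pretrained("./ddpm-lsun-bedroom-pipe")

# ... or upload it so others can load it again via DDPM.from_pretrained(...)
ddpm.push_to_hub("my-username/ddpm-lsun-bedroom-pipe")
```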

## Library structure:

```
├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_imagen.py
│       │   ├── README.md
│       │   └── run_imagen.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   └── unet.py
│       ├── processors
│       └── schedulers
│           └── gaussian_ddpm.py
├── tests
│   └── test_modeling_utils.py
```