# Diffusers

## Definitions

**Models**: A single neural network that models p_θ(x_{t-1} | x_t) and is trained to “denoise” a noisy input into an image.
*Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet*

![model_diff_1_50](https://user-images.githubusercontent.com/23423619/171610307-dab0cd8b-75da-4d4e-9f5a-5922072e2bb5.png)

**Samplers**: Algorithms used to *train* a **Model** and to *sample* from it. A sampler defines the alpha and beta schedules, the timesteps, etc.
*Examples: Vanilla DDPM, DDIM, PLMS, DEIN*

![sampling](https://user-images.githubusercontent.com/23423619/171608981-3ad05953-a684-4c82-89f8-62a459147a07.png)
![training](https://user-images.githubusercontent.com/23423619/171608964-b3260cce-e6b4-4841-959d-7d8ba4b8d1b2.png)
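
As a rough illustration of what "defines the alpha and beta schedule" can mean, the sketch below builds a linear beta schedule and the cumulative alpha products derived from it. The helper names `linear_beta_schedule` and `alpha_products` are hypothetical and not part of `diffusers`; the default values (1000 timesteps, betas from 1e-4 to 0.02) are the linear schedule used in the DDPM paper and are assumed here only for illustration.

```python
def linear_beta_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Return betas spaced linearly from beta_start to beta_end (both inclusive)."""
    if num_timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_timesteps - 1)
    return [beta_start + i * step for i in range(num_timesteps)]


def alpha_products(betas):
    """Return the running product of alpha_t = 1 - beta_t for every timestep."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


betas = linear_beta_schedule()
alpha_prod = alpha_products(betas)
```

With such a schedule the cumulative product shrinks at every timestep and is close to zero at the last one, which is why the final noised sample is nearly pure Gaussian noise.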

**Diffusion Pipeline**: An end-to-end pipeline that combines multiple diffusion models and, possibly, text encoders, CLIP, etc.
*Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E*

![imagen](https://user-images.githubusercontent.com/23423619/171609001-c3f2c1c9-f597-4a16-9843-749bf3f9431c.png)

## 1. `diffusers` as a central modular diffusion and sampler library

`diffusers` should be more modular than `transformers` so that parts of it can easily be used in other libraries.
It could become a central place for all kinds of models, schedulers, training utilities, and processors required when using diffusion models for audio, vision, and other modalities.
It should be possible to save both models and samplers and to load them from the Hub.

Example:

```python
import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image
import numpy as np

generator = torch.Generator()
generator = generator.manual_seed(6694729458485568)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
for t in reversed(range(len(scheduler))):
    # i) define coefficients for time step t
    clipped_image_coeff = 1 / torch.sqrt(scheduler.get_alpha_prod(t))
    clipped_noise_coeff = torch.sqrt(1 / scheduler.get_alpha_prod(t) - 1)
    image_coeff = (1 - scheduler.get_alpha_prod(t - 1)) * torch.sqrt(scheduler.get_alpha(t)) / (1 - scheduler.get_alpha_prod(t))
    clipped_coeff = torch.sqrt(scheduler.get_alpha_prod(t - 1)) * scheduler.get_beta(t) / (1 - scheduler.get_alpha_prod(t))

    # ii) predict noise residual
    with torch.no_grad():
        noise_residual = model(image, t)

    # iii) compute predicted image from residual
    # See 2nd formula at https://github.com/hojonathanho/diffusion/issues/5#issue-896554416 for comparison
    pred_mean = clipped_image_coeff * image - clipped_noise_coeff * noise_residual
    pred_mean = torch.clamp(pred_mean, -1, 1)
    prev_image = clipped_coeff * pred_mean + image_coeff * image

    # iv) sample variance
    prev_variance = scheduler.sample_variance(t, prev_image.shape, device=torch_device, generator=generator)

    # v) sample x_{t-1} ~ N(prev_image, prev_variance)
    sampled_prev_image = prev_image + prev_variance
    image = sampled_prev_image

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
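
For reference, the coefficients in step i) appear to correspond to the standard DDPM posterior mean. The mapping below assumes, based only on their names, that `get_alpha(t)`, `get_beta(t)` and `get_alpha_prod(t)` return α_t, β_t and the cumulative product ᾱ_t. Here `pred_mean` plays the role of the predicted clean image, which the loop clamps to [-1, 1], and `prev_image` plays the role of the posterior mean:

```latex
\hat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\, x_t
          - \sqrt{\frac{1}{\bar{\alpha}_t} - 1}\;\epsilon_\theta(x_t, t)

\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\,\hat{x}_0
              + \frac{\sqrt{\alpha_t}\,\left(1 - \bar{\alpha}_{t-1}\right)}{1 - \bar{\alpha}_t}\, x_t
```

In the code, `clipped_image_coeff` and `clipped_noise_coeff` are the two factors in the first line, while `clipped_coeff` and `image_coeff` are the two factors in the second.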

## 2. `diffusers` as a collection of the most important diffusion models (GLIDE, DALL-E, ...)
The `models` directory in the repository hosts complete diffusion training code and pipelines. They can easily be loaded from and saved to the Hub, and it will be possible to use them with just the pip-installed version of `diffusers`:

Example:

```python
from diffusers import DiffusionPipeline
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DiffusionPipeline.from_pretrained("fusing/ddpm-lsun-bedroom")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
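
To make the idea of a pipeline more concrete, here is a minimal sketch of how a pipeline object could bundle a model and a scheduler step behind a single call. `ToyPipeline`, `toy_model` and `toy_step` are hypothetical names invented for this illustration; they operate on a single float rather than an image tensor and do not reflect the actual `DiffusionPipeline` implementation.

```python
import random


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: bundles a model and a scheduler step."""

    def __init__(self, model, step_fn, num_timesteps):
        self.model = model                  # callable: (sample, t) -> predicted noise
        self.step_fn = step_fn              # callable: (sample, noise_pred, t) -> previous sample
        self.num_timesteps = num_timesteps

    def __call__(self, seed=0):
        rng = random.Random(seed)
        sample = rng.gauss(0.0, 1.0)        # start from Gaussian noise (a scalar here)
        for t in reversed(range(self.num_timesteps)):
            noise_pred = self.model(sample, t)
            sample = self.step_fn(sample, noise_pred, t)
        return sample


# Toy components so the sketch runs without any trained weights.
def toy_model(sample, t):
    return 0.5 * sample                     # pretend half of the current value is noise


def toy_step(sample, noise_pred, t):
    return sample - noise_pred              # remove the predicted noise


pipeline = ToyPipeline(toy_model, toy_step, num_timesteps=10)
result = pipeline(seed=0)
```

The point of the design is that the caller only sees `pipeline()`, while the denoising loop from section 1 lives inside `__call__`.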

## Library structure:

```
├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   ├── __init__.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── example.py
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── modeling_vqvae.py.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── __init__.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── unet_glide.py
│       │   └── unet.py
│       ├── pipeline_utils.py
│       └── schedulers
│           ├── gaussian_ddpm.py
│           ├── __init__.py
├── tests
│   └── test_modeling_utils.py
```