<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
<p align="center">
<br>
<img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
<br>
</p>

# 🧨 Diffusers

🤗 Diffusers provides pretrained vision diffusion models and serves as a modular toolbox for inference and training.
More precisely, 🤗 Diffusers offers:
- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see the short example after this list, or [**Using Diffusers**](./using-diffusers/conditional_image_generation)). Have a look at [**Pipelines**](#pipelines) for an overview of all supported pipelines and their corresponding papers.
- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off during inference. For more information, see [**Schedulers**](./api/schedulers).
- Multiple types of models, such as UNets, that can be used as building blocks in an end-to-end diffusion system. See [**Models**](./api/models) for more details.
- Training examples showing how to train the most popular diffusion models. For more information, see [**Training**](./training/overview).
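
For example, running a pipeline really does come down to a few lines. The snippet below is a minimal sketch, assuming the `CompVis/stable-diffusion-v1-4` checkpoint and a CUDA-capable GPU; it also shows how a scheduler can be swapped in:

```python
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load a pretrained text-to-image pipeline from the Hub
# (any compatible Stable Diffusion checkpoint works the same way).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

# Schedulers are interchangeable: swap in DDIM for a different
# speed vs. quality trade-off without changing anything else.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut_rides_horse.png")
```

The same `from_pretrained` pattern applies to every pipeline in the table below.
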
## 🧨 Diffusers Pipelines
The following table summarizes all officially supported pipelines, their corresponding papers, and, where available, a Colab notebook to try them out directly.

| Pipeline | Paper | Tasks | Colab |
|---|---|:---:|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/Harmonai-org/sample-generator) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
| [ddim](./api/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image |
| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation |
| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [stable_diffusion](./api/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb)
| [stable_diffusion](./api/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion](./api/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [**Vector Quantized Diffusion Model for Text-to-Image Synthesis**](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |

**Note**: Pipelines are simple examples that show how to use the diffusion systems described in the corresponding papers.
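
As a concrete illustration, here is a minimal sketch of one of the unconditional pipelines listed above, assuming the `google/ddpm-cat-256` checkpoint from the Hub:

```python
from diffusers import DDPMPipeline

# Load the pretrained ddpm pipeline (example checkpoint; any DDPM
# checkpoint on the Hub can be substituted).
pipe = DDPMPipeline.from_pretrained("google/ddpm-cat-256")

# Unconditional generation: no prompt is needed, the pipeline
# simply denoises from random noise.
image = pipe().images[0]
image.save("ddpm_generated_image.png")
```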