# Waifu Diffusion

Waifu Diffusion is a project that finetunes Stable Diffusion on Danbooru images.

<img src=https://cdn.discordapp.com/attachments/872361510133981234/1016022078635388979/unknown.png?3867929 width=40% height=40%>

<sub>Prompt: touhou 1girl komeiji_koishi portrait</sub>

## Documentation
[Training Guide](https://github.com/harubaru/waifu-diffusion/docs/en/README.md)

All thanks go to CompVis and Stability AI for releasing this codebase!

Model Link: https://huggingface.co/hakurei/waifu-diffusion

### Any questions? Come hop on by to our Discord server!

[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/Sx6Spmsgx7)

# Stable Diffusion
*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*

[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp),
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>

**CVPR '22 Oral**

which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).

![txt2img-stable2](assets/stable-samples/txt2img/merged-0006.png)
[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. 
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), 
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).

  
## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```

You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running

```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2
pip install -e .
``` 


## Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and 
then finetuned on 512x512 images.
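
As a rough illustration of what a downsampling factor of 8 means for tensor sizes, the sketch below computes the latent shape for a given image size. `latent_shape` is a hypothetical helper, not part of this codebase, and the default of 4 latent channels is an assumption here (the sampling script exposes it as `--C`).

```python
from typing import Tuple


def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8) -> Tuple[int, int, int]:
    """Return (channels, height // factor, width // factor) for a pixel-space image size.

    channels=4 is an assumed default; factor=8 is the downsampling factor named above.
    """
    if height % factor != 0 or width % factor != 0:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return (channels, height // factor, width // factor)


# A 512x512 image maps to a 64x64 latent grid, a 256x256 image to 32x32.
print(latent_shape(512, 512))
print(latent_shape(256, 256))
```

Under these assumptions the diffusion model operates on 64 times fewer spatial positions than the pixel image has, which is a large part of why the model is comparatively lightweight.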

*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data. 
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
Research into the safe deployment of general text-to-image models is an ongoing effort. To prevent misuse and harm, we currently provide access to the checkpoints only for [academic research purposes upon request](https://stability.ai/academia-access-form).
**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.***

[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form) 

### Weights

We currently provide three checkpoints, `sd-v1-1.ckpt`, `sd-v1-2.ckpt` and `sd-v1-3.ckpt`,
which were trained as follows:

- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
  194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
  515k steps at resolution `512x512` on "laion-improved-aesthetics", a subset of laion2B-en
  filtered to images with an original size `>= 512x512`, an estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate comes from the LAION-5B metadata; the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-improved-aesthetics" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
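
The 10% dropping of the text-conditioning used for `sd-v1-3.ckpt` can be pictured with the minimal sketch below. It assumes that "dropping" means replacing a prompt with the empty string, which is one common way to do it; the actual training code may differ. `drop_text_conditioning` is a hypothetical helper, not a function from this repository.

```python
import random
from typing import List, Optional


def drop_text_conditioning(
    prompts: List[str],
    drop_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace each prompt with "" with probability drop_prob (assumed mechanism)."""
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else prompt for prompt in prompts]


# On average, about one prompt in ten is replaced by the empty prompt.
batch = ["a photograph of an astronaut riding a horse"] * 10
print(drop_text_conditioning(batch, rng=random.Random(0)))
```

Training on a mix of conditioned and unconditioned examples lets a single model predict both terms that classifier-free guidance combines at sampling time.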

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:
![sd evaluation results](assets/v1-variants-scores.jpg)


### Text-to-Image with Stable Diffusion
![txt2img-stable2](assets/stable-samples/txt2img/merged-0005.png)
![txt2img-stable2](assets/stable-samples/txt2img/merged-0007.png)

Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.


#### Sampling Script

After [obtaining the weights](#weights), link them
```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 
```
and sample with
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 
```

By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler, 
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).

```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS]
                  [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED] [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision

```
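
The formula in the `--scale` help text can be read as a linear extrapolation from the unconditional noise prediction towards the conditional one. The sketch below applies it to plain lists of floats; `guided_eps` is a hypothetical name used only for illustration, and in the real sampler the inputs would be tensors produced by the UNet.

```python
from typing import List, Sequence


def guided_eps(eps_uncond: Sequence[float], eps_cond: Sequence[float], scale: float) -> List[float]:
    """eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)), applied element-wise."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_empty = [0.0, 1.0]  # stand-in for the prediction with the empty prompt
eps_text = [1.0, 3.0]   # stand-in for the prediction with the text prompt

print(guided_eps(eps_empty, eps_text, 0.0))  # scale 0 ignores the prompt
print(guided_eps(eps_empty, eps_text, 1.0))  # scale 1 is plain conditional sampling
print(guided_eps(eps_empty, eps_text, 7.5))  # larger scales push further towards the prompt
```
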
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
For this reason, the configuration sets `use_ema=False`; otherwise the code would try to switch from
non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
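
For readers unfamiliar with the term, EMA weights are an exponential moving average of the model weights kept during training. The sketch below shows the generic update rule on plain lists; `ema_update` and the decay value of 0.999 are illustrative assumptions, not values taken from this codebase.

```python
from typing import List, Sequence


def ema_update(ema_weights: Sequence[float], weights: Sequence[float], decay: float = 0.999) -> List[float]:
    """Blend the current weights into the running average: ema = decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


ema = [1.0, -2.0]
current = [0.0, 0.0]
# With decay=0.5 the average moves halfway towards the current weights.
print(ema_update(ema, current, decay=0.5))
```

An EMA-only checkpoint stores just the averaged weights, while a "full" checkpoint stores both the averaged and the raw training weights.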


#### Diffusers Integration

Another way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers):
```py
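# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline

# download the pipeline weights from the Hugging Face Hub (requires an access token)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-3-diffusers",
    use_auth_token=True,
)
pipe = pipe.to("cuda")  # run on the GPU so that autocast("cuda") has an effect

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    # diffusers releases from this period return the images under the "sample" key;
    # later releases expose them as `.images` instead
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")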
```


### Image Modification with Stable Diffusion

By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different 
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script, 
we provide a script to perform image modification with Stable Diffusion.  

The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
```
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
```
Here, `--strength` is a value between 0.0 and 1.0 that controls the amount of noise added to the input image.
Values close to 1.0 allow for a lot of variation but can also produce images that are no longer semantically consistent with the input. See the following example.

**Input**

![sketch-in](assets/stable-samples/img2img/sketch-mountains-input.jpg)

**Outputs**

![out3](assets/stable-samples/img2img/mountains-3.png)
![out2](assets/stable-samples/img2img/mountains-2.png)

This procedure can, for example, also be used to upscale samples from the base model.
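
To make the role of `--strength` more concrete, the sketch below assumes that the strength scales the number of sampling steps linearly, so that only that many noising and denoising steps are applied to the input image. `noising_steps` is a hypothetical helper written for illustration, and the default of 50 total steps is an assumption as well.

```python
def noising_steps(strength: float, total_steps: int = 50) -> int:
    """Number of steps the input image is noised and then denoised (assumed linear mapping)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0.0 and 1.0")
    return int(strength * total_steps)


# Higher strength means more steps and therefore less of the input image survives.
for strength in (0.0, 0.5, 0.8, 1.0):
    print(strength, noising_steps(strength))
```

Under this assumption, a strength of 0.0 leaves the input untouched, while 1.0 starts from almost pure noise and behaves much like text-to-image sampling.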


## Comments 

- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). 
Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories). 


## BibTeX

```
@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

```