diff --git a/README.md b/README.md index fe006f41..1dc1d783 100644 --- a/README.md +++ b/README.md @@ -64,44 +64,54 @@ In order to get started, we recommend taking a look at two notebooks: - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffusion models training methods. This notebook takes a step-by-step approach to training your diffusion models on an image dataset, with explanatory graphics. -## **New** Stable Diffusion is now fully compatible with `diffusers`! +## Stable Diffusion is fully compatible with `diffusers`! -Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. +Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. -You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. +You need to accept the model license before downloading or using the Stable Diffusion weights. Please visit the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree. You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
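As an alternative to the `huggingface-cli login` flow shown below, you can also pass the access token directly to `from_pretrained` via its `use_auth_token` argument (the quicktour further down does the same). A minimal sketch, assuming `AUTH_TOKEN` holds a valid token copied from your account settings:

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical placeholder value; paste your own access token from
# https://huggingface.co/settings/tokens
AUTH_TOKEN = "hf_..."

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=AUTH_TOKEN,  # authenticates the download without a global login
)
pipe = pipe.to("cuda")
```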
### Text-to-Image generation with Stable Diffusion +First, let's install the required libraries: +```bash +pip install --upgrade diffusers transformers scipy +``` + +If you haven't logged in with your HF Hub token before, run this command (you can skip this step if you prefer to run the model locally; follow [these instructions](#running-the-model-locally) instead): +```bash +huggingface-cli login +``` + We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it almost always gives the same results as full precision while being roughly twice as fast and requiring half the amount of GPU RAM. ```python +import torch -# make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_type=torch.float16, revision="fp16") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, revision="fp16") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" image = pipe(prompt).images[0] ``` -**Note**: If you don't want to use the token, you can also simply download the model weights -(after having [accepted the license](https://huggingface.co/CompVis/stable-diffusion-v1-4)) and pass +#### Running the model locally +If you don't want to log in to Hugging Face, you can also simply download the model folder +(after having [accepted the license](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and pass the path to the local folder to the `StableDiffusionPipeline`. ``` git lfs install -git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 +git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 ``` -Assuming the folder is stored locally under `./stable-diffusion-v1-4`, you can also run stable diffusion +Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can also run Stable Diffusion without requiring an authentication token: ```python -pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" @@ -114,7 +124,7 @@ The following snippet should result in less than 4GB VRAM. ```python pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ) @@ -125,7 +135,7 @@ pipe.enable_attention_slicing() image = pipe(prompt).images[0] ``` -If you wish to use a different scheduler, you can simply instantiate +If you wish to use a different scheduler (e.g. DDIM, LMS, PNDM/PLMS), you can instantiate it before the pipeline and pass it to `from_pretrained`. 
```python @@ -138,7 +148,7 @@ lms = LMSDiscreteScheduler( ) pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, scheduler=lms, @@ -158,7 +168,7 @@ please run the model in the default *full-precision* setting: # make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") # disable the following line if you run on CPU pipe = pipe.to("cuda") @@ -169,6 +179,75 @@ image = pipe(prompt).images[0] image.save("astronaut_rides_horse.png") ``` +### JAX/Flax + +To use Stable Diffusion on TPUs and GPUs for faster inference, you can leverage JAX/Flax. + +Running the pipeline with the default PNDMScheduler: + +```python +import jax +import numpy as np +from flax.jax_utils import replicate +from flax.training.common_utils import shard + +from diffusers import FlaxStableDiffusionPipeline + +pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( + "runwayml/stable-diffusion-v1-5", revision="flax", dtype=jax.numpy.bfloat16 +) + +prompt = "a photo of an astronaut riding a horse on mars" + +prng_seed = jax.random.PRNGKey(0) +num_inference_steps = 50 + +num_samples = jax.device_count() +prompt = num_samples * [prompt] +prompt_ids = pipeline.prepare_inputs(prompt) + +# shard inputs and rng +params = replicate(params) +prng_seed = jax.random.split(prng_seed, jax.device_count()) +prompt_ids = shard(prompt_ids) + +images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images +images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) +``` + +**Note**: +If you are limited by TPU memory, please make sure to load the `FlaxStableDiffusionPipeline` in `bfloat16` precision instead of the default `float32` precision as done above. You can do so by telling diffusers to load the weights from the "bf16" branch, which stores the weights themselves in `bfloat16`. + +```python +import jax +import numpy as np +from flax.jax_utils import replicate +from flax.training.common_utils import shard + +from diffusers import FlaxStableDiffusionPipeline + +pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( + "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jax.numpy.bfloat16 +) + +prompt = "a photo of an astronaut riding a horse on mars" + +prng_seed = jax.random.PRNGKey(0) +num_inference_steps = 50 + +num_samples = jax.device_count() +prompt = num_samples * [prompt] +prompt_ids = pipeline.prepare_inputs(prompt) + +# shard inputs and rng +params = replicate(params) +prng_seed = jax.random.split(prng_seed, jax.device_count()) +prompt_ids = shard(prompt_ids) + +images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images +images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) +``` + ### Image-to-Image text-guided generation with Stable Diffusion The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images. 
@@ -183,14 +262,14 @@ from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" -model_id_or_path = "CompVis/stable-diffusion-v1-4" +model_id_or_path = "runwayml/stable-diffusion-v1-5" pipe = StableDiffusionImg2ImgPipeline.from_pretrained( model_id_or_path, revision="fp16", torch_dtype=torch.float16, ) -# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 -# and pass `model_id_or_path="./stable-diffusion-v1-4"`. +# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 +# and pass `model_id_or_path="./stable-diffusion-v1-5"`. pipe = pipe.to(device) # let's download an initial image diff --git a/docs/source/api/pipelines/overview.mdx b/docs/source/api/pipelines/overview.mdx index 3082f996..af711a02 100644 --- a/docs/source/api/pipelines/overview.mdx +++ b/docs/source/api/pipelines/overview.mdx @@ -67,8 +67,8 @@ Diffusion models often consist of multiple independently-trained models or other Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. During inference, however, we want to be able to easily load all components and use them - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality: -- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) or a path to a local directory, *e.g.* -"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be +- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.* +"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>`, which can be found in the library or pipeline folder called `"<library>"`. - [`save_pretrained`](../diffusion_pipeline) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. 
In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated @@ -100,7 +100,7 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing # make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" @@ -123,7 +123,7 @@ from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" pipe = StableDiffusionImg2ImgPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16 + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16 ).to(device) # let's download an initial image diff --git a/docs/source/optimization/fp16.mdx b/docs/source/optimization/fp16.mdx index b561aedb..d1dd87f7 100644 --- a/docs/source/optimization/fp16.mdx +++ b/docs/source/optimization/fp16.mdx @@ -56,7 +56,7 @@ If you use a CUDA GPU, you can take advantage of `torch.autocast` to perform inf from torch import autocast from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" @@ -72,7 +72,7 @@ To save more GPU memory and get even more speed, you can load and run the model ```Python pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ) @@ -97,7 +97,7 @@ import torch from diffusers import StableDiffusionPipeline pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ) @@ -152,7 +152,7 @@ def generate_inputs(): pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ).to("cuda") @@ -216,7 +216,7 @@ class UNet2DConditionOutput: pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ).to("cuda") diff --git a/docs/source/optimization/mps.mdx b/docs/source/optimization/mps.mdx index ff9d614c..754adae5 100644 --- a/docs/source/optimization/mps.mdx +++ b/docs/source/optimization/mps.mdx @@ -31,7 +31,7 @@ We recommend to "prime" the pipeline using an additional one-time pass through i # make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("mps") prompt = "a photo of an astronaut riding a horse on mars" diff --git a/docs/source/optimization/onnx.mdx b/docs/source/optimization/onnx.mdx index 9bbc4f20..e79efbde 100644 --- a/docs/source/optimization/onnx.mdx +++ b/docs/source/optimization/onnx.mdx @@ -28,7 +28,7 @@ The snippet below demonstrates how to use the ONNX runtime. 
You need to use `StableDiffusionOnnxPipeline` instead of `StableDiffusionPipeline`. from diffusers import StableDiffusionOnnxPipeline pipe = StableDiffusionOnnxPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="onnx", provider="CUDAExecutionProvider", ) diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx index 9574ecac..a652695f 100644 --- a/docs/source/quicktour.mdx +++ b/docs/source/quicktour.mdx @@ -68,8 +68,7 @@ You can save the image by simply calling: More advanced models, like [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion), require you to accept a [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model. This is due to the improved image generation capabilities of the model and the potentially harmful content that could be produced with it. -Long story short: Head over to your stable diffusion model of choice, *e.g.* [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4), read through the license and click-accept to get -access to the model. +Please head over to your Stable Diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree. You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). Having "click-accepted" the license, you can save your token: @@ -77,13 +76,13 @@ Having "click-accepted" the license, you can save your token: AUTH_TOKEN = "" ``` -You can then load [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) +You can then load [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) just like we did before, only that now you need to pass your `AUTH_TOKEN`: ```python >>> from diffusers import DiffusionPipeline ->>> generator = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=AUTH_TOKEN) +>>> generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=AUTH_TOKEN) ``` If you do not pass your authentication token you will see that the diffusion system will not be correctly @@ -95,15 +94,15 @@ the weights locally via: ``` git lfs install -git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 +git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 ``` and then load locally saved weights into the pipeline. This way, you do not need to pass an authentication -token. Assuming that `"./stable-diffusion-v1-4"` is the local path to the cloned stable-diffusion-v1-4 repo, +token. Assuming that `"./stable-diffusion-v1-5"` is the local path to the cloned stable-diffusion-v1-5 repo, you can also load the pipeline as follows: ```python ->>> generator = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-4") +>>> generator = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") ``` Running the pipeline is then identical to the code above as it's the same model architecture. @@ -125,7 +124,7 @@ you could use it as follows: >>> scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear") >>> generator = StableDiffusionPipeline.from_pretrained( -... 
"runwayml/stable-diffusion-v1-5", scheduler=scheduler, use_auth_token=AUTH_TOKEN ... ) ``` diff --git a/docs/source/training/text_inversion.mdx b/docs/source/training/text_inversion.mdx index 13ea7c94..d44cef30 100644 --- a/docs/source/training/text_inversion.mdx +++ b/docs/source/training/text_inversion.mdx @@ -64,7 +64,7 @@ accelerate config ### Cat toy example -You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. +You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in ๐Ÿค— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). @@ -83,7 +83,7 @@ Now let's get our dataset.Download 3-4 images from [here](https://drive.google.c And launch the training using ```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" +export MODEL_NAME="runwayml/stable-diffusion-v1-5" export DATA_DIR="path-to-dir-containing-images" accelerate launch textual_inversion.py \ diff --git a/docs/source/using-diffusers/custom_pipelines.mdx b/docs/source/using-diffusers/custom_pipelines.mdx index b52d4055..af973660 100644 --- a/docs/source/using-diffusers/custom_pipelines.mdx +++ b/docs/source/using-diffusers/custom_pipelines.mdx @@ -58,7 +58,7 @@ feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id) clip_model = CLIPModel.from_pretrained(clip_model_id) pipeline = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", custom_pipeline="clip_guided_stable_diffusion", clip_model=clip_model, feature_extractor=feature_extractor, diff --git a/docs/source/using-diffusers/img2img.mdx b/docs/source/using-diffusers/img2img.mdx index 62eaeea9..defefffe 100644 --- a/docs/source/using-diffusers/img2img.mdx +++ b/docs/source/using-diffusers/img2img.mdx @@ -25,7 +25,7 @@ from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" pipe = StableDiffusionImg2ImgPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16 + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16 ).to(device) # let's download an initial image diff --git a/examples/community/README.md b/examples/community/README.md index a5b2252d..7ecaffb3 100644 --- a/examples/community/README.md +++ b/examples/community/README.md @@ -18,7 +18,7 @@ If a community doesn't work as expected, please open an issue and ping the autho To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly. 
```py -pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="filename_in_the_community_folder") +pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="filename_in_the_community_folder") ``` ## Example usages @@ -41,7 +41,7 @@ clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", guided_pipeline = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", custom_pipeline="clip_guided_stable_diffusion", clip_model=clip_model, feature_extractor=feature_extractor, @@ -141,7 +141,7 @@ def download_image(url): response = requests.get(url) return PIL.Image.open(BytesIO(response.content)).convert("RGB") -pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="stable_diffusion_mega", torch_dtype=torch.float16, revision="fp16") +pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_mega", torch_dtype=torch.float16, revision="fp16") pipe.to("cuda") pipe.enable_attention_slicing() diff --git a/examples/community/stable_diffusion_mega.py b/examples/community/stable_diffusion_mega.py index b253370f..72395194 100644 --- a/examples/community/stable_diffusion_mega.py +++ b/examples/community/stable_diffusion_mega.py @@ -46,7 +46,7 @@ class StableDiffusionMegaPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionMegaSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. 
""" diff --git a/examples/test_examples.py b/examples/test_examples.py index 15c8e05a..eb86b18b 100644 --- a/examples/test_examples.py +++ b/examples/test_examples.py @@ -101,7 +101,7 @@ class ExamplesTestsAccelerate(unittest.TestCase): with tempfile.TemporaryDirectory() as tmpdir: test_args = f""" examples/textual_inversion/textual_inversion.py - --pretrained_model_name_or_path CompVis/stable-diffusion-v1-4 + --pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 --train_data_dir docs/source/imgs --learnable_property object --placeholder_token diff --git a/examples/textual_inversion/README.md b/examples/textual_inversion/README.md index 05d8ffb8..801c42c8 100644 --- a/examples/textual_inversion/README.md +++ b/examples/textual_inversion/README.md @@ -48,7 +48,7 @@ Now let's get our dataset.Download 3-4 images from [here](https://drive.google.c And launch the training using ```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" +export MODEL_NAME="runwayml/stable-diffusion-v1-5" export DATA_DIR="path-to-dir-containing-images" accelerate launch textual_inversion.py \ diff --git a/src/diffusers/modeling_flax_utils.py b/src/diffusers/modeling_flax_utils.py index e31a9b7b..5ef10022 100644 --- a/src/diffusers/modeling_flax_utils.py +++ b/src/diffusers/modeling_flax_utils.py @@ -105,14 +105,14 @@ class FlaxModelMixin: >>> from diffusers import FlaxUNet2DConditionModel >>> # load model - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # By default, the model parameters will be in fp32 precision, to cast these to bfloat16 precision >>> params = model.to_bf16(params) >>> # If you don't want to cast certain parameters (for example layer norm bias and scale) >>> # then pass the mask as follows >>> from flax import traverse_util - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> flat_params = traverse_util.flatten_dict(params) >>> mask = { ... 
path: (path[-2] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale")) @@ -141,7 +141,7 @@ class FlaxModelMixin: >>> from diffusers import FlaxUNet2DConditionModel >>> # Download model and configuration from huggingface.co - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # By default, the model params will be in fp32, to illustrate the use of this method, >>> # we'll first cast to fp16 and back to fp32 >>> params = model.to_fp16(params) @@ -171,14 +171,14 @@ class FlaxModelMixin: >>> from diffusers import FlaxUNet2DConditionModel >>> # load model - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # By default, the model params will be in fp32, to cast these to float16 >>> params = model.to_fp16(params) >>> # If you don't want to cast certain parameters (for example layer norm bias and scale) >>> # then pass the mask as follows >>> from flax import traverse_util - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> flat_params = traverse_util.flatten_dict(params) >>> mask = { ... path: (path[-2] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale")) @@ -216,7 +216,7 @@ class FlaxModelMixin: - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids are namespaced under a user or organization name, like - `CompVis/stable-diffusion-v1-4`. + `runwayml/stable-diffusion-v1-5`. - A path to a *directory* containing model weights saved using [`~ModelMixin.save_pretrained`], e.g., `./my_model_directory/`. dtype (`jax.numpy.dtype`, *optional*, defaults to `jax.numpy.float32`): @@ -273,7 +273,7 @@ class FlaxModelMixin: >>> from diffusers import FlaxUNet2DConditionModel >>> # Download model and configuration from huggingface.co and cache. - >>> model, params = FlaxUNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable). 
>>> model, params = FlaxUNet2DConditionModel.from_pretrained("./test/saved_model/") ```""" diff --git a/src/diffusers/pipeline_flax_utils.py b/src/diffusers/pipeline_flax_utils.py index d55338b5..3c23693b 100644 --- a/src/diffusers/pipeline_flax_utils.py +++ b/src/diffusers/pipeline_flax_utils.py @@ -244,7 +244,7 @@ class FlaxDiffusionPipeline(ConfigMixin): It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated - models](https://huggingface.co/docs/hub/models-gated#gated-models), *e.g.* `"CompVis/stable-diffusion-v1-4"` + models](https://huggingface.co/docs/hub/models-gated#gated-models), *e.g.* `"runwayml/stable-diffusion-v1-5"` @@ -266,13 +266,13 @@ class FlaxDiffusionPipeline(ConfigMixin): >>> # Download pipeline that requires an authorization token >>> # For more information on access tokens, please refer to this section >>> # of the documentation](https://huggingface.co/docs/hub/security-tokens) - >>> pipeline = FlaxDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> pipeline = FlaxDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # Download pipeline, but overwrite scheduler >>> from diffusers import LMSDiscreteScheduler >>> scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear") - >>> pipeline = FlaxDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler) + >>> pipeline = FlaxDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler) ``` """ cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE) diff --git a/src/diffusers/pipeline_utils.py b/src/diffusers/pipeline_utils.py index d1e03749..89450565 100644 --- a/src/diffusers/pipeline_utils.py +++ b/src/diffusers/pipeline_utils.py @@ -317,7 +317,7 @@ class DiffusionPipeline(ConfigMixin): It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated - models](https://huggingface.co/docs/hub/models-gated#gated-models), *e.g.* `"CompVis/stable-diffusion-v1-4"` + models](https://huggingface.co/docs/hub/models-gated#gated-models), *e.g.* `"runwayml/stable-diffusion-v1-5"` @@ -339,13 +339,13 @@ class DiffusionPipeline(ConfigMixin): >>> # Download pipeline that requires an authorization token >>> # For more information on access tokens, please refer to this section >>> # of the documentation](https://huggingface.co/docs/hub/security-tokens) - >>> pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") + >>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") >>> # Download pipeline, but overwrite scheduler >>> from diffusers import LMSDiscreteScheduler >>> scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear") - >>> pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler) + >>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler) ``` """ cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 90752d69..66d3a881 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -55,8 +55,8 @@ Diffusion models often consist of multiple independently-trained models or other Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. 
During inference, however, we want to be able to easily load all components and use them - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality: -- [`from_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L139) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) or a path to a local directory, *e.g.* -"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be +- [`from_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L139) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.* +"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>`, which can be found in the library or pipeline folder called `"<library>"`. - [`save_pretrained`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. 
In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated @@ -88,7 +88,7 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing # make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" @@ -111,7 +111,7 @@ from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" pipe = StableDiffusionImg2ImgPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, ).to(device) diff --git a/src/diffusers/pipelines/stable_diffusion/README.md b/src/diffusers/pipelines/stable_diffusion/README.md index 47c38acb..9c92e7b1 100644 --- a/src/diffusers/pipelines/stable_diffusion/README.md +++ b/src/diffusers/pipelines/stable_diffusion/README.md @@ -13,7 +13,7 @@ The summary of the model is the following: - Stable Diffusion has the same architecture as [Latent Diffusion](https://arxiv.org/abs/2112.10752) but uses a frozen CLIP Text Encoder instead of training the text encoder jointly with the diffusion model. - An in-depth explanation of the Stable Diffusion model can be found under [Stable Diffusion with 🧨 Diffusers](https://huggingface.co/blog/stable_diffusion). - If you don't want to rely on the Hugging Face Hub and have to pass an authentication token, you can -download the weights with `git lfs install; git clone https://huggingface.co/CompVis/stable-diffusion-v1-4` and instead pass the local path to the cloned folder to `from_pretrained` as shown below. +download the weights with `git lfs install; git clone https://huggingface.co/runwayml/stable-diffusion-v1-5` and instead pass the local path to the cloned folder to `from_pretrained` as shown below. - Stable Diffusion can work with a variety of different samplers as is shown below. ## Available Pipelines: @@ -33,14 +33,14 @@ If you want to download the model weights using a single Python line, you need t ```python from diffusers import DiffusionPipeline -pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") ``` -This however can make it difficult to build applications on top of `diffusers` as you will always have to pass the token around. A potential way to solve this issue is by downloading the weights to a local path `"./stable-diffusion-v1-4"`: +This, however, can make it difficult to build applications on top of `diffusers` as you will always have to pass the token around. 
A potential way to solve this issue is by downloading the weights to a local path `"./stable-diffusion-v1-5"`: ``` git lfs install -git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 +git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 ``` and simply passing the local path to `from_pretrained`: @@ -48,7 +48,7 @@ and simply passing the local path to `from_pretrained`: ```python from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") ``` ### Text-to-Image with default PLMS scheduler @@ -57,7 +57,7 @@ pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4") # make sure you're logged in with `huggingface-cli login` from diffusers import StableDiffusionPipeline -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") +pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" @@ -75,7 +75,7 @@ from diffusers import StableDiffusionPipeline, DDIMScheduler scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False) pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", scheduler=scheduler, ).to("cuda") @@ -98,7 +98,7 @@ lms = LMSDiscreteScheduler( ) pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", + "runwayml/stable-diffusion-v1-5", scheduler=lms, ).to("cuda") diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_flax_stable_diffusion.py b/src/diffusers/pipelines/stable_diffusion/pipeline_flax_stable_diffusion.py index 18c008f8..e4f56d94 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_flax_stable_diffusion.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_flax_stable_diffusion.py @@ -45,7 +45,7 @@ class FlaxStableDiffusionPipeline(FlaxDiffusionPipeline): [`FlaxDDIMScheduler`], [`FlaxLMSDiscreteScheduler`], or [`FlaxPNDMScheduler`]. safety_checker ([`FlaxStableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """ diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py index 89cc054a..ce3f3fba 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py @@ -50,7 +50,7 @@ class OnnxStableDiffusionImg2ImgPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. 
+ Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """ diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_inpaint.py index 30f8d7fc..b45d968f 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_inpaint.py @@ -63,7 +63,7 @@ class OnnxStableDiffusionInpaintPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """ diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py index 8ae51999..46def26e 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py @@ -40,7 +40,7 @@ class StableDiffusionPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """ diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index 799fd459..db813483 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -52,7 +52,7 @@ class StableDiffusionImg2ImgPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. 
""" diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index abcc7fba..918a1241 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -59,7 +59,7 @@ class StableDiffusionInpaintPipeline(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """ diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy.py index 038178dd..71a2be62 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy.py @@ -66,7 +66,7 @@ class StableDiffusionInpaintPipelineLegacy(DiffusionPipeline): [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`]. safety_checker ([`StableDiffusionSafetyChecker`]): Classification module that estimates whether generated images could be considered offensive or harmful. - Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. + Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details. feature_extractor ([`CLIPFeatureExtractor`]): Model that extracts features from generated images to be used as inputs for the `safety_checker`. """