
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# How to use Stable Diffusion on Apple Silicon (M1/M2)

🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch `mps` device. Follow the steps below to run Stable Diffusion on your M1 or M2 computer.

## Requirements

- Mac computer with Apple silicon (M1/M2) hardware.
- macOS 12.6 or later (13.0 or later recommended).
- arm64 version of Python.
- PyTorch 1.13. You can install it with `pip` or `conda` by following the instructions at https://pytorch.org/get-started/locally/.
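
If you are unsure whether your environment meets these requirements, a short script can report the relevant facts. The sketch below is only an illustration: `check_mps_requirements` is a hypothetical helper name, not part of 🤗 Diffusers, and it assumes that a native arm64 Python reports `arm64` from `platform.machine()`. It relies on `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()` from PyTorch, and falls back gracefully when PyTorch is not installed.

```python
import platform


def check_mps_requirements():
    """Collect the facts the requirements list asks about (hypothetical helper)."""
    report = {
        # "arm64" for a native Apple silicon Python; "x86_64" suggests Rosetta or Intel.
        "machine": platform.machine(),
        # macOS version string such as "13.0"; empty on other operating systems.
        "macos_version": platform.mac_ver()[0],
        "torch_version": None,
        "mps_built": False,
        "mps_available": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment.

    report["torch_version"] = torch.__version__
    # Older PyTorch releases may not ship the mps backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None:
        report["mps_built"] = mps.is_built()
        report["mps_available"] = mps.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_mps_requirements().items():
        print(f"{name}: {value}")
```

If `mps_available` is reported as `False` on an Apple silicon Mac, the most likely causes are an x86_64 Python running under Rosetta, an older macOS version, or a PyTorch build without `mps` support.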


## Inference Pipeline

The snippet below uses the familiar `to()` interface to move the Stable Diffusion pipeline to the `mps` device on your M1 or M2 computer.

We recommend "priming" the pipeline with an additional one-time pass through it. This is a temporary workaround for an issue we have detected: the first inference pass produces slightly different results than subsequent ones. You only need to do this pass once, and it is fine to use a single inference step and discard the result.

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass (see explanation above)
_ = pipe(prompt, num_inference_steps=1)

# Results match those from the CPU device after the warmup pass.
image = pipe(prompt).images[0]
```

## Performance Recommendations

M1/M2 performance is very sensitive to memory pressure. The system will automatically swap if it needs to, but performance will degrade significantly when it does.

We recommend using _attention slicing_ to reduce memory pressure during inference and prevent swapping, particularly if your computer has less than 64 GB of system RAM, or if you generate images at non-standard resolutions larger than 512 × 512 pixels. Attention slicing performs the costly attention operation in multiple steps instead of all at once. It usually costs about 20% in performance on computers without unified memory, but we have observed _better performance_ on most Apple Silicon computers, unless they have 64 GB of RAM or more.

```python
# `pipe` is the pipeline created in the inference snippet above
pipe.enable_attention_slicing()
```
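
The rule of thumb above can be written down as a small predicate. This is a minimal sketch, not a 🤗 Diffusers API: `should_slice_attention` is a hypothetical helper, the 64 GB and 512 × 512 thresholds are taken directly from the recommendation above, and "larger than 512 × 512" is interpreted here as a larger total pixel count, which is an assumption.

```python
def should_slice_attention(total_ram_gb, width=512, height=512):
    """Return True when the guidance above suggests enabling attention slicing.

    total_ram_gb: system RAM in GB, supplied by the caller.
    width, height: resolution of the images you plan to generate.
    """
    # Less than 64 GB of system RAM: slicing helps avoid swapping.
    low_memory = total_ram_gb < 64
    # Assumption: "larger than 512 x 512" means more pixels in total.
    large_image = width * height > 512 * 512
    return low_memory or large_image
```

For example, on a 16 GB machine `should_slice_attention(16)` returns `True`, so you would call `pipe.enable_attention_slicing()` before running inference. Treat the result as a starting point and measure on your own hardware.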

## Known Issues

- As mentioned above, we are investigating a strange [first-time inference issue](https://github.com/huggingface/diffusers/issues/372).
- Generating multiple prompts in a batch [crashes or doesn't work reliably](https://github.com/huggingface/diffusers/issues/363). We believe this is related to the [`mps` backend in PyTorch](https://github.com/pytorch/pytorch/issues/84039). This is being resolved, but for now we recommend iterating over your prompts instead of batching them.
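
Iterating instead of batching can look like the sketch below. `generate_one_by_one` is a hypothetical helper, not part of 🤗 Diffusers. It only assumes what the inference snippet above already shows: calling the pipeline with a single prompt string returns an object whose `.images` list holds the generated image.

```python
def generate_one_by_one(pipe, prompts, **pipe_kwargs):
    """Run the pipeline once per prompt instead of passing the whole list as a batch."""
    images = []
    for prompt in prompts:
        # One prompt per call; extra keyword arguments are forwarded unchanged.
        result = pipe(prompt, **pipe_kwargs)
        images.append(result.images[0])
    return images
```

With the pipeline from the inference snippet, `generate_one_by_one(pipe, ["a photo of an astronaut riding a horse on mars", "a watercolor of a lighthouse"])` would return one image per prompt, in the same order as the prompts.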