Docs: recommend xformers (#1724)

* Fix links to flash attention.

* Add xformers installation instructions.

* Make link to xformers install more prominent.

* Link to xformers install from training docs.
Pedro Cuenca 2022-12-16 15:49:01 +01:00 committed by GitHub
parent c6d0dff4a3
commit acd317810b
5 changed files with 39 additions and 4 deletions

View File

@@ -45,6 +45,8 @@
- sections:
- local: optimization/fp16
title: "Memory and Speed"
- local: optimization/xformers
title: "xFormers"
- local: optimization/onnx
title: "ONNX"
- local: optimization/open_vino

View File

@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
# Memory and speed
We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory-efficient attention; please see the recommended [installation instructions](xformers).
We'll discuss how the following settings impact performance and memory.
| | Latency | Speedup |
| ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
## Memory Efficient Attention
Recent work on optimizing the bandwidth in the attention block has generated huge speedups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
| GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
To leverage it, just make sure you have:
- PyTorch > 1.12
- CUDA available
- [Installed the xformers library](xformers).
```python
from diffusers import StableDiffusionPipeline
import torch

# Example checkpoint; any Stable Diffusion pipeline works the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Switch the attention blocks to xFormers memory-efficient attention.
pipe.enable_xformers_memory_efficient_attention()
```
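The call above raises an error if xformers is not installed, and you can go back to the default attention implementation at any time with `pipe.disable_xformers_memory_efficient_attention()`.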

View File

@@ -0,0 +1,26 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Installing xFormers
We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of its continuous integration, so installation should become much easier starting with xFormers version 0.0.16.
Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/formers/). These are the steps that worked for us on a Linux machine to install xFormers version 0.0.15:
```bash
pip install pyre-extensions==0.0.23
pip install -i https://test.pypi.org/simple/ formers==0.0.15.dev376
```
We'll update these instructions when the wheels are published to the official PyPI repository.
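Once installed, a quick way to sanity-check the build is to import the package and print its version; a minimal check along these lines:

```python
import xformers
import xformers.ops  # the memory-efficient attention ops Diffusers relies on

# Should print the installed version,
# e.g. 0.0.15.dev376 with the TestPyPI wheel above.
print(xformers.__version__)
```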

View File

@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers); it could make your training faster and less memory-intensive. A sketch of the model-level opt-in follows the configuration step below.
After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
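If you drive the models from your own training code rather than the example script, the xFormers opt-in happens at the model level. A minimal sketch, assuming a diffusers version where models expose `enable_xformers_memory_efficient_attention` (the checkpoint id below is just an example):

```python
from diffusers import UNet2DConditionModel

# Load the UNet to fine-tune; the checkpoint id is an example.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Opt in to xFormers memory-efficient attention before training starts.
# This raises an error if xformers is not installed correctly.
unet.enable_xformers_memory_efficient_attention()
```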
View File

@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks.
- [Text Inversion](./text_inversion)
- [Dreambooth](./dreambooth)
If possible, please [install xFormers](../optimization/xformers) for memory-efficient attention. This could help make your training faster and less memory-intensive.
| Task | 🤗 Accelerate | 🤗 Datasets | Colab
|---|---|:---:|:---:|