From acd317810bc138b3a78fa30e1b3da1006c1d60ad Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 16 Dec 2022 15:49:01 +0100
Subject: [PATCH] Docs: recommend xformers (#1724)

* Fix links to flash attention.
* Add xformers installation instructions.
* Make link to xformers install more prominent.
* Link to xformers install from training docs.
---
 docs/source/_toctree.yml              |  2 ++
 docs/source/optimization/fp16.mdx     | 10 +++++++---
 docs/source/optimization/xformers.mdx | 26 ++++++++++++++++++++++++++
 docs/source/training/dreambooth.mdx   |  4 +++-
 docs/source/training/overview.mdx     |  1 +
 5 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 docs/source/optimization/xformers.mdx

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index b75c658e..ec578e17 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -45,6 +45,8 @@
 - sections:
   - local: optimization/fp16
     title: "Memory and Speed"
+  - local: optimization/xformers
+    title: "xFormers"
   - local: optimization/onnx
     title: "ONNX"
   - local: optimization/open_vino
diff --git a/docs/source/optimization/fp16.mdx b/docs/source/optimization/fp16.mdx
index 49fe3876..55180531 100644
--- a/docs/source/optimization/fp16.mdx
+++ b/docs/source/optimization/fp16.mdx
@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
 
 # Memory and speed
 
-We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed.
+We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory-efficient attention; please see the recommended [installation instructions](xformers).
+
+We'll discuss how the following settings impact performance and memory.
 
 | | Latency | Speedup |
 | ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
 
 ## Memory Efficient Attention
 
-Recent work on optimizing the bandwitdh in the attention block have generated huge speed ups and gains in GPU memory usage. The most recent being Flash Attention (from @tridao, [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf)) .
+
+Recent work on optimizing the bandwidth in the attention block has generated huge speedups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
+
 Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
 
 | GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
 To leverage it just make sure you have:
  - PyTorch > 1.12
  - Cuda available
- - Installed the [xformers](https://github.com/facebookresearch/xformers) library
+ - [Installed the xformers library](xformers).
 ```python
 from diffusers import StableDiffusionPipeline
 import torch
diff --git a/docs/source/optimization/xformers.mdx b/docs/source/optimization/xformers.mdx
new file mode 100644
index 00000000..93bfccb9
--- /dev/null
+++ b/docs/source/optimization/xformers.mdx
@@ -0,0 +1,26 @@
+
+
+# Installing xFormers
+
+We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training.
+In our tests, the optimizations performed in the attention blocks allow for both increased speed and reduced memory consumption.
+
+Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
+
+Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/formers/). These are the steps that worked for us on a Linux machine to install xFormers version 0.0.15:
+
+```bash
+pip install pyre-extensions==0.0.23
+pip install -i https://test.pypi.org/simple/ formers==0.0.15.dev376
+```
+
+We'll update these instructions when the wheels are published to the official PyPI repository.
diff --git a/docs/source/training/dreambooth.mdx b/docs/source/training/dreambooth.mdx
index 6cea3fb7..f8d7a025 100644
--- a/docs/source/training/dreambooth.mdx
+++ b/docs/source/training/dreambooth.mdx
@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
 pip install -U -r diffusers/examples/dreambooth/requirements.txt
 ```
 
-Then initialize and configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
+xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
+
+After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
 
 ```bash
 accelerate config
diff --git a/docs/source/training/overview.mdx b/docs/source/training/overview.mdx
index 9b36117c..fd6ec184 100644
--- a/docs/source/training/overview.mdx
+++ b/docs/source/training/overview.mdx
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 - [Text Inversion](./text_inversion)
 - [Dreambooth](./dreambooth)
 
+If possible, please [install xFormers](../optimization/xformers) for memory-efficient attention. This could help make your training faster and less memory intensive.
 
 | Task | 🤗 Accelerate | 🤗 Datasets | Colab
 |---|---|:---:|:---:|
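
For anyone who wants to sanity-check the recommendation after following the installation steps above, the sketch below enables memory-efficient attention on a Stable Diffusion pipeline and runs a single prompt. It assumes the `enable_xformers_memory_efficient_attention()` helper available in recent diffusers releases and an example checkpoint id that is not part of this patch; adjust both to your setup.

```python
# Minimal sketch: verify that diffusers can use xFormers memory-efficient attention.
# The checkpoint id below is an assumption; any Stable Diffusion checkpoint should work.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Switches the attention blocks to xFormers; raises an error if xFormers is not installed.
pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    image = pipe("a small cat").images[0]
image.save("xformers_check.png")
```

If the wheel was built against a different PyTorch or CUDA combination than the one in your environment, the call above should surface an error, which makes it a convenient installation check.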