Commit Graph

65 Commits

Author SHA1 Message Date
brkirch 87dd685224 Make sub-quadratic the default for MPS 2023-08-13 10:06:25 -04:00
brkirch abfa4ad8bc Use fixed size for sub-quadratic chunking on MPS
Even though this causes chunks to be much smaller, performance isn't significantly impacted. This usually reduces memory usage and should also help with poor performance when free memory is low.
2023-08-13 10:06:25 -04:00
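
The entry above switches sub-quadratic attention on MPS to a fixed chunk size. A minimal sketch of what fixed-size query chunking looks like in principle; the constant and function names here are illustrative, not the ones used in sd_hijack_optimizations.py:

```python
import torch

# Illustrative constant; the real fixed chunk size chosen for MPS may differ.
QUERY_CHUNK_SIZE = 1024

def query_chunked_attention(q, k, v, chunk_size=QUERY_CHUNK_SIZE):
    """Compute attention one fixed-size query chunk at a time to bound peak memory.

    q, k, v: tensors of shape (batch, tokens, dim).
    """
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[1], chunk_size):
        q_chunk = q[:, i:i + chunk_size]
        # Only a (chunk, tokens) score matrix is materialized at any one time.
        weights = torch.softmax((q_chunk @ k.transpose(-1, -2)) * scale, dim=-1)
        out[:, i:i + chunk_size] = weights @ v
    return out
```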
AUTOMATIC1111 10ff071e33 update doggettx cross attention optimization to not use an unreasonable amount of memory in some edge cases -- suggestion by MorkTheOrk 2023-08-02 18:37:16 +03:00
AUTOMATIC1111 ac4ccfa136 get attention optimizations to work 2023-07-13 09:30:33 +03:00
AUTOMATIC1111 da464a3fb3 SDXL support 2023-07-12 23:52:43 +03:00
AUTOMATIC1111 806ea639e6
Merge pull request #11066 from aljungberg/patch-1
Fix upcast attention dtype error.
2023-06-07 07:48:52 +03:00
Alexander Ljungberg d9cc0910c8
Fix upcast attention dtype error.
Without this fix, enabling the "Upcast cross attention layer to float32" option while also using `--opt-sdp-attention` breaks generation with an error:

```
  File "/ext3/automatic1111/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 612, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
```

The fix is to make sure to upcast the value tensor too.
2023-06-06 21:45:30 +01:00
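
A hedged sketch of the kind of fix the commit above describes: before calling scaled_dot_product_attention, q, k and v are all upcast to the same dtype, rather than only q and k. The function and variable names are illustrative, not the actual webui code:

```python
import torch
import torch.nn.functional as F

def sdp_attnblock_forward_upcast(q, k, v, upcast=True):
    """Illustrative only: upcast q, k *and* v so SDPA sees a single dtype."""
    dtype = q.dtype
    if upcast:
        q, k, v = q.float(), k.float(), v.float()
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
    # Return in the model's original working dtype (e.g. float16).
    return out.to(dtype)
```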
AUTOMATIC1111 56bf522913
Merge pull request #10990 from vkage/sd_hijack_optimizations_bugfix
torch.cuda.is_available() check for SdOptimizationXformers
2023-06-04 11:34:32 +03:00
AUTOMATIC 2e23c9c568 fix the broken line for #10990 2023-06-04 11:33:51 +03:00
Vivek K. Vasishtha b1a72bc7e2
torch.cuda.is_available() check for SdOptimizationXformers 2023-06-03 21:54:27 +05:30
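
A rough sketch of the guard these two commits add. The class below is a stand-in for illustration, not the actual SdOptimizationXformers implementation:

```python
import torch

class XformersOptimizationStub:
    """Stand-in for illustration; not the real SdOptimizationXformers class."""

    def is_available(self) -> bool:
        try:
            import xformers  # noqa: F401
        except ImportError:
            return False
        # xformers' memory-efficient kernels need a CUDA device to be usable.
        return torch.cuda.is_available()
```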
AUTOMATIC 3ee1238630 revert default cross attention optimization to Doggettx
make --disable-opt-split-attention command line option work again
2023-06-01 08:12:21 +03:00
AUTOMATIC 36888092af revert default cross attention optimization to Doggettx
make --disable-opt-split-attention command line option work again
2023-06-01 08:12:06 +03:00
AUTOMATIC 05933840f0 rename print_error to report, use it together with package name 2023-05-31 19:56:37 +03:00
Aarni Koskela 00dfe27f59 Add & use modules.errors.print_error where currently printing exception info by hand 2023-05-29 09:17:30 +03:00
Aarni Koskela df004be2fc Add a couple of `from __future__ import annotations` imports for Py3.9 compat 2023-05-21 00:26:16 +03:00
AUTOMATIC1111 1e5afd4fa9
Apply suggestions from code review
Co-authored-by: Aarni Koskela <akx@iki.fi>
2023-05-19 09:17:36 +03:00
AUTOMATIC 8a3d232839 fix linter issues 2023-05-19 00:03:27 +03:00
AUTOMATIC 2582a0fd3b make it possible for scripts to add cross attention optimizations
add UI selection for cross attention optimization
2023-05-18 22:48:28 +03:00
Aarni Koskela 49a55b410b Autofix Ruff W (not W605) (mostly whitespace) 2023-05-11 20:29:11 +03:00
AUTOMATIC 028d3f6425 ruff auto fixes 2023-05-10 11:05:02 +03:00
AUTOMATIC 762265eab5 autofixes from ruff 2023-05-10 07:52:45 +03:00
brkirch 7aab389d6f Fix for Unet NaNs 2023-05-08 08:16:56 -04:00
FNSpd 280ed8f00f
Update sd_hijack_optimizations.py 2023-03-24 16:29:16 +04:00
FNSpd c84c9df737
Update sd_hijack_optimizations.py 2023-03-21 14:50:22 +04:00
Pam 8d7fa2f67c sdp_attnblock_forward hijack 2023-03-10 22:48:41 +05:00
Pam 37acba2633 argument to disable memory efficient for sdp 2023-03-10 12:19:36 +05:00
Pam fec0a89511 scaled dot product attention 2023-03-07 00:33:13 +05:00
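
The three entries above introduce PyTorch 2.0's scaled dot product attention, hijack the attention block forward to use it, and add an argument to disable the memory-efficient backend. A minimal sketch of the core call, assuming the PyTorch 2.0 sdp_kernel context manager; the exact flag wiring in the webui may differ:

```python
import torch
import torch.nn.functional as F

def sdp_attention(q, k, v, disable_mem_efficient=False):
    """q, k, v: (batch, heads, tokens, head_dim)."""
    if disable_mem_efficient and q.is_cuda:
        # PyTorch 2.0 exposes this context manager for choosing SDPA backends;
        # newer releases use torch.nn.attention.sdpa_kernel instead.
        with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True,
                                            enable_mem_efficient=False):
            return F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
    return F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
```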
brkirch e3b53fd295 Add UI setting for upcasting attention to float32
Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.

To make upcasting possible in the cross attention layer optimizations, several sections of code in sd_hijack_optimizations.py need to be indented so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) upcast only q and k, my finding was that most of the cross attention layer optimizations could not function unless v was upcast as well.
2023-01-25 01:13:04 -05:00
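
A hedged sketch of the pattern the commit message above describes: a context manager disables autocast so that the upcast q, k and v (and, per the finding above, v in particular) actually stay in float32 through the attention math. Names are illustrative:

```python
import torch

def _attn(q, k, v):
    scale = q.shape[-1] ** -0.5
    weights = torch.softmax((q @ k.transpose(-1, -2)) * scale, dim=-1)
    return weights @ v

def cross_attention_with_optional_upcast(q, k, v, upcast_attn=False):
    """Illustrative cross-attention core; q, k, v: (batch, tokens, dim)."""
    if not upcast_attn:
        return _attn(q, k, v)
    device_type = "cuda" if q.is_cuda else "cpu"
    # Autocast has to be disabled here, or the float32 copies would be
    # silently re-cast to half precision inside the matmuls.
    with torch.autocast(device_type=device_type, enabled=False):
        # As the commit notes, v has to be upcast too, not only q and k.
        return _attn(q.float(), k.float(), v.float())
```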
AUTOMATIC 59146621e2 better support for xformers flash attention on older versions of torch 2023-01-23 16:40:20 +03:00
Takuma Mori 3262e825cc add --xformers-flash-attention option & impl 2023-01-21 17:42:04 +09:00
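
The two entries above add and refine an --xformers-flash-attention option. A short sketch of how requesting the FlashAttention kernel through xformers can look; the wrapper function and tensor layout are assumptions for illustration:

```python
import xformers.ops

def xformers_attention(q, k, v, use_flash_attention=False):
    """q, k, v shaped (batch, tokens, heads, head_dim), as xformers expects."""
    # Passing an explicit op requests the FlashAttention kernel; op=None lets
    # xformers pick a backend automatically.
    op = xformers.ops.MemoryEfficientAttentionFlashAttentionOp if use_flash_attention else None
    return xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=op)
```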
AUTOMATIC 40ff6db532 extra networks UI
rework of hypernets: rather than via settings, hypernets are added directly to prompt as <hypernet:name:weight>
2023-01-21 08:36:07 +03:00
brkirch c18add68ef Added license 2023-01-06 16:42:47 -05:00
brkirch b95a4c0ce5 Change sub-quad chunk threshold to use percentage 2023-01-06 01:01:51 -05:00
brkirch d782a95967 Add Birch-san's sub-quadratic attention implementation 2023-01-06 00:14:13 -05:00
brkirch 35b1775b32 Use other MPS optimization for large q.shape[0] * q.shape[1]
Check if q.shape[0] * q.shape[1] is 2**18 or larger and use the lower memory usage MPS optimization if it is. This should prevent most crashes that were occurring at certain resolutions (e.g. 1024x1024, 2048x512, 512x2048).

Also included is a change that checks slice_size and prevents it from being divisible by 4096, which also results in a crash; without it, a crash can occur at 1024x512 or 512x1024 resolution.
2022-12-20 21:30:00 -05:00
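
A small sketch of the checks described above, with the thresholds taken from the commit message; the function names, and the exact way the slice size is adjusted, are illustrative assumptions:

```python
def should_use_low_memory_mps_attention(q) -> bool:
    # Attention maps where q.shape[0] * q.shape[1] reaches 2**18 were crashing
    # on MPS at resolutions like 1024x1024, 2048x512 and 512x2048, so fall
    # back to the lower-memory optimization for them.
    return q.shape[0] * q.shape[1] >= 2 ** 18

def safe_slice_size(slice_size: int) -> int:
    # A slice size divisible by 4096 also crashed (e.g. at 1024x512); one
    # simple way to avoid that, as a sketch, is to shrink it by one.
    return slice_size - 1 if slice_size % 4096 == 0 else slice_size
```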
AUTOMATIC 505ec7e4d9 cleanup some unneeded imports for hijack files 2022-12-10 09:17:39 +03:00
AUTOMATIC 7dbfd8a7d8 do not replace entire unet for the resolution hack 2022-12-10 09:14:45 +03:00
Billy Cao adb6cb7619 Patch UNet Forward to support resolutions that are not multiples of 64
Also modified the UI so it no longer steps in increments of 64
2022-11-23 18:11:24 +08:00
Cheka 2fd7935ef4 Remove wrong self reference in CUDA support for invokeai 2022-10-19 09:35:53 +03:00
C43H66N12O12S2 c71008c741 Update sd_hijack_optimizations.py 2022-10-18 11:53:04 +03:00
C43H66N12O12S2 84823275e8 readd xformers attnblock 2022-10-18 11:53:04 +03:00
C43H66N12O12S2 2043c4a231 delete xformers attnblock 2022-10-18 11:53:04 +03:00
brkirch 861db783c7 Use apply_hypernetwork function 2022-10-11 17:24:00 +03:00
brkirch 574c8e554a Add InvokeAI and lstein to credits, add back CUDA support 2022-10-11 17:24:00 +03:00
brkirch 98fd5cde72 Add check for psutil 2022-10-11 17:24:00 +03:00
brkirch c0484f1b98 Add cross-attention optimization from InvokeAI
* Add cross-attention optimization from InvokeAI (~30% speed improvement on MPS)
* Add command line option for it
* Make it default when CUDA is unavailable
2022-10-11 17:24:00 +03:00
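
A hedged sketch of the "default when CUDA is unavailable" selection described in the bullet points above. The function and option names are placeholders, not the webui's actual option handling:

```python
import torch

def pick_cross_attention_optimization(force_invokeai: bool = False) -> str:
    """Placeholder selection logic, not the webui's actual command line handling."""
    # The InvokeAI split-attention path is used either when explicitly
    # requested or whenever CUDA is unavailable (e.g. on Apple/MPS machines).
    if force_invokeai or not torch.cuda.is_available():
        return "InvokeAI"
    return "Doggettx"
```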
AUTOMATIC 873efeed49 rename hypernetwork dir to hypernetworks to prevent clash with an old filename that people who use zip instead of git clone will have 2022-10-11 15:51:30 +03:00
AUTOMATIC 530103b586 fixes related to merge 2022-10-11 14:53:02 +03:00
AUTOMATIC 948533950c replace duplicate code with a function 2022-10-11 11:10:17 +03:00
C43H66N12O12S2 3e7a981194 remove functorch 2022-10-10 19:54:07 +03:00