100 Commits

Author SHA1 Message Date
brkirch 2217331cd1 Refactor MPS fixes to CondFunc 2023-02-01 06:36:22 -05:00
brkirch 7738c057ce MPS fix is still needed :(
Apparently I did not test with large enough images to trigger the bug with torch.narrow on MPS
2023-02-01 05:23:58 -05:00
AUTOMATIC1111 fecb990deb Merge pull request #7309 from brkirch/fix-embeddings
Fix embeddings, upscalers, and refactor `--upcast-sampling`
2023-01-28 18:44:36 +03:00
brkirch f9edd578e9 Remove MPS fix no longer needed for PyTorch
The torch.narrow fix was required for nightly PyTorch builds for a while to prevent a hard crash, but newer nightly builds don't have this issue.
2023-01-28 04:16:27 -05:00
brkirch ada17dbd7c Refactor conditional casting, fix upscalers 2023-01-28 04:16:25 -05:00
AUTOMATIC 9beb794e0b clarify the option to disable NaN check. 2023-01-27 13:08:00 +03:00
AUTOMATIC d2ac95fa7b remove the need to place configs near models 2023-01-27 11:28:12 +03:00
brkirch e3b53fd295 Add UI setting for upcasting attention to float32
Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.

In order to make upcasting cross attention layer optimizations possible it is necessary to indent several sections of code in sd_hijack_optimizations.py so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, unfortunately my findings were that most of the cross attention layer optimizations could not function unless v is upcast also.
2023-01-25 01:13:04 -05:00
brkirch 84d9ce30cb Add option for float32 sampling with float16 UNet
This also handles type casting so that ROCm and MPS torch devices work correctly without --no-half. One cast is required for deepbooru in deepbooru_model.py, some explicit casting is required for img2img and inpainting. depth_model can't be converted to float16 or it won't work correctly on some systems (it's known to have issues on MPS) so in sd_models.py model.depth_model is removed for model.half().
2023-01-25 01:13:02 -05:00
AUTOMATIC1111 aa60fc6660 Merge pull request #6922 from brkirch/cumsum-fix
Improve cumsum fix for MPS
2023-01-19 13:18:34 +03:00
brkirch a255dac4f8 Fix cumsum for MPS in newer torch
The prior fix assumed that testing int16 was enough to determine if a fix is needed, but a recent fix for cumsum has int16 working but not bool.
2023-01-17 20:54:18 -05:00
AUTOMATIC c361b89026 disable the new NaN check for the CI 2023-01-17 11:05:01 +03:00
AUTOMATIC 9991967f40 Add a check and explanation for tensor with all NaNs. 2023-01-16 22:59:46 +03:00
brkirch 8111b5569d Add support for PyTorch nightly and local builds 2023-01-05 20:54:52 -05:00
brkirch 16b4509fa6 Add numpy fix for MPS on PyTorch 1.12.1
When saving training results with torch.save(), an exception is thrown:
"RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead."

So for MPS, check if Tensor.requires_grad and detach() if necessary.
2022-12-17 04:22:58 -05:00
AUTOMATIC b6e5edd746 add built-in extension system
add support for adding upscalers in extensions
move LDSR, ScuNET and SwinIR to built-in extensions
2022-12-03 18:06:33 +03:00
AUTOMATIC 46b0d230e7 add comment for #4407 and remove seemingly unnecessary cudnn.enabled 2022-12-03 16:01:23 +03:00
AUTOMATIC 2651267e3a fix #4407 breaking UI entirely for card other than ones related to the PR 2022-12-03 15:57:52 +03:00
AUTOMATIC1111 681c0003df Merge pull request #4407 from yoinked-h/patch-1
Fix issue with 16xx cards
2022-12-03 10:30:34 +03:00
brkirch 0fddb4a1c0 Rework MPS randn fix, add randn_like fix
torch.manual_seed() already sets a CPU generator, so there is no reason to create a CPU generator manually. torch.randn_like also needs a MPS fix for k-diffusion, but a torch hijack with randn_like already exists so it can also be used for that.
2022-11-30 10:33:42 -05:00
AUTOMATIC1111 cc90dcc933 Merge pull request #4918 from brkirch/pytorch-fixes
Fixes for PyTorch 1.12.1 when using MPS
2022-11-27 13:47:01 +03:00
AUTOMATIC 5b2c316890 eliminate duplicated code from #5095 2022-11-27 13:08:54 +03:00
Matthew McGoogan c67c40f983 torch.cuda.empty_cache() defaults to cuda:0 device unless explicitly set otherwise first. Updating torch_gc() to use the device set by --device-id if specified to avoid OOM edge cases on multi-GPU systems. 2022-11-26 23:25:16 +00:00
brkirch e247b7400a Add fixes for PyTorch 1.12.1
Fix typo "MasOS" -> "macOS"

If MPS is available and PyTorch is an earlier version than 1.13:
* Monkey patch torch.Tensor.to to ensure all tensors sent to MPS are contiguous
* Monkey patch torch.nn.functional.layer_norm to ensure input tensor is contiguous (required for this program to work with MPS on unmodified PyTorch 1.12.1)
2022-11-21 02:07:19 -05:00
brkirch abfa22c16f Revert "MPS Upscalers Fix"
This reverts commit 768b95394a.
2022-11-17 00:08:21 -05:00
AUTOMATIC 0ab0a50f9a change formatting to match the main program in devices.py 2022-11-12 10:00:49 +03:00
源文雨 1130d5df66 Update devices.py 2022-11-12 11:09:28 +08:00
源文雨 76ab31e188 Fix wrong mps selection below macOS 12.3 2022-11-12 11:02:40 +08:00
pepe10-gpu 62e9fec3df actual better fix
thanks C43H66N12O12S2
2022-11-08 15:19:09 -08:00
pepe10-gpu 29eff4a194 terrible hack 2022-11-07 18:06:48 -08:00
pepe10-gpu cd6c55c1ab 16xx card fix
cudnn
2022-11-06 17:05:51 -08:00
brkirch faed465a0b MPS Upscalers Fix
Get ESRGAN, SCUNet, and SwinIR working correctly on MPS by ensuring memory is contiguous for tensor views before sending to MPS device.
2022-10-25 09:42:53 +03:00
brkirch 4c24347e45 Remove BSRGAN from --use-cpu, add SwinIR 2022-10-25 09:42:53 +03:00
AUTOMATIC 50b5504401 remove parsing command line from devices.py 2022-10-22 14:04:14 +03:00
Extraltodeus 57eb54b838 implement CUDA device selection by ID 2022-10-22 00:11:07 +02:00
brkirch fdef8253a4 Add 'interrogate' and 'all' choices to --use-cpu
* Add 'interrogate' and 'all' choices to --use-cpu
* Change type for --use-cpu argument to str.lower, so that choices are case insensitive
2022-10-14 16:31:39 +03:00
AUTOMATIC 7349088d32 --no-half-vae 2022-10-10 16:16:29 +03:00
brkirch e9e2a7ec9a Merge branch 'master' into cpu-cmdline-opt 2022-10-04 07:42:53 -04:00
AUTOMATIC 6c6ae28bf5 send all three of GFPGAN's and codeformer's models to CPU memory instead of just one for #1283 2022-10-04 12:32:22 +03:00
brkirch 27ddc24fde Add BSRGAN to --add-cpu 2022-10-04 05:18:17 -04:00
brkirch eeab7aedf5 Add --use-cpu command line option
Remove MPS detection to use CPU for GFPGAN / CodeFormer and add a --use-cpu command line option.
2022-10-04 04:24:35 -04:00
brkirch b88e4ea7d6 Merge branch 'master' into master 2022-10-04 01:04:19 -04:00
AUTOMATIC 820f1dc96b initial support for training textual inversion 2022-10-02 15:03:39 +03:00
brkirch bdaa36c844 When device is MPS, use CPU for GFPGAN instead
GFPGAN will not work if the device is MPS, so default to CPU instead.
2022-09-30 23:53:25 -04:00
AUTOMATIC 9d40212485 first attempt to produce correct seeds in batch 2022-09-13 21:49:58 +03:00
AUTOMATIC c7e0e28ccd changes for #294 2022-09-12 20:09:32 +03:00
AUTOMATIC b70b51cc72 Allow TF32 in CUDA for increased performance #279 2022-09-12 16:34:13 +03:00
AUTOMATIC 8fb9c57ed6 add half() support for CLIP interrogation 2022-09-11 23:24:24 +03:00
AUTOMATIC f194457229 CLIP interrogator 2022-09-11 18:48:36 +03:00
Abdullah Barhoum b5d1af11b7 Modular device management 2022-09-11 09:49:43 +03:00
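
Note on fdef8253a4: that entry says the type of the --use-cpu argument was changed to str.lower so that choices are case insensitive. A minimal sketch of that argparse pattern is below. The list of choices, the nargs and default settings, and the use_cpu_for helper are assumptions made for illustration; they are not taken from the repository.

```python
import argparse
from typing import Sequence

# Hypothetical subset of module names; the log does not give the real list.
CPU_CHOICES = ["all", "sd", "interrogate", "gfpgan", "esrgan"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use-cpu",
        nargs="+",          # assumption: several modules may be listed
        default=[],         # assumption: no module is forced to CPU by default
        type=str.lower,     # each value is lowercased before the choices check
        choices=CPU_CHOICES,
        help="run the listed modules on the CPU",
    )
    return parser


def use_cpu_for(module: str, selected: Sequence[str]) -> bool:
    """Hypothetical helper: 'all' selects every module."""
    return "all" in selected or module in selected


# Mixed-case input is accepted because argparse applies `type` first.
args = build_parser().parse_args(["--use-cpu", "Interrogate", "GFPGAN"])
print(args.use_cpu)
```

Because argparse runs the type callable before validating against choices, passing the conversion as type=str.lower avoids having to list every capitalization in choices.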