Kohaku-Blueleaf
209c26a1cb
improve efficiency and support more devices
2024-01-09 22:11:44 +08:00
AUTOMATIC1111
a70dfb64a8
change import statements for #14478
2023-12-31 22:38:30 +03:00
Aarni Koskela
5768afc776
Add utility to inspect a model's parameters (to get dtype/device)
2023-12-31 13:22:43 +02:00
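The utility boils down to grabbing any parameter from a module and reading its attributes. A minimal sketch; the helper name and placement are assumptions, not necessarily the upstream API:

```python
import torch

def get_param(model: torch.nn.Module) -> torch.nn.Parameter:
    """Return the first parameter of a module, so callers can read the
    model's dtype/device without special-casing layers (sketch)."""
    for param in model.parameters():
        return param
    raise ValueError(f"No parameters found in {type(model).__name__}")

# usage: p = get_param(unet); p.dtype, p.device
```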
Kohaku-Blueleaf
9a15ae2a92
Merge branch 'dev' into test-fp8
2023-12-03 10:54:54 +08:00
AUTOMATIC1111
af5f0734c9
Merge pull request #14171 from Nuullll/ipex
...
Initial IPEX support for Intel Arc GPU
2023-12-02 19:22:32 +03:00
Kohaku-Blueleaf
110485d5bb
Merge branch 'dev' into test-fp8
2023-12-02 17:00:09 +08:00
AUTOMATIC1111
88736b5557
Merge pull request #14131 from read-0nly/patch-1
...
Update devices.py - Make 'use-cpu all' actually apply to 'all'
2023-12-02 09:46:19 +03:00
Nuullll
7499148ad4
Disable IPEX autocast due to its poor performance
2023-12-02 14:00:46 +08:00
Nuullll
8b40f475a3
Initial IPEX support
2023-11-30 20:22:46 +08:00
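For context, a hedged sketch of how IPEX device selection generally works: importing intel_extension_for_pytorch registers an "xpu" backend, which then behaves like any other torch device. The function name here is illustrative:

```python
import torch

def get_xpu_device():
    try:
        import intel_extension_for_pytorch  # noqa: F401  (registers torch.xpu)
    except ImportError:
        return None
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return None
```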
obsol
3cd6e1d0a0
Update devices.py
...
Fixes an issue where "--use-cpu all" properly makes SD run on the CPU but leaves ControlNet (and other extensions, I presume) pointed at the GPU, causing a crash in ControlNet due to a device mismatch between SD and CN.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14097
2023-11-27 19:21:43 -05:00
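The underlying change is small: the per-task device lookup must honor "all" in addition to the task's own name. A sketch along the lines of the webui's devices.get_device_for, with the use_cpu list standing in for the parsed --use-cpu argument:

```python
import torch

def get_device_for(task: str, use_cpu: list[str]) -> torch.device:
    # "all" must force CPU for every caller, including extensions like
    # ControlNet that ask with their own task name, not just known tasks
    if task in use_cpu or "all" in use_cpu:
        return torch.device("cpu")
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

# get_device_for("controlnet", use_cpu=["all"]) -> device(type='cpu')
```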
Kohaku-Blueleaf
043d2edcf6
Better naming
2023-11-19 15:56:31 +08:00
Kohaku-Blueleaf
598da5cd49
Use options instead of cmd_args
2023-11-19 15:50:06 +08:00
KohakuBlueleaf
ddc2a3499b
Add MPS manual cast
2023-10-28 16:52:35 +08:00
Kohaku-Blueleaf
d4d3134f6d
ManualCast for 10/16 series GPUs
2023-10-28 15:24:26 +08:00
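Both commits build on the same "manual cast" idea: on hardware without working fp16 autocast (GTX 10/16 series, MPS), wrap the forward of the compute layers so tensor inputs are cast to the working dtype on the fly. A condensed sketch, assuming the model weights are already stored in the target dtype; not the exact upstream code:

```python
import contextlib
import torch

def manual_cast_forward(orig_forward, dtype):
    # cast floating-point tensor arguments to the working dtype before
    # the layer runs, standing in for the autocast these devices lack
    def forward(self, *args, **kwargs):
        def cast(t):
            return t.to(dtype) if torch.is_tensor(t) and t.is_floating_point() else t
        return orig_forward(self, *[cast(a) for a in args],
                            **{k: cast(v) for k, v in kwargs.items()})
    return forward

@contextlib.contextmanager
def manual_cast(dtype=torch.float16):
    # patch the compute layers' class-level forward; restore on exit
    patched = {}
    for cls in (torch.nn.Linear, torch.nn.Conv2d):
        patched[cls] = cls.forward
        cls.forward = manual_cast_forward(cls.forward, dtype)
    try:
        yield
    finally:
        for cls, orig in patched.items():
            cls.forward = orig
```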
Kohaku-Blueleaf
eaa9f5162f
Add CPU fp8 support
...
Since norm layers need fp32, only the linear operation layers (conv2d/linear) are converted.
Also, the TE uses some PyTorch functions that don't support bf16 amp on CPU, so I added a condition to indicate whether the autocast is for the unet.
2023-10-24 01:49:05 +08:00
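The conversion described here only touches weight storage: norm layers stay in fp32, and the fp8 weights are cast back up at forward time (e.g. via a manual-cast wrapper like the one sketched above). A rough sketch, assuming a PyTorch build with float8 support (torch >= 2.1):

```python
import torch

def convert_linear_ops_to_fp8(model: torch.nn.Module) -> torch.nn.Module:
    # only the linear operation layers are stored as float8; norm
    # layers keep fp32, as noted in the commit message above
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            module.weight.data = module.weight.data.to(torch.float8_e4m3fn)
    return model
```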
AUTOMATIC1111
46375f0592
fix for crash when running #12924 without --device-id
2023-09-09 09:39:37 +03:00
catboxanon
5681bf8016
More accurate check for enabling cuDNN benchmark on 16XX cards
2023-08-31 14:57:16 -04:00
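Compute capability alone is not enough here: 7.5 covers both GTX 16xx and RTX 20xx, so the refined check also matches on the device name. A sketch of the test:

```python
import torch

def enable_cudnn_benchmark_for_gtx16xx():
    # 16xx (Turing, capability 7.5) cards need cudnn.benchmark to do
    # fp16, but capability 7.5 alone also matches RTX 20xx cards;
    # hence the additional device-name check
    if not torch.cuda.is_available():
        return
    for devid in range(torch.cuda.device_count()):
        if (torch.cuda.get_device_capability(devid) == (7, 5)
                and torch.cuda.get_device_name(devid).startswith("NVIDIA GeForce GTX 16")):
            torch.backends.cudnn.benchmark = True
```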
AUTOMATIC1111
386245a264
split shared.py into multiple files; should resolve all circular reference import errors related to shared.py
2023-08-09 10:25:35 +03:00
AUTOMATIC1111
0d5dc9a6e7
rework RNG to use generators instead of generating noises beforehand
2023-08-09 08:43:31 +03:00
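The rework replaces eagerly precomputed noise tensors with per-seed torch.Generator objects that are consulted lazily. A simplified sketch; the class name mirrors the webui's rng.ImageRNG, but the details here are assumptions:

```python
import torch

class ImageRNG:
    def __init__(self, shape, seeds, device="cpu"):
        self.shape, self.device = shape, device
        self.generators = []
        for seed in seeds:
            g = torch.Generator(device=device)
            g.manual_seed(seed)
            self.generators.append(g)

    def next(self):
        # draw a fresh batch of noise on demand instead of storing it
        return torch.stack([torch.randn(self.shape, generator=g, device=self.device)
                            for g in self.generators])

# rng = ImageRNG((4, 64, 64), seeds=[1, 2]); noise = rng.next()
```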
AUTOMATIC1111
fca42949a3
rework torchsde._brownian.brownian_interval replacement to use device.randn_local and respect the NV setting.
2023-08-03 07:18:55 +03:00
AUTOMATIC1111
84b6fcd02c
add NV option for the Random number generator source setting, which allows generating the same pictures on CPU/AMD/Mac as on NVIDIA video cards.
2023-08-03 00:00:23 +03:00
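The NV source works by reimplementing, on the CPU, the Philox counter-based algorithm that CUDA's randn uses (the webui ships this as modules/rng_philox.py), so every backend can reproduce NVIDIA's noise. A hedged sketch of the dispatch only, with the Philox generator elided:

```python
import torch

def randn_local(seed, shape, source="GPU", device="cpu"):
    # "NV" would substitute a CPU-side Philox generator matching CUDA's
    # randn output (modules/rng_philox.py in the webui); "CPU" draws on
    # the CPU and moves the result, making seeds reproducible everywhere
    if source == "CPU":
        g = torch.Generator(device="cpu")
        g.manual_seed(seed)
        return torch.randn(shape, generator=g).to(device)
    g = torch.Generator(device=device)
    g.manual_seed(seed)
    return torch.randn(shape, generator=g, device=device)
```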
Aarni Koskela
b85fc7187d
Fix MPS cache cleanup
...
Importing torch does not import torch.mps, so the call failed.
2023-07-11 12:51:05 +03:00
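The fix itself is worth spelling out, since it trips people up: torch.mps is a submodule that "import torch" does not load. Something like:

```python
def torch_mps_gc() -> None:
    # "import torch" alone does not expose torch.mps; import the
    # submodule explicitly before calling empty_cache()
    from torch.mps import empty_cache
    empty_cache()
```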
AUTOMATIC1111
da8916f926
added torch.mps.empty_cache() to torch_gc()
...
changed a bunch of places that use torch.cuda.empty_cache() to use torch_gc() instead
2023-07-08 17:13:18 +03:00
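Centralizing cache cleanup in torch_gc() means every call site gets the right backend behavior. A sketch of the dispatcher, with per-backend details simplified:

```python
import torch

def torch_gc():
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        torch_mps_gc()  # as sketched above; needs the explicit torch.mps import
```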
Aarni Koskela
ba70a220e3
Remove a bunch of unused/vestigial code
...
As found by Vulture and some eyes
2023-06-05 22:43:57 +03:00
AUTOMATIC
8faac8b963
run a basic torch calculation at startup in parallel to reduce the performance impact of the first generation
2023-05-21 21:55:14 +03:00
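The trick is to trigger PyTorch's lazy one-time initialization, which can take seconds and several hundred MB, in a background thread during startup. Roughly (the device/dtype defaults here are safe placeholders, not the upstream values):

```python
import threading
import torch

def first_time_calculation(device="cpu", dtype=torch.float32):
    # any small Linear call forces the expensive one-time backend
    # initialization, so the first real generation is fast
    x = torch.zeros((1, 1)).to(device, dtype)
    linear = torch.nn.Linear(1, 1).to(device, dtype)
    linear(x)

threading.Thread(target=first_time_calculation).start()
```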
AUTOMATIC
028d3f6425
ruff auto fixes
2023-05-10 11:05:02 +03:00
AUTOMATIC
5fe0dd79be
rename CPU RNG to RNG source in settings, add infotext and parameters copypaste support to RNG source
2023-04-29 11:29:37 +03:00
Deciare
d40e44ade4
Option to use CPU for random number generation.
...
Makes a given manual seed generate the same images across different
platforms, independently of the GPU architecture in use.
Fixes #9613.
2023-04-18 23:27:46 -04:00
brkirch
1b8af15f13
Refactor Mac specific code to a separate file
...
Move most Mac-related code to a separate file, and don't even load it unless the web UI is run under macOS.
2023-02-01 14:05:56 -05:00
brkirch
2217331cd1
Refactor MPS fixes to CondFunc
2023-02-01 06:36:22 -05:00
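CondFunc is the webui's small hijacking helper: wrap a callable so a substitute runs only when a predicate on the arguments holds. A condensed reimplementation of the pattern; the real one in sd_hijack_utils patches by dotted path string:

```python
class CondFunc:
    def __init__(self, orig_func, sub_func, cond_func):
        self._orig, self._sub, self._cond = orig_func, sub_func, cond_func

    def __call__(self, *args, **kwargs):
        # run the substitute only when the condition holds; otherwise
        # fall through to the original, untouched behavior
        if self._cond(self._orig, *args, **kwargs):
            return self._sub(self._orig, *args, **kwargs)
        return self._orig(*args, **kwargs)

# e.g. apply an MPS-only cumsum workaround, leaving other devices alone:
# torch.cumsum = CondFunc(torch.cumsum,
#                         lambda orig, t, *a, **kw: orig(t.to(torch.int32), *a, **kw),
#                         lambda orig, t, *a, **kw: t.device.type == "mps")
```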
brkirch
7738c057ce
MPS fix is still needed :(
...
Apparently I did not test with large enough images to trigger the bug with torch.narrow on MPS
2023-02-01 05:23:58 -05:00
AUTOMATIC1111
fecb990deb
Merge pull request #7309 from brkirch/fix-embeddings
...
Fix embeddings, upscalers, and refactor `--upcast-sampling`
2023-01-28 18:44:36 +03:00
brkirch
f9edd578e9
Remove MPS fix no longer needed for PyTorch
...
The torch.narrow fix was required for nightly PyTorch builds for a while to prevent a hard crash, but newer nightly builds don't have this issue.
2023-01-28 04:16:27 -05:00
brkirch
ada17dbd7c
Refactor conditional casting, fix upscalers
2023-01-28 04:16:25 -05:00
AUTOMATIC
9beb794e0b
clarify the option to disable NaN check.
2023-01-27 13:08:00 +03:00
AUTOMATIC
d2ac95fa7b
remove the need to place configs near models
2023-01-27 11:28:12 +03:00
brkirch
e3b53fd295
Add UI setting for upcasting attention to float32
...
Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.
In order to make the upcast cross attention layer optimizations possible, it is necessary to indent several sections of code in sd_hijack_optimizations.py so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, unfortunately my findings were that most of the cross attention layer optimizations could not function unless v was upcast as well.
2023-01-25 01:13:04 -05:00
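In code terms the option amounts to doing the attention math in float32. A simplified sketch; upstream the change is spread across the optimized attention paths in sd_hijack_optimizations.py:

```python
import torch

def upcast_attention(q, k, v):
    # run the attention math in float32 (upstream also disables
    # autocast around this block so results aren't silently cast back
    # down); per the findings above, v is upcast alongside q and k
    dtype = q.dtype
    q, k, v = q.float(), k.float(), v.float()
    sim = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
    return (sim.softmax(dim=-1) @ v).to(dtype)
```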
brkirch
84d9ce30cb
Add option for float32 sampling with float16 UNet
...
This also handles type casting so that ROCm and MPS torch devices work correctly without --no-half. One cast is required for deepbooru in deepbooru_model.py, and some explicit casting is required for img2img and inpainting. depth_model can't be converted to float16 or it won't work correctly on some systems (it's known to have issues on MPS), so in sd_models.py model.depth_model is removed for the model.half() call.
2023-01-25 01:13:02 -05:00
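The shape of the change: samplers keep working in float32, and values are cast to the UNet's dtype only at the model boundary, then back up on the way out. A schematic sketch with illustrative names:

```python
import torch

def apply_model_upcast_sampling(unet: torch.nn.Module, x, timesteps, cond):
    # cast to the UNet's storage dtype (e.g. float16) at the boundary
    dtype = next(unet.parameters()).dtype
    out = unet(x.to(dtype), timesteps.to(dtype), cond.to(dtype))
    return out.float()  # back to float32 for the sampler
```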
AUTOMATIC1111
aa60fc6660
Merge pull request #6922 from brkirch/cumsum-fix
...
Improve cumsum fix for MPS
2023-01-19 13:18:34 +03:00
brkirch
a255dac4f8
Fix cumsum for MPS in newer torch
...
The prior fix assumed that testing int16 was enough to determine if a fix is needed, but a recent fix for cumsum has int16 working but not bool.
2023-01-17 20:54:18 -05:00
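The probe therefore has to test both dtypes and fall back per dtype. A sketch of the shape of the fix; the real code lives in modules/mac_specific.py and is applied through the CondFunc helper described above:

```python
import torch

def cumsum_fix(t: torch.Tensor, dim: int) -> torch.Tensor:
    # newer torch fixed int16 cumsum on MPS while bool stayed broken,
    # so unsupported dtypes are routed through int32 and cast back up
    if t.device.type == "mps" and t.dtype in (torch.bool, torch.int8, torch.int16):
        return t.to(torch.int32).cumsum(dim).to(torch.int64)
    return t.cumsum(dim)
```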
AUTOMATIC
c361b89026
disable the new NaN check for the CI
2023-01-17 11:05:01 +03:00
AUTOMATIC
9991967f40
Add a check and explanation for a tensor with all NaNs.
2023-01-16 22:59:46 +03:00
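A sketch of what the check looks like: if every element of an intermediate tensor is NaN, fail early with a message pointing at the likely precision cause (upstream also honors the disable option mentioned above; the exact message wording here is paraphrased):

```python
import torch

class NansException(Exception):
    pass

def test_for_nans(x: torch.Tensor, where: str):
    if not torch.isnan(x).all():
        return
    message = f"A tensor with all NaNs was produced in {where}."
    if where == "unet":
        message += (" This could be caused by running the model in half"
                    " precision; try the --no-half commandline argument.")
    raise NansException(message)
```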
brkirch
8111b5569d
Add support for PyTorch nightly and local builds
2023-01-05 20:54:52 -05:00
brkirch
16b4509fa6
Add numpy fix for MPS on PyTorch 1.12.1
...
When saving training results with torch.save(), an exception is thrown:
"RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead."
So for MPS, check if Tensor.requires_grad and detach() if necessary.
2022-12-17 04:22:58 -05:00
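The workaround in one line: detach before converting, but only when needed. A sketch of the replacement function; upstream it is wired onto Tensor.numpy through a torch hijack rather than called directly:

```python
import torch

def numpy_fix(orig_numpy, tensor: torch.Tensor):
    # tensors that require grad can't be converted directly; detach
    # first so torch.save() of training results works on MPS
    if tensor.requires_grad:
        tensor = tensor.detach()
    return orig_numpy(tensor)

# usage: numpy_fix(torch.Tensor.numpy, some_tensor)
```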
AUTOMATIC
b6e5edd746
add built-in extension system
...
add support for adding upscalers in extensions
move LDSR, ScuNET and SwinIR to built-in extensions
2022-12-03 18:06:33 +03:00
AUTOMATIC
46b0d230e7
add comment for #4407 and remove seemingly unnecessary cudnn.enabled
2022-12-03 16:01:23 +03:00
AUTOMATIC
2651267e3a
fix #4407 breaking the UI entirely for cards other than the ones related to the PR
2022-12-03 15:57:52 +03:00
AUTOMATIC1111
681c0003df
Merge pull request #4407 from yoinked-h/patch-1
...
Fix issue with 16xx cards
2022-12-03 10:30:34 +03:00
brkirch
0fddb4a1c0
Rework MPS randn fix, add randn_like fix
...
torch.manual_seed() already sets a CPU generator, so there is no reason to create a CPU generator manually. torch.randn_like also needs an MPS fix for k-diffusion, but a torch hijack for randn_like already exists, so it can also be used for that.
2022-11-30 10:33:42 -05:00
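A sketch of the reworked fix: rely on torch.manual_seed() for the CPU generator, draw the noise on the CPU, and move it to MPS; randn_like gets the same treatment (shown here as plain functions rather than the actual torch hijack):

```python
import torch

def randn_mps(seed: int, shape, device) -> torch.Tensor:
    # torch.manual_seed() already seeds the global CPU generator, so
    # no manually created CPU generator is needed
    torch.manual_seed(seed)
    return torch.randn(shape, device="cpu").to(device)

def randn_like_mps(x: torch.Tensor) -> torch.Tensor:
    # same treatment for randn_like, which k-diffusion relies on
    return torch.randn(x.shape, dtype=x.dtype, device="cpu").to(x.device)
```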
AUTOMATIC1111
cc90dcc933
Merge pull request #4918 from brkirch/pytorch-fixes
...
Fixes for PyTorch 1.12.1 when using MPS
2022-11-27 13:47:01 +03:00