* [CI] Add Apple M1 tests
* setup-python
* python build
* conda install
* remove branch
* only 3.8 is built for osx-arm
* try fetching prebuilt tokenizers
* use user cache
* update shells
* Reports and cleanup
* -> MPS
* Disable parallel tests
* Better naming
* investigate worker crash
* return xdist
* restart
* num_workers=2
* still crashing?
* faulthandler for segfaults
* faulthandler for segfaults
* remove restarts, stop on segfault
* torch version
* change installation order
* Use pre-RC version of PyTorch.
To be updated when it is released.
* Skip crashing test on MPS, add new one that works.
* Skip cuda tests in mps device.
* Actually use generator in test.
I think this was a typo.
* make style
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* add accelerate to load models with smaller memory footprint
* remove low_cpu_mem_usage as it is redundant
* move accelerate init weights context to modelling utils
* add test to ensure results are the same when loading with accelerate
* add tests to ensure ram usage gets lower when using accelerate
* move accelerate logic to single snippet under modelling utils and remove it from configuration utils
* format code to pass quality check
* fix imports with isort
* add accelerate to test extra deps
* only import accelerate if device_map is set to auto
* move accelerate availability check to diffusers import utils
* format code
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add grad ckpt to downsample blocks
* make it work
* don't pass gradient_checkpointing to upsample block
* add tests for UNet2DConditionModel
* add test_gradient_checkpointing
* add gradient_checkpointing for up and down blocks
* add functions to enable and disable grad ckpt
* remove the forward argument
* better naming
* make supports_gradient_checkpointing private
* update expected results of slow tests
* relax sum and mean tests
* Print shapes when reporting exception
* formatting
* fix sentence
* relax test_stable_diffusion_fast_ddim for gpu fp16
* relax flaky tests on GPU
* added comment on large tolerances
* black
* format
* set scheduler seed
* added generator
* use np.isclose
* set num_inference_steps to 50
* fix dep. warning
* update expected_slice
* preprocess if image
* updated expected results
* updated expected from CI
* pass generator to VAE
* undo change back to orig
* use original
* revert back the expected on cpu
* revert back values for CPU
* more undo
* update result after using gen
* update mean
* set generator for mps
* update expected on CI server
* undo
* use new seed every time
* cpu manual seed
* reduce num_inference_steps
* style
* use generator for randn
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial support for mps in Stable Diffusion pipeline.
* Initial "warmup" implementation when using mps.
* Make some deterministic tests pass with mps.
* Disable training tests when using mps.
* SD: generate latents in CPU then move to device.
This is especially important when using the mps device, because
generators are not supported there. See for example
https://github.com/pytorch/pytorch/issues/84288.
In addition, the other pipelines seem to use the same approach: generate
the random samples then move to the appropriate device.
After this change, generating an image in MPS produces the same result
as when using the CPU, if the same seed is used.
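A minimal sketch of the "generate on CPU, then move to device" approach, assuming PyTorch; `make_latents` is a hypothetical helper for illustration, not the pipeline's actual code:

```python
import torch


def make_latents(shape, seed, device):
    """Hypothetical helper: sample latents on the CPU, then move them to `device`."""
    # Seed a generator that lives on the CPU, since generators were not
    # usable on the mps device (see the PyTorch issue linked above).
    generator = torch.Generator(device="cpu").manual_seed(seed)
    # Sample on the CPU with that generator...
    latents = torch.randn(shape, generator=generator, device="cpu")
    # ...then copy the tensor to the target device for the rest of the pipeline.
    return latents.to(device)


# Use mps when this PyTorch build supports it, otherwise fall back to the CPU.
use_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
device = "mps" if use_mps else "cpu"

a = make_latents((1, 4, 64, 64), seed=0, device=device)
b = make_latents((1, 4, 64, 64), seed=0, device="cpu")
# Same seed gives identical values regardless of the target device.
print(torch.equal(a.cpu(), b))
```

Because the random numbers are always drawn from the CPU generator, the only device-specific step is the copy, which does not change the values.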
* Remove prints.
* Pass AutoencoderKL test_output_pretrained with mps.
Sampling from `posterior` must be done in CPU.
* Style
* Do not use torch.long for log op in mps device.
* Perform incompatible padding ops in CPU.
UNet tests now pass.
See https://github.com/pytorch/pytorch/issues/84535
* Style: fix import order.
* Remove unused symbols.
* Remove MPSWarmupMixin, do not apply automatically.
We do apply warmup in the tests, but not during normal use.
This adopts some PR suggestions by @patrickvonplaten.
* Add comment for mps fallback to CPU step.
* Add README_mps.md for mps installation and use.
* Apply `black` to modified files.
* Restrict README_mps to SD, show measures in table.
* Make PNDM indexing compatible with mps.
Addresses #239.
* Do not use float64 when using LDMScheduler.
Fixes #358.
* Fix typo identified by @patil-suraj
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Adapt example to new output style.
* Restore 1:1 results reproducibility with CompVis.
However, mps latents need to be generated in CPU because generators
don't work in the mps device.
* Move PyTorch nightly to requirements.
* Adapt `test_scheduler_outputs_equivalence` to MPS.
* mps: skip training tests instead of ignoring silently.
* Make VQModel tests pass on mps.
* mps ddim tests: warmup, increase tolerance.
* ScoreSdeVeScheduler indexing made mps compatible.
* Make ldm pipeline tests pass using warmup.
* Style
* Simplify casting as suggested in PR.
* Add Known Issues to readme.
* `isort` import order.
* Remove _mps_warmup helpers from ModelMixin.
And just make changes to the tests.
* Skip tests using unittest decorator for consistency.
* Remove temporary var.
* Remove spurious blank space.
* Remove unused symbol.
* Remove README_mps.
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>