diffusers

Author	SHA1	Message	Date
MatthieuTPHR	98c42134a5	Up to 2x speedup on GPUs using memory efficient attention (#532) * 2x speedup using memory efficient attention * remove einops dependency * Swap K, M in op instantiation * Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter * make xformers a soft dependency * remove one-liner functions * change one letter variable to appropriate names * Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use enable_xformers_memory_efficient_attention method * Add memory efficient attention toggle to img2img and inpaint pipelines * Clearer management of xformers' availability * update optimizations markdown to add info about memory efficient attention * add benchmarks for TITAN RTX * More detailed explanation of how the mem eff benchmark were ran * Removing autocast from optimization markdown * import_utils: import torch only if is available Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>	2022-11-02 10:29:06 +01:00
Minwoo Byeon	fc0ca47456	Fix speedup ratio in fp16.mdx (#837)	2022-10-29 09:26:23 +02:00
Pi Esposito	de00c63217	Document sequential CPU offload method on Stable Diffusion pipeline (#1024) * document cpu offloading method * address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-10-27 16:52:21 +02:00
apolinario	8aac1f99d7	v1-5 docs updates (#921) * Update README.md Additionally add FLAX so the model card can be slimmer and point to this page * Find and replace all * v-1-5 -> v1-5 * revert test changes * Update README.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update docs/source/quicktour.mdx Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update README.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/quicktour.mdx Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update README.md Co-authored-by: Suraj Patil <surajp815@gmail.com> * Revert certain references to v1-5 * Docs changes * Apply suggestions from code review Co-authored-by: apolinario <joaopaulo.passos+multimodal@gmail.com> Co-authored-by: anton-l <anton@huggingface.co> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2022-10-24 22:50:23 +02:00
Patrick von Platen	4deb16e830	[Docs] Advertise fp16 instead of autocast (#740) up	2022-10-05 22:20:53 +02:00
Patrick von Platen	78744b6a8f	No more use_auth_token=True (#733) * up * uP * uP * make style * Apply suggestions from code review * up * finish	2022-10-05 17:16:15 +02:00
Yuta Hayashibe	7e92c5bc73	Fix typos (#718) * Fix typos * Update examples/dreambooth/train_dreambooth.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2022-10-04 15:22:14 +02:00
Nouamane Tazi	daa22050c7	[docs] fix table in fp16.mdx (#683)	2022-09-30 15:15:22 +02:00
Nouamane Tazi	9ebaea545f	Optimize Stable Diffusion (#371) * initial commit * make UNet stream capturable * try to fix noise_pred value * remove cuda graph and keep NB * non blocking unet with PNDMScheduler * make timesteps np arrays for pndm scheduler because lists don't get formatted to tensors in `self.set_format` * make max async in pndm * use channel last format in unet * avoid moving timesteps device in each unet call * avoid memcpy op in `get_timestep_embedding` * add `channels_last` kwarg to `DiffusionPipeline.from_pretrained` * update TODO * replace `channels_last` kwarg with `memory_format` for more generality * revert the channels_last changes to leave it for another PR * remove non_blocking when moving input ids to device * remove blocking from all .to() operations at beginning of pipeline * fix merging * fix merging * model can run in other precisions without autocast * attn refactoring * Revert "attn refactoring" This reverts commit 0c70c0e189cd2c4d8768274c9fcf5b940ee310fb. * remove restriction to run conv_norm in fp32 * use `baddbmm` instead of `matmul`for better in attention for better perf * removing all reshapes to test perf * Revert "removing all reshapes to test perf" This reverts commit 006ccb8a8c6bc7eb7e512392e692a29d9b1553cd. * add shapes comments * hardcore whats needed for jitting * Revert "hardcore whats needed for jitting" This reverts commit 2fa9c698eae2890ac5f8e367ca80532ecf94df9a. * Revert "remove restriction to run conv_norm in fp32" This reverts commit cec592890c32da3d1b78d38b49e4307aedf459b9. * revert using baddmm in attention's forward * cleanup comment * remove restriction to run conv_norm in fp32. no quality loss was noticed This reverts commit cc9bc1339c998ebe9e7d733f910c6d72d9792213. * add more optimizations techniques to docs * Revert "add shapes comments" This reverts commit 31c58eadb8892f95478cdf05229adf678678c5f4. * apply suggestions * make quality * apply suggestions * styling * `scheduler.timesteps` are now arrays so we dont need .to() * remove useless .type() * use mean instead of max in `test_stable_diffusion_inpaint_pipeline_k_lms` * move scheduler timestamps to correct device if tensors * add device to `set_timesteps` in LMSD scheduler * `self.scheduler.set_timesteps` now uses device arg for schedulers that accept it * quick fix * styling * remove kwargs from schedulers `set_timesteps` * revert to using max in K-LMS inpaint pipeline test * Revert "`self.scheduler.set_timesteps` now uses device arg for schedulers that accept it" This reverts commit 00d5a51e5c20d8d445c8664407ef29608106d899. * move timesteps to correct device before loop in SD pipeline * apply previous fix to other SD pipelines * UNet now accepts tensor timesteps even on wrong device, to avoid errors - it shouldnt affect performance if timesteps are alrdy on correct device - it does slow down performance if they're on the wrong device * fix pipeline when timesteps are arrays with strides	2022-09-30 09:49:13 +02:00
Pedro Cuenca	c29d81c3e3	Docs: fp16 page (#404) * Initial version of `fp16` page. * Fix typo in README. * Change titles of fp16 section in toctree. * PR suggestion Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * PR suggestion Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Clarify attention slicing is useful even for batches of 1 Explained by @patrickvonplaten after a suggestion by @keturn. Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Do not talk about `batches` in `enable_attention_slicing`. * Use Tip (just for fun), add link to method. * Comment about fp16 results looking the same as float32 in practice. * Style: docstring line wrapping. Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-09-08 09:17:51 +02:00
Patrick von Platen	5a38033de4	[Docs] Let's go (#385)	2022-09-07 11:31:13 +02:00

11 Commits