* [LoRA] Freezing the model weights
Freeze the model weights since we don't need to calculate grads for them.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Apply suggestions from code review
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Resolves ValueError: `num_inference_steps`: 1000 cannot be larger than `self.config.train_timesteps`: 50 as the unet model trained with this scheduler can only handle maximal 50 timesteps.
* EMA: fix `state_dict()` & add `cur_decay_value`
* EMA: fix a bug in `load_state_dict()`
'float' object (`state_dict["power"]`) has no attribute 'get'.
* del train_unconditional_ort.py
* Quality check and adding tokenizer
* Adapted stable diffusion to mixed precision+finished up style fixes
* Fixed based on patrick's review
* Fixed oom from number of validation images
* Removed unnecessary np.array conversion
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* better accelerated saving
* up
* finish
* finish
* uP
* up
* up
* fix
* Apply suggestions from code review
* correct ema
* Remove @
* up
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/training/dreambooth.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix torchvision.transforms and transforms function naming clash
* Update unconditional script for onnx
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Add center crop and horizontal flip to args
* Update command to use center crop and random flip
* Add center crop and horizontal flip to args
* Update command to use center crop and random flip
* Modify UNet2DConditionModel
- allow skipping mid_block
- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`
- allow user to set dimension for the timestep embedding (`time_embed_dim`)
- the kernel_size for `conv_in` and `conv_out` is now configurable
- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`
- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label))`
- added 2 arguments `attn1_types` and `attn2_types`
* currently we have argument `only_cross_attention`: when it's set to `True`, we will have a to the
`BasicTransformerBlock` block with 2 cross-attention , otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need to have blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to allow user specify the attention types for the 2 positions in each block; note that I stil kept
the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks
- the position of downsample layer and upsample layer is now configurable
- in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
this use case
- if user passes attention_mask to unet, it will prepare the mask and pass a flag to cross attention processer to skip the `prepare_attention_mask` step
inside cross attention block
add up/down blocks for k-upscaler
modify CrossAttention class
- make the `dropout` layer in `to_out` optional
- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d
- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states
- `attention_dropout`: add an optional dropout on attention score
adapt BasicTransformerBlock
- add an ada groupnorm layer to conditioning attention input with timestep embedding
- allow skipping the FeedForward layer in between the attentions
- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration
update timestep embedding: add new act_fn gelu and an optional act_2
modified ResnetBlock2D
- refactored with AdaGroupNorm class (the timestep scale shift normalization)
- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv
- add option to use input AdaGroupNorm on the input instead of groupnorm
- add options to add a dropout layer after each conv
- allow user to set the bias in conv_shortcut (needed for k-upscaler)
- add gelu
adding conversion script for k-upscaler unet
add pipeline
* fix attention mask
* fix a typo
* fix a bug
* make sure model can be used with GPU
* make pipeline work with fp16
* fix an error in BasicTransfomerBlock
* make style
* fix typo
* some more fixes
* uP
* up
* correct more
* some clean-up
* clean time proj
* up
* uP
* more changes
* remove the upcast_attention=True from unet config
* remove attn1_types, attn2_types etc
* fix
* revert incorrect changes up/down samplers
* make style
* remove outdated files
* Apply suggestions from code review
* attention refactor
* refactor cross attention
* Apply suggestions from code review
* update
* up
* update
* Apply suggestions from code review
* finish
* Update src/diffusers/models/cross_attention.py
* more fixes
* up
* up
* up
* finish
* more corrections of conversion state
* act_2 -> act_2_fn
* remove dropout_after_conv from ResnetBlock2D
* make style
* simplify KAttentionBlock
* add fast test for latent upscaler pipeline
* add slow test
* slow test fp16
* make style
* add doc string for pipeline_stable_diffusion_latent_upscale
* add api doc page for latent upscaler pipeline
* deprecate attention mask
* clean up embeddings
* simplify resnet
* up
* clean up resnet
* up
* correct more
* up
* up
* improve a bit more
* correct more
* more clean-ups
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstrings for new unet config
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* # Copied from
* encode the image if not latent
* remove force casting vae to fp32
* fix
* add comments about preconditioning parameters from k-diffusion paper
* attn1_type, attn2_type -> add_self_attention
* clean up get_down_block and get_up_block
* fix
* fixed a typo(?) in ada group norm
* update slice attention processer for cross attention
* update slice
* fix fast test
* update the checkpoint
* finish tests
* fix-copies
* fix-copy for modeling_text_unet.py
* make style
* make style
* fix f-string
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix import
* correct changes
* fix resnet
* make fix-copies
* correct euler scheduler
* add missing #copied from for preprocess
* revert
* fix
* fix copies
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/models/cross_attention.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up conversion script
* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D
* more
* Update src/diffusers/models/unet_2d_condition.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* remove prepare_extra_step_kwargs
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix a typo in timestep embedding
* remove num_image_per_prompt
* fix fasttest
* make style + fix-copies
* fix
* fix xformer test
* fix style
* doc string
* make style
* fix-copies
* docstring for time_embedding_norm
* make style
* final finishes
* make fix-copies
* fix tests
---------
Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Create convert_vae_pt_to_diffusers.py
Just a simple script to convert VAE.pt files to diffusers format
Tested with: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt
* Update convert_vae_pt_to_diffusers.py
Forgot to add the function call
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: chavinlo <example@example.com>
Related to #2124
The current implementation is throwing a shape mismatch error. Which makes sense, as this line is obviously missing, comparing to XFormersCrossAttnProcessor and LoRACrossAttnProcessor.
I don't have formal tests, but I compared `LoRACrossAttnProcessor` and `LoRAXFormersCrossAttnProcessor` ad-hoc, and they produce the same results with this fix.
Flagged images would be set to the blank image instead of the original image that contained the NSF concept for optional viewing.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
scheduling_ddpm: fix variance in the case of learned_range type.
In the case of learned_range variance type, there are missing logs
and exponent comparing to the theory (see "Improved Denoising Diffusion
Probabilistic Models" section 3.1 equation 15:
https://arxiv.org/pdf/2102.09672.pdf).
* Short doc on changing the scheduler in Flax.
* Apply fix from @patil-suraj
Co-authored-by: Suraj Patil <surajp815@gmail.com>
---------
Co-authored-by: Suraj Patil <surajp815@gmail.com>