* [MS Text To Video} Add first text to video
* upload
* make first model example
* match unet3d params
* make sure weights are correcctly converted
* improve
* forward pass works, but diff result
* make forward work
* fix more
* finish
* refactor video output class.
* feat: add support for a video export utility.
* fix: opencv availability check.
* run make fix-copies.
* add: docs for the model components.
* add: standalone pipeline doc.
* edit docstring of the pipeline.
* add: right path to TransformerTempModel
* add: first set of tests.
* complete fast tests for text to video.
* fix bug
* up
* three fast tests failing.
* add: note on slow tests
* make work with all schedulers
* apply styling.
* add slow tests
* change file name
* update
* more correction
* more fixes
* finish
* up
* Apply suggestions from code review
* up
* finish
* make copies
* fix pipeline tests
* fix more tests
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* apply suggestions
* up
* revert
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* updated black format
* update black format
* make style format
* updated line endings
* update code formatting
* Update examples/research_projects/onnxruntime/text_to_image/train_text_to_image.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/vae.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/vae.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added vae gradient checkpointing test
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
* Adding `use_safetensors` argument to give more control to users
about which weights they use.
* Doc style.
* Rebased (not functional).
* Rebased and functional with tests.
* Style.
* Apply suggestions from code review
* Style.
* Addressing comments.
* Update tests/test_pipelines.py
Co-authored-by: Will Berman <wlbberman@gmail.com>
* Black ???
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
* fix AttnProcessor2_0
Fix use of AttnProcessor2_0 for cross attention with mask
* added scale_qk and out_bias flags
* fixed for xformers
* check if it has scale argument
* Update cross_attention.py
* check torch version
* fix sliced attn
* style
* set scale
* fix test
* fixed addedKV processor
* revert back AttnProcessor2_0
* if missing if
* fix inner_dim
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Skip variant tests (UNet1d, UNetRL) on mps.
mish op not yet supported.
* Exclude a couple of panorama tests on mps
They are too slow for fast CI.
* Exclude mps panorama from more tests.
* mps: exclude all fast panorama tests as they keep failing.
* Modify UNet2DConditionModel
- allow skipping mid_block
- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`
- allow user to set dimension for the timestep embedding (`time_embed_dim`)
- the kernel_size for `conv_in` and `conv_out` is now configurable
- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`
- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label))`
- added 2 arguments `attn1_types` and `attn2_types`
* currently we have argument `only_cross_attention`: when it's set to `True`, we will have a to the
`BasicTransformerBlock` block with 2 cross-attention , otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need to have blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to allow user specify the attention types for the 2 positions in each block; note that I stil kept
the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks
- the position of downsample layer and upsample layer is now configurable
- in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
this use case
- if user passes attention_mask to unet, it will prepare the mask and pass a flag to cross attention processer to skip the `prepare_attention_mask` step
inside cross attention block
add up/down blocks for k-upscaler
modify CrossAttention class
- make the `dropout` layer in `to_out` optional
- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d
- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states
- `attention_dropout`: add an optional dropout on attention score
adapt BasicTransformerBlock
- add an ada groupnorm layer to conditioning attention input with timestep embedding
- allow skipping the FeedForward layer in between the attentions
- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration
update timestep embedding: add new act_fn gelu and an optional act_2
modified ResnetBlock2D
- refactored with AdaGroupNorm class (the timestep scale shift normalization)
- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv
- add option to use input AdaGroupNorm on the input instead of groupnorm
- add options to add a dropout layer after each conv
- allow user to set the bias in conv_shortcut (needed for k-upscaler)
- add gelu
adding conversion script for k-upscaler unet
add pipeline
* fix attention mask
* fix a typo
* fix a bug
* make sure model can be used with GPU
* make pipeline work with fp16
* fix an error in BasicTransfomerBlock
* make style
* fix typo
* some more fixes
* uP
* up
* correct more
* some clean-up
* clean time proj
* up
* uP
* more changes
* remove the upcast_attention=True from unet config
* remove attn1_types, attn2_types etc
* fix
* revert incorrect changes up/down samplers
* make style
* remove outdated files
* Apply suggestions from code review
* attention refactor
* refactor cross attention
* Apply suggestions from code review
* update
* up
* update
* Apply suggestions from code review
* finish
* Update src/diffusers/models/cross_attention.py
* more fixes
* up
* up
* up
* finish
* more corrections of conversion state
* act_2 -> act_2_fn
* remove dropout_after_conv from ResnetBlock2D
* make style
* simplify KAttentionBlock
* add fast test for latent upscaler pipeline
* add slow test
* slow test fp16
* make style
* add doc string for pipeline_stable_diffusion_latent_upscale
* add api doc page for latent upscaler pipeline
* deprecate attention mask
* clean up embeddings
* simplify resnet
* up
* clean up resnet
* up
* correct more
* up
* up
* improve a bit more
* correct more
* more clean-ups
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstrings for new unet config
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* # Copied from
* encode the image if not latent
* remove force casting vae to fp32
* fix
* add comments about preconditioning parameters from k-diffusion paper
* attn1_type, attn2_type -> add_self_attention
* clean up get_down_block and get_up_block
* fix
* fixed a typo(?) in ada group norm
* update slice attention processer for cross attention
* update slice
* fix fast test
* update the checkpoint
* finish tests
* fix-copies
* fix-copy for modeling_text_unet.py
* make style
* make style
* fix f-string
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix import
* correct changes
* fix resnet
* make fix-copies
* correct euler scheduler
* add missing #copied from for preprocess
* revert
* fix
* fix copies
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/models/cross_attention.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up conversion script
* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D
* more
* Update src/diffusers/models/unet_2d_condition.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* remove prepare_extra_step_kwargs
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix a typo in timestep embedding
* remove num_image_per_prompt
* fix fasttest
* make style + fix-copies
* fix
* fix xformer test
* fix style
* doc string
* make style
* fix-copies
* docstring for time_embedding_norm
* make style
* final finishes
* make fix-copies
* fix tests
---------
Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* make tests deterministic
* run slow tests
* prepare for testing
* finish
* refactor
* add print statements
* finish more
* correct some test failures
* more fixes
* set up to correct tests
* more corrections
* up
* fix more
* more prints
* add
* up
* up
* up
* uP
* uP
* more fixes
* uP
* up
* up
* up
* up
* fix more
* up
* up
* clean tests
* up
* up
* up
* more fixes
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* make
* correct
* finish
* finish
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* [Lora] first upload
* add first lora version
* upload
* more
* first training
* up
* correct
* improve
* finish loaders and inference
* up
* up
* fix more
* up
* finish more
* finish more
* up
* up
* change year
* revert year change
* Change lines
* Add cloneofsimo as co-author.
Co-authored-by: Simo Ryu <cloneofsimo@gmail.com>
* finish
* fix docs
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* upload
* finish
Co-authored-by: Simo Ryu <cloneofsimo@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* move files a bit
* more refactors
* fix more
* more fixes
* fix more onnx
* make style
* upload
* fix
* up
* fix more
* up again
* up
* small fix
* Update src/diffusers/__init__.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* correct
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* first proposal
* rename
* up
* Apply suggestions from code review
* better
* up
* finish
* up
* rename
* correct versatile
* up
* up
* up
* up
* fix
* Apply suggestions from code review
* make style
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* add error message
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* make attn slice recursive
* remove set_attention_slice from blocks
* fix copies
* make enable_attention_slicing base class method of DiffusionPipeline
* fix set_attention_slice
* fix set_attention_slice
* fix copies
* add tests
* up
* up
* up
* update
* up
* uP
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Flax: start adapting to Stable Diffusion 2
* More changes.
* attention_head_dim can be a tuple.
* Fix typos
* Add simple SD 2 integration test.
Slice values taken from my Ampere GPU.
* Add simple UNet integration tests for Flax.
Note that the expected values are taken from the PyTorch results. This
ensures the Flax and PyTorch versions are not too far off.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Typos and style
* Tests: verify jax is available.
* Style
* Make flake happy
* Remove typo.
* Simple Flax SD 2 pipeline tests.
* Import order
* Remove unused import.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: @camenduru
* add conversion script for vae
* uP
* uP
* more changes
* push
* up
* finish again
* up
* up
* up
* up
* finish
* up
* uP
* up
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* up
* up
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* re-add RL model code
* match model forward api
* add register_to_config, pass training tests
* fix tests, update forward outputs
* remove unused code, some comments
* add to docs
* remove extra embedding code
* unify time embedding
* remove conv1d output sequential
* remove sequential from conv1dblock
* style and deleting duplicated code
* clean files
* remove unused variables
* clean variables
* add 1d resnet block structure for downsample
* rename as unet1d
* fix renaming
* rename files
* add get_block(...) api
* unify args for model1d like model2d
* minor cleaning
* fix docs
* improve 1d resnet blocks
* fix tests, remove permuts
* fix style
* add output activation
* rename flax blocks file
* Add Value Function and corresponding example script to Diffuser implementation (#884)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* update post merge of scripts
* add mdiblock / outblock architecture
* Pipeline cleanup (#947)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
* clean up comments
* convert older script to using pipeline and add readme
* rename scripts
* style, update tests
* delete unet rl model file
* remove imports in src
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* Update src/diffusers/models/unet_1d_blocks.py
* Update tests/test_models_unet.py
* RL Cleanup v2 (#965)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
* clean up comments
* convert older script to using pipeline and add readme
* rename scripts
* style, update tests
* delete unet rl model file
* remove imports in src
* add specific vf block and update tests
* style
* Update tests/test_models_unet.py
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* fix quality in tests
* fix quality style, split test file
* fix checks / tests
* make timesteps closer to main
* unify block API
* unify forward api
* delete lines in examples
* style
* examples style
* all tests pass
* make style
* make dance_diff test pass
* Refactoring RL PR (#1200)
* init file changes
* add import utils
* finish cleaning files, imports
* remove import flags
* clean examples
* fix imports, tests for merge
* update readmes
* hotfix for tests
* quality
* fix some tests
* change defaults
* more mps test fixes
* unet1d defaults
* do not default import experimental
* defaults for tests
* fix tests
* fix-copies
* fix
* changes per Patrik's comments (#1285)
* changes per Patrik's comments
* update conversion script
* fix renaming
* skip more mps tests
* last test fix
* Update examples/rl/README.md
Co-authored-by: Ben Glickenhaus <benglickenhaus@gmail.com>
* Schedulers: don't use float64 on mps
* Test set_timesteps() on device (float schedulers).
* SD pipeline: use device in set_timesteps.
* SD in-painting pipeline: use device in set_timesteps.
* Tests: fix mps crashes.
* Skip test_load_pipeline_from_git on mps.
Not compatible with float16.
* Use device.type instead of str in Euler schedulers.
* make accelerate hard dep
* default fast init
* move params to cpu when device map is None
* handle device_map=None
* handle torch < 1.9
* remove device_map="auto"
* style
* add accelerate in torch extra
* remove accelerate from extras["test"]
* raise an error if torch is available but not accelerate
* update installation docs
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve defautl loading speed even further, allow disabling fats loading
* address review comments
* adapt the tests
* fix test_stable_diffusion_fast_load
* fix test_read_init
* temp fix for dummy checks
* Trigger Build
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
* improve test precision
get tests passing with greater precision using lewington images
* make old numpy load function a wrapper around a more flexible numpy loading function
* adhere to black formatting
* add more black formatting
* adhere to isort
* loosen precision and replace path
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>