* add scaffold
- copied convert_controlnet_to_diffusers.py from
convert_original_stable_diffusion_to_diffusers.py
* Add support to load ControlNet (WIP)
- this causes a Missing Key error on ControlNetModel
* Update to convert ControlNet without error msg
- init impl for StableDiffusionControlNetPipeline
- init impl for ControlNetModel
* clean up commented-out code
* split create_controlnet_diffusers_config()
from create_unet_diffusers_config()
- add config: hint_channels
* Add input_hint_block, input_zero_conv and
middle_block_out
- this causes a missing key error when loading the model
* add unet_2d_blocks_controlnet.py
- copied from unet_2d_blocks.py as impl CrossAttnDownBlock2D,DownBlock2D
- this causes a missing key error when loading the model
* Add loading for input_hint_block, zero_convs
and middle_block_out
- model loading now completes without an error message
* Copy from UNet2DConditionModel except __init__
* Add ultra primitive test for ControlNetModel
inference
* Support ControlNetModel inference
- without exceptions
* copy forward() from UNet2DConditionModel
* Impl ControlledUNet2DConditionModel inference
- test_controlled_unet_inference passed
* Freeze weights & biases for training
* Minimized version of ControlNet/ControlledUnet
- test_modules_controllnet.py passed
* make style
* Add model loading support for minimized version
* Remove all previous version files
* from_pretrained and inference test passed
* copied from pipeline_stable_diffusion.py
except `__init__()`
* Impl pipeline, pixel match test (almost) passed.
* make style
* make fix-copies
* Fix to add import ControlNet blocks
for `make fix-copies`
* Remove einops dependency
* Support np.ndarray, PIL.Image for controlnet_hint
* set default config file as lllyasviel's
* Add support for grayscale (hw) numpy array
* Add and update docstrings
* add control_net.mdx
* add control_net.mdx to toctree
* Update copyright year
* Fix to add PIL.Image RGB->BGR conversion
- thanks @Mystfit
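
The RGB->BGR conversion mentioned above amounts to reversing the channel order of every pixel. A minimal sketch, assuming the image is held as nested lists of `(r, g, b)` tuples; `rgb_to_bgr` is a hypothetical helper for illustration, not the pipeline's actual code:

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def rgb_to_bgr(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Swap the red and blue channels of every pixel (hypothetical helper)."""
    # For a NumPy array of shape (H, W, 3) the equivalent is array[..., ::-1].
    return [[(b, g, r) for (r, g, b) in row] for row in image]


# A 1x2 image: one pure-red pixel and one pure-blue pixel.
rgb_image = [[(255, 0, 0), (0, 0, 255)]]
bgr_image = rgb_to_bgr(rgb_image)
print(bgr_image)
```

Applying the swap twice returns the original image, so the same helper also converts BGR back to RGB.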
* make fix-copies
* add basic fast test for controlnet
* add slow test for controlnet/unet
* Ignore down/up_block len check on ControlNet
* add a copy from test_stable_diffusion.py
* Accept controlnet_hint is None
* merge pipeline_stable_diffusion.py diff
* Update class name to SDControlNetPipeline
* make style
* Baseline fast test almost passed (w long desc)
* still needs investigation.
The following didn't pass, as described in the TODO comment:
- test_stable_diffusion_long_prompt
- test_stable_diffusion_no_safety_checker
The following didn't pass, same as stable_diffusion_pipeline:
- test_attention_slicing_forward_pass
- test_inference_batch_single_identical
- test_xformers_attention_forwardGenerator_pass
these seem to come from calculation accuracy.
* Add note comment related vae_scale_factor
* add test_stable_diffusion_controlnet_ddim
* add assertion for vae_scale_factor != 8
* slow test of pipeline almost passed
Failed: test_stable_diffusion_pipeline_with_model_offloading
- ImportError: `enable_model_offload` requires `accelerate v0.17.0` or higher
but currently latest version == 0.16.0
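
The failure above comes down to a minimum-version check. A rough sketch of such a check, assuming plain numeric `major.minor.patch` version strings; `parse_version` and `supports_model_offload` are hypothetical names, not the library's actual helpers:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of ints (assumes numeric parts only)."""
    return tuple(int(part) for part in version.split(".")[:3])


def supports_model_offload(installed: str, required: str = "0.17.0") -> bool:
    """Return True if the installed version meets the required minimum."""
    return parse_version(installed) >= parse_version(required)


for candidate in ("0.16.0", "0.17.0"):
    print(candidate, supports_model_offload(candidate))
```

Tuples compare element by element, so `(0, 16, 0)` sorts below `(0, 17, 0)` without any string comparison pitfalls.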
* test_stable_diffusion_long_prompt passed
* test_stable_diffusion_no_safety_checker passed
- due to its model size, move to slow test
* remove PoC test files
* fix num_of_image, prompt length issue and add test
* add support for List[PIL.Image] for controlnet_hint
* wip
* all slow test passed
* make style
* update for slow test
* RGB(PIL)->BGR(ctrlnet) conversion
* fixes
* remove manual num_images_per_prompt test
* add document
* add `image` argument docstring
* make style
* Add line to correct conversion
* add controlnet_conditioning_scale (aka control_scales
strength)
* rgb channel ordering by default
* image batching logic
* Add control image descriptions for each checkpoint
* Only save controlnet model in conversion script
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
typo
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* add generated image example
* a depth mask -> a depth map
* rename control_net.mdx to controlnet.mdx
* fix toc title
* add ControlNet abstract and link
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
Co-authored-by: dqueue <dbyqin@gmail.com>
* remove controlnet constructor arguments re: @patrickvonplaten
* [integration tests] test canny
* test_canny fixes
* [integration tests] test_depth
* [integration tests] test_hed
* [integration tests] test_mlsd
* add channel order config to controlnet
* [integration tests] test normal
* [integration tests] test_openpose test_scribble
* change height and width to default to conditioning image
* [integration tests] test seg
* style
* test_depth fix
* [integration tests] size fixes
* [integration tests] cpu offloading
* style
* generalize controlnet embedding
* fix conversion script
* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Style adapted to the documentation of pix2pix
* merge main by hand
* style
* [docs] controlling generation doc nits
* correct some things
* add: controlnetmodel to autodoc.
* finish docs
* finish
* finish 2
* correct images
* finish controlnet
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* uP
* upload model
* up
* up
---------
Co-authored-by: William Berman <WLBberman@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: dqueue <dbyqin@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* pipeline_variant
* Add docs for when clip_stats_path is specified
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* prepare_latents # Copied from re: @patrickvonplaten
* NoiseAugmentor->ImageNormalizer
* stable_unclip_prior default to None re: @patrickvonplaten
* prepare_prior_extra_step_kwargs
* prior denoising scale model input
* {DDIM,DDPM}Scheduler -> KarrasDiffusionSchedulers re: @patrickvonplaten
* docs
* Update docs/source/en/api/pipelines/stable_unclip.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Modify UNet2DConditionModel
- allow skipping mid_block
- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`
- allow user to set dimension for the timestep embedding (`time_embed_dim`)
- the kernel_size for `conv_in` and `conv_out` is now configurable
- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`
- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label)`
- added 2 arguments `attn1_types` and `attn2_types`
* currently we have the argument `only_cross_attention`: when it's set to `True`, we get a
`BasicTransformerBlock` with 2 cross-attentions, otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to allow the user to specify the attention types for the 2 positions in each block; note that I still kept
the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks
- the position of downsample layer and upsample layer is now configurable
- in k-upscaler unet, there is only one skip connection per up/down block (instead of one per layer as in the stable diffusion unet), added `skip_freq = "block"` to support
this use case
- if the user passes attention_mask to unet, it will prepare the mask and pass a flag to the cross attention processor to skip the `prepare_attention_mask` step
inside cross attention block
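
The `norm_group_size` item above derives the group count as `num_channels//norm_group_size`. A small sketch of that arithmetic; `resolve_num_groups` is a hypothetical helper, and the divisibility check is an added assumption rather than something stated in this commit:

```python
def resolve_num_groups(num_channels: int, norm_group_size: int) -> int:
    """Return the group-norm group count for a layer with `num_channels` channels."""
    if norm_group_size <= 0:
        raise ValueError("norm_group_size must be positive")
    # Assumption: channels must split evenly into groups of the requested size.
    if num_channels % norm_group_size != 0:
        raise ValueError(
            f"num_channels={num_channels} is not divisible by norm_group_size={norm_group_size}"
        )
    return num_channels // norm_group_size


# Illustrative channel widths; the group count scales with the width.
for channels in (320, 640, 1280):
    print(channels, resolve_num_groups(channels, norm_group_size=32))
```

Fixing the group size instead of the group count keeps the number of channels per group constant across blocks of different widths.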
add up/down blocks for k-upscaler
modify CrossAttention class
- make the `dropout` layer in `to_out` optional
- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d
- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states
- `attention_dropout`: add an optional dropout on attention score
adapt BasicTransformerBlock
- add an ada groupnorm layer to conditioning attention input with timestep embedding
- allow skipping the FeedForward layer in between the attentions
- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration
update timestep embedding: add new act_fn gelu and an optional act_2
modified ResnetBlock2D
- refactored with AdaGroupNorm class (the timestep scale shift normalization)
- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv
- add option to use input AdaGroupNorm on the input instead of groupnorm
- add options to add a dropout layer after each conv
- allow user to set the bias in conv_shortcut (needed for k-upscaler)
- add gelu
adding conversion script for k-upscaler unet
add pipeline
* fix attention mask
* fix a typo
* fix a bug
* make sure model can be used with GPU
* make pipeline work with fp16
* fix an error in BasicTransformerBlock
* make style
* fix typo
* some more fixes
* uP
* up
* correct more
* some clean-up
* clean time proj
* up
* uP
* more changes
* remove the upcast_attention=True from unet config
* remove attn1_types, attn2_types etc
* fix
* revert incorrect changes up/down samplers
* make style
* remove outdated files
* Apply suggestions from code review
* attention refactor
* refactor cross attention
* Apply suggestions from code review
* update
* up
* update
* Apply suggestions from code review
* finish
* Update src/diffusers/models/cross_attention.py
* more fixes
* up
* up
* up
* finish
* more corrections of conversion state
* act_2 -> act_2_fn
* remove dropout_after_conv from ResnetBlock2D
* make style
* simplify KAttentionBlock
* add fast test for latent upscaler pipeline
* add slow test
* slow test fp16
* make style
* add doc string for pipeline_stable_diffusion_latent_upscale
* add api doc page for latent upscaler pipeline
* deprecate attention mask
* clean up embeddings
* simplify resnet
* up
* clean up resnet
* up
* correct more
* up
* up
* improve a bit more
* correct more
* more clean-ups
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstrings for new unet config
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* # Copied from
* encode the image if not latent
* remove force casting vae to fp32
* fix
* add comments about preconditioning parameters from k-diffusion paper
* attn1_type, attn2_type -> add_self_attention
* clean up get_down_block and get_up_block
* fix
* fixed a typo(?) in ada group norm
* update slice attention processor for cross attention
* update slice
* fix fast test
* update the checkpoint
* finish tests
* fix-copies
* fix-copy for modeling_text_unet.py
* make style
* make style
* fix f-string
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix import
* correct changes
* fix resnet
* make fix-copies
* correct euler scheduler
* add missing #copied from for preprocess
* revert
* fix
* fix copies
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/models/cross_attention.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up conversion script
* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D
* more
* Update src/diffusers/models/unet_2d_condition.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* remove prepare_extra_step_kwargs
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix a typo in timestep embedding
* remove num_image_per_prompt
* fix fasttest
* make style + fix-copies
* fix
* fix xformer test
* fix style
* doc string
* make style
* fix-copies
* docstring for time_embedding_norm
* make style
* final finishes
* make fix-copies
* fix tests
---------
Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Create convert_vae_pt_to_diffusers.py
Just a simple script to convert VAE.pt files to diffusers format
Tested with: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt
* Update convert_vae_pt_to_diffusers.py
Forgot to add the function call
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: chavinlo <example@example.com>
* Safetensors loading in "convert_diffusers_to_original_stable_diffusion"
Adds diffusers format safetensors loading support
* Fix import sort order: convert_diffusers_to_original_stable_diffusion.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* convert __main__ to a function call and call it
* add missing type hint
* make style check pass
* move loading to src/diffusers
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added dit model
* import
* initial pipeline
* initial convert script
* initial pipeline
* make style
* raise valueerror
* single function
* rename classes
* use DDIMScheduler
* timesteps embedder
* samples to cpu
* fix var names
* fix numpy type
* use timesteps class for proj
* fix typo
* fix arg name
* flip_sin_to_cos and better var names
* fix C shape cal
* make style
* remove unused imports
* cleanup
* add back patch_size
* initial dit doc
* typo
* Update docs/source/api/pipelines/dit.mdx
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* added copyright license headers
* added example usage and toc
* fix variable names asserts
* remove comment
* added docs
* fix typo
* upstream changes
* set proper device for drop_ids
* added initial dit pipeline test
* update docs
* fix imports
* make fix-copies
* isort
* fix imports
* get rid of more magic numbers
* fix code when guidance is off
* remove block_kwargs
* cleanup script
* removed to_2tuple
* use FeedForward class instead of another MLP
* style
* work on merging DiTBlock with BasicTransformerBlock
* added missing final_dropout and args to BasicTransformerBlock
* use norm from block
* fix arg
* remove unused arg
* fix call to class_embedder
* use timesteps
* make style
* attn_output gets multiplied
* removed commented code
* use Transformer2D
* use self.is_input_patches
* fix flags
* fixed conversion to use Transformer2DModel
* fixes for pipeline
* remove dit.py
* fix timesteps device
* use randn_tensor and fix fp16 inf.
* timesteps_emb already the right dtype
* fix dit test class
* fix test and style
* fix norm2 usage in vq-diffusion
* added author names to pipeline and ImageNet labels link
* fix tests
* use norm_type as string
* rename dit to transformer
* fix name
* fix test
* set norm_type = "layer" by default
* fix tests
* do not skip common tests
* Update src/diffusers/models/attention.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* revert AdaLayerNorm API
* fix norm_type name
* make sure all components are in eval mode
* revert norm2 API
* compact
* finish deprecation
* add slow tests
* remove @
* refactor some stuff
* upload
* Update src/diffusers/pipelines/dit/pipeline_dit.py
* finish more
* finish docs
* improve docs
* finish docs
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: William Berman <WLBberman@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Deterministic torch randn] Allow tensors to be generated on CPU
* fix more
* up
* fix more
* up
* Update src/diffusers/utils/torch_utils.py
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
* Apply suggestions from code review
* up
* up
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* move files a bit
* more refactors
* fix more
* more fixes
* fix more onnx
* make style
* upload
* fix
* up
* fix more
* up again
* up
* small fix
* Update src/diffusers/__init__.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* correct
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Initial code for attempt at improving SD <--> diffusers conversions for v2.0
* Updates to support round-trip between orig. SD 2.0 and diffusers models
* Corrected formatting to Black standard
* Correcting import formatting
* Fixed imports (properly this time)
* add some corrections
* remove inference files
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add paint by example
* make loading possible
* up
* Update src/diffusers/models/attention.py
* up
* finalize weight structure
* make example work
* make it work
* up
* up
* fix
* del
* add
* update
* Apply suggestions from code review
* correct transformer 2d
* finish
* up
* up
* up
* up
* fix
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Apply suggestions from code review
* up
* finish
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* up
* convert dual unet
* revert dual attn
* adapt for vd-official
* test the full pipeline
* mixed inference
* mixed inference for text2img
* add image prompting
* fix clip norm
* split text2img and img2img
* fix format
* refactor text2img
* mega pipeline
* add optimus
* refactor image var
* wip text_unet
* text unet end to end
* update tests
* reshape
* fix image to text
* add some first docs
* dual guided pipeline
* fix token ratio
* propose change
* dual transformer as a native module
* DualTransformer(nn.Module)
* DualTransformer(nn.Module)
* correct unconditional image
* save-load with mega pipeline
* remove image to text
* up
* uP
* fix
* up
* final fix
* remove_unused_weights
* test updates
* save progress
* uP
* fix dual prompts
* some fixes
* finish
* style
* finish renaming
* up
* fix
* fix
* fix
* finish
Co-authored-by: anton-l <anton@huggingface.co>
* re-add RL model code
* match model forward api
* add register_to_config, pass training tests
* fix tests, update forward outputs
* remove unused code, some comments
* add to docs
* remove extra embedding code
* unify time embedding
* remove conv1d output sequential
* remove sequential from conv1dblock
* style and deleting duplicated code
* clean files
* remove unused variables
* clean variables
* add 1d resnet block structure for downsample
* rename as unet1d
* fix renaming
* rename files
* add get_block(...) api
* unify args for model1d like model2d
* minor cleaning
* fix docs
* improve 1d resnet blocks
* fix tests, remove permuts
* fix style
* add output activation
* rename flax blocks file
* Add Value Function and corresponding example script to Diffuser implementation (#884)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* update post merge of scripts
* add midblock / outblock architecture
* Pipeline cleanup (#947)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
* clean up comments
* convert older script to using pipeline and add readme
* rename scripts
* style, update tests
* delete unet rl model file
* remove imports in src
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* Update src/diffusers/models/unet_1d_blocks.py
* Update tests/test_models_unet.py
* RL Cleanup v2 (#965)
* valuefunction code
* start example scripts
* missing imports
* bug fixes and placeholder example script
* add value function scheduler
* load value function from hub and get best actions in example
* very close to working example
* larger batch size for planning
* more tests
* merge unet1d changes
* wandb for debugging, use newer models
* success!
* turns out we just need more diffusion steps
* run on modal
* merge and code cleanup
* use same api for rl model
* fix variance type
* wrong normalization function
* add tests
* style
* style and quality
* edits based on comments
* style and quality
* remove unused var
* hack unet1d into a value function
* add pipeline
* fix arg order
* add pipeline to core library
* community pipeline
* fix couple shape bugs
* style
* Apply suggestions from code review
* clean up comments
* convert older script to using pipeline and add readme
* rename scripts
* style, update tests
* delete unet rl model file
* remove imports in src
* add specific vf block and update tests
* style
* Update tests/test_models_unet.py
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
* fix quality in tests
* fix quality style, split test file
* fix checks / tests
* make timesteps closer to main
* unify block API
* unify forward api
* delete lines in examples
* style
* examples style
* all tests pass
* make style
* make dance_diff test pass
* Refactoring RL PR (#1200)
* init file changes
* add import utils
* finish cleaning files, imports
* remove import flags
* clean examples
* fix imports, tests for merge
* update readmes
* hotfix for tests
* quality
* fix some tests
* change defaults
* more mps test fixes
* unet1d defaults
* do not default import experimental
* defaults for tests
* fix tests
* fix-copies
* fix
* changes per Patrick's comments (#1285)
* changes per Patrick's comments
* update conversion script
* fix renaming
* skip more mps tests
* last test fix
* Update examples/rl/README.md
Co-authored-by: Ben Glickenhaus <benglickenhaus@gmail.com>
* Changes for VQ-diffusion VQVAE
Add option to specify the embedding dimension in VQModel:
`VQModel` will by default set the dimension of embeddings to the number
of latent channels. The VQ-diffusion VQVAE has a smaller
embedding dimension, 128, than number of latent channels, 256.
Add AttnDownEncoderBlock2D and AttnUpDecoderBlock2D to the up and down
unet block helpers. VQ-diffusion's VQVAE uses those two block types.
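
The embedding-dimension default described above can be sketched as a simple fallback. `resolve_vq_embed_dim` is a hypothetical helper written for illustration, and the parameter names are assumptions:

```python
from typing import Optional


def resolve_vq_embed_dim(latent_channels: int, vq_embed_dim: Optional[int] = None) -> int:
    """Use the explicit embedding dimension if given, else the latent channel count."""
    return latent_channels if vq_embed_dim is None else vq_embed_dim


# Default behaviour: the embedding dimension follows the latent channels.
print(resolve_vq_embed_dim(latent_channels=256))
# VQ-diffusion VQVAE case from the message: 256 latent channels, 128-dim embeddings.
print(resolve_vq_embed_dim(latent_channels=256, vq_embed_dim=128))
```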
* Changes for VQ-diffusion transformer
Modify attention.py so SpatialTransformer can be used for
VQ-diffusion's transformer.
SpatialTransformer:
- Can now operate over discrete inputs (classes of vector embeddings) as well as continuous.
- `in_channels` was made optional in the constructor so two locations where it was passed as a positional arg were moved to kwargs
- modified forward pass to take optional timestep embeddings
ImagePositionalEmbeddings:
- added to provide positional embeddings to discrete inputs for latent pixels
BasicTransformerBlock:
- norm layers were made configurable so that the VQ-diffusion could use AdaLayerNorm with timestep embeddings
- modified forward pass to take optional timestep embeddings
CrossAttention:
- now may optionally take a bias parameter for its query, key, and value linear layers
FeedForward:
- Internal layers are now configurable
ApproximateGELU:
- Activation function in VQ-diffusion's feedforward layer
AdaLayerNorm:
- Norm layer modified to incorporate timestep embeddings
* Add VQ-diffusion scheduler
* Add VQ-diffusion pipeline
* Add VQ-diffusion convert script to diffusers
* Add VQ-diffusion dummy objects
* Add VQ-diffusion markdown docs
* Add VQ-diffusion tests
* some renaming
* some fixes
* more renaming
* correct
* fix typo
* correct weights
* finalize
* fix tests
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* finish
* finish
* up
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* start
* add more logic
* Update src/diffusers/models/unet_2d_condition_flax.py
* match weights
* up
* make model work
* making class more general, fixing missed file rename
* small fix
* make new conversion work
* up
* finalize conversion
* up
* first batch of variable renamings
* remove c and c_prev var names
* add mid and out block structure
* add pipeline
* up
* finish conversion
* finish
* upload
* more fixes
* Apply suggestions from code review
* add attr
* up
* uP
* up
* finish tests
* finish
* uP
* finish
* fix test
* up
* naming consistency in tests
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
* remove hardcoded 16
* Remove bogus
* fix some stuff
* finish
* improve logging
* docs
* upload
Co-authored-by: Nathan Lambert <nol@berkeley.edu>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Nathan Lambert <nathan@huggingface.co>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>