update clip skip doc

Victor Hall 2023-04-16 20:19:08 -04:00
parent 743c7cccae
commit bc7b95a375
3 changed files with 5 additions and 3 deletions


@@ -54,13 +54,15 @@ General suggestion is 1e-6 for training SD1.5 at 512 resolution. For SD2.1 at 7
## Clip skip
Aka "penultimate layer", this takes the output from the text encoder not from its last output layer, but layers before.
Clip skip counts back from the last hidden layer of the text encoder output for use as the text embedding.
*Note: since EveryDream2 uses HuggingFace Diffusers library, the penultimate layer is already selected when training and running inference on SD2.x models.* This is defined in the text_encoder/config.json by the "num_hidden_layers" property of 23, which is penultimate out of the 24 layers and set by default in all diffusers SD2.x models.
--clip_skip 2 ^
A value of "2" is the canonical form of "penultimate layer" useed by various webuis, but 1 to 4 are accepted as well if you wish to experiment. Default is "0" which takes the "last hidden layer" or standard output of the text encoder as Stable Diffusion 1.X was originally designed. Training with this setting may necessititate or work better when also using the same setting in your webui/inference program.
A value of "2" will count back one additional layer. For SD1.x, "2" would be "penultimate" layer as commonly referred to in the community. For SD2.x, it would be an *additional* layer back.
Values of 0 to 3 are valid and working. The number indicates how many extra layers to go "back" into the CLIP embedding output. 0 is the last layer and the default behavior. 1 is the layer before that, etc.
*A value of "0" or "1" does nothing.*
### Cosine LR scheduler
The cosine LR scheduler will "taper off" your learning rate over time. It starts at a peak of your ```--lr``` value, then tapers off following a cosine curve. In other words, it allows you to set a high initial learning rate that lowers as training progresses. This *may* help speed up training without overfitting. If you wish to use this, I would set the initial [learning rate](#lr-tweaking) slightly higher, maybe 25-50% higher, than you might use with a normal constant LR schedule.
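As a rough illustration of the shape of the schedule, here is a standalone sketch; it is not EveryDream2's scheduler code, and the step count and peak LR below are made-up example numbers.
```python
# Standalone sketch of a cosine LR schedule's shape. The step count and peak
# LR are made-up example numbers, not EveryDream2 defaults.
import math

def cosine_lr(step, total_steps, peak_lr):
    """LR starts at peak_lr and tapers toward zero along a cosine curve."""
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

peak_lr = 1.5e-6      # e.g. ~50% above a constant-LR baseline of 1e-6
total_steps = 1000

for step in (0, 250, 500, 750, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step, total_steps, peak_lr):.2e}")
# step    0: lr = 1.50e-06  (full peak LR)
# step  500: lr = 7.50e-07  (halfway through, half the LR)
# step 1000: lr = 0.00e+00  (fully tapered off)
```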

BIN doc/vast_1.png (new binary file, 30 KiB, not shown)

BIN doc/vast_3.png (new binary file, 167 KiB, not shown)