EveryDream2trainer/doc/SHUFFLING_TAGS.md

69 lines
2.9 KiB
Markdown
Raw Permalink Normal View History

# Shuffling tags randomly during training
## General shuffling
To help the model generalize better, EveryDream has an option to shuffle tags during the training.
This behavior can be activated using the parameter _--shuffle_tags_. The default is off.
The provided caption, extracted either from the file name or the provided caption file,
will be split at each "_,_" into separate chunks.
The first chunk will always be included in the caption provided during the training,
the additional chunks are shuffled into a random order.
Each epoch the order is reshuffled. _(Remember that each image is shown one per epoch to the model)_
## Weighted shuffling
EveryDream can read caption definitions from YAML files, for fine-tuned definitions.
EveryDream will check for each image if a file with the same name and the extension _.yaml_ is provided.
The expected format of the YAML file:
````yaml
main_prompt: A portrait of Cloud Strife
tags:
- tag: low angle shot
- tag: looking to the side
- tag: holding buster sword
weight: 1.5
- tag: clouds in background
weight: 0.5
- tag: smiling
weight: 0.8
max_caption_length: 1024
````
THe main prompt will always be the first part included in the caption.
The main prompt is optional, you can provide none if you do not want a fixed part at the beginning of the caption.
This is followed by a list of tags. The tags will be shuffled into a random order and added to the caption.
The tags list is optional.
The default weight of each tag is _1.0_. A different weight can be optionally specified.
Tags with a higher weight have a higher chance to appear in the front of the caption tag list.
The optional parameter _max_caption_length_ allows the definition of a maximum length of the assembled caption.
Only whole tags will be processed. If the addition of the next tag exceeds the _max_caption_length_,
it will not be added, and the caption will be provided without the other tags for this epoch.
This can be used to train the model that an image can include a certain aspect, even if it is not
explicitly mentioned in the caption.
## General notes regarding token length
For SD, the current implementation of EveryDream can only process the first 75 tokens
provided in the caption during training.
This is a base limitation of the SD Models. Workaround exists to extend this number but are currently not
implemented in EveryDream.
The effect of the limit is that the caption will always be truncated when the maximum number of tokens is
exceeded. This process does not consider if the cutoff is in the middle of a tag or even in the middle of a
word if it is translated into several tokens.
To mitigate this token limitation (when not using weighted shuffling), the `--keep_tags n` parameter can be employed. This ensures that the first n tags following the initial chunk remain static, while the remaining tags are shuffled.