
# Shuffling tags randomly during training

## General shuffling

To help the model generalize better, EveryDream has an option to shuffle tags during training.

This behavior can be activated using the parameter `--shuffle_tags`. The default is off.

The caption, taken either from the file name or from an accompanying caption file, is split at each `,` into separate chunks.

The first chunk is always kept at the start of the caption used during training; the remaining chunks are shuffled into a random order.

The order is reshuffled every epoch. (Remember that each image is shown to the model once per epoch.)
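
In rough terms, the general shuffle behaves like the following minimal Python sketch (illustrative only, not EveryDream's actual code):

```python
import random

def shuffle_caption(caption):
    # Split the caption at commas into chunks.
    chunks = [c.strip() for c in caption.split(",")]
    # The first chunk stays fixed; the rest are reshuffled each epoch.
    first, rest = chunks[0], chunks[1:]
    random.shuffle(rest)
    return ", ".join([first] + rest)

print(shuffle_caption("A portrait of Cloud Strife, low angle shot, smiling"))
```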

## Weighted shuffling

EveryDream can also read caption definitions from YAML files, for more fine-grained control.

For each image, EveryDream checks whether a file with the same name and the extension `.yaml` is present.

The expected format of the YAML file:

```yaml
main_prompt: A portrait of Cloud Strife
tags:
  - tag: low angle shot
  - tag: looking to the side
  - tag: holding buster sword
    weight: 1.5
  - tag: clouds in background
    weight: 0.5
  - tag: smiling
    weight: 0.8
max_caption_length: 1024
```

The main prompt will always be the first part of the caption. The main prompt is optional; omit it if you do not want a fixed part at the beginning of the caption.

This is followed by a list of tags, which are shuffled into a random order and appended to the caption. The tags list is optional.

The default weight of each tag is 1.0; a different weight can optionally be specified. Tags with a higher weight are more likely to appear near the front of the caption's tag list.
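
The document does not spell out the exact weighting algorithm; one common way to realize such a weighted ordering is the Efraimidis-Spirakis sampling key, sketched below. EveryDream's actual implementation may differ:

```python
import random

def weighted_shuffle(tags):
    # Each tag draws the key u ** (1 / weight), with u uniform in [0, 1);
    # sorting by key descending yields a weighted random permutation,
    # so heavier tags tend to land near the front.
    return sorted(
        tags,
        key=lambda t: random.random() ** (1.0 / t.get("weight", 1.0)),
        reverse=True,
    )

tags = [
    {"tag": "low angle shot"},
    {"tag": "holding buster sword", "weight": 1.5},
    {"tag": "clouds in background", "weight": 0.5},
]
print([t["tag"] for t in weighted_shuffle(tags)])
```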

The optional parameter `max_caption_length` defines a maximum length for the assembled caption. Only whole tags are processed: if appending the next tag would exceed `max_caption_length`, it is not added, and the caption is provided without the remaining tags for this epoch.
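
A minimal sketch of this truncation rule (the function name `assemble_caption` is hypothetical, for illustration only):

```python
def assemble_caption(main_prompt, ordered_tags, max_caption_length=None):
    # Append whole tags only; stop as soon as the next tag would
    # push the caption past max_caption_length.
    caption = main_prompt
    for tag in ordered_tags:
        candidate = f"{caption}, {tag}" if caption else tag
        if max_caption_length is not None and len(candidate) > max_caption_length:
            break
        caption = candidate
    return caption

print(assemble_caption("A portrait of Cloud Strife",
                       ["low angle shot", "smiling"],
                       max_caption_length=50))
# -> "A portrait of Cloud Strife, low angle shot"
```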

Dropping tags this way can be used to teach the model that an image may include a certain aspect even when it is not explicitly mentioned in the caption.

## General notes regarding token length

For SD, the current implementation of EveryDream can only process the first 75 tokens of the caption during training.

This is a base limitation of the SD models. Workarounds exist to extend this number, but they are currently not implemented in EveryDream.

The effect of this limit is that the caption is truncated whenever the maximum number of tokens is exceeded. The cutoff does not respect tag boundaries, and it can even fall in the middle of a word if that word is encoded as several tokens.
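
To check how many tokens a caption uses, you can run it through the CLIP tokenizer, for example with the `transformers` library (an illustrative check, not part of EveryDream's interface):

```python
from transformers import CLIPTokenizer

# Tokenizer used by SD 1.x text encoders (assumed here for illustration).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

caption = "A portrait of Cloud Strife, low angle shot, holding buster sword"
ids = tokenizer(caption).input_ids  # includes begin/end special tokens
print(len(ids) - 2, "content tokens")  # keep this at or below 75
```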

To mitigate this token limitation (when not using weighted shuffling), the `--keep_tags n` parameter can be employed. It ensures that the first n tags following the initial chunk remain static, while the remaining tags are shuffled, as sketched below.
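
In the same illustrative style as above, `--keep_tags` behaves roughly like this (not EveryDream's actual code):

```python
import random

def shuffle_with_keep_tags(caption, keep_tags=0):
    chunks = [c.strip() for c in caption.split(",")]
    # The first chunk plus the next `keep_tags` tags stay fixed;
    # only the remainder is shuffled.
    fixed, rest = chunks[: 1 + keep_tags], chunks[1 + keep_tags:]
    random.shuffle(rest)
    return ", ".join(fixed + rest)

print(shuffle_with_keep_tags("portrait, red hair, smiling, outdoors", keep_tags=1))
```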