kosmos update and docs for loss_scale.txt

Victor Hall 2023-11-03 13:50:11 -04:00
parent 30b063dfec
commit 4c3de29d43
2 changed files with 10 additions and 4 deletions


@@ -108,7 +108,7 @@ def main(args):
if args.save_entities and (not os.path.exists(f"{name}.ent") or args.overwrite):
with open(f"{name}.ent", "w") as entities_file:
-entities_file.write(entities)
+entities_file.write(str(entities))
gpu_mb_used = get_gpu_memory_map()
print(f"gpu usage: {gpu_mb_used:.1f} mb, time taken: {time.time()-start_time:.2f} seconds")
@@ -119,9 +119,9 @@ if __name__ == "__main__":
parser.add_argument("--data_root", type=str, default="input", help="Path to folder of images to caption")
parser.add_argument("--prompt", type=str, default="Describe this image in detail: ", help="Prompt for generating caption")
parser.add_argument("--keep_prompt", action="store_true", default=False, help="will keep the prompt at the start of the caption when saved")
parser.add_argument("--max_new_tokens", type=int, default=75, help="Maximum number of tokens to generate")
parser.add_argument("--max_new_tokens", type=int, default=128, help="Maximum number of tokens to generate")
parser.add_argument("--save_entities", action="store_true", default=False, help="Save coord box with entities to a separate .ent file")
parser.add_argument("--overwrite", action="store_true", default=False, help="will overwrite txt and ent files if they exist")
parser.add_argument("--overwrite", action="store_true", default=False, help="will overwrite .txt and .ent files if they exist")
parser.add_argument("--cpu", action="store_true", default=False, help="use cpu instead of cuda")
parser.add_argument("--dtype", type=str, default="fp16", help="force a different dtype if using GPU (fp16, bf16, fp32) (default: fp16)")
args = parser.parse_args()


@@ -61,4 +61,10 @@ Given the above paragraphs on preservation, therefore, another way to utilize `m
For instance, let's say you have 200 images of a subject you wish to train, and have collected a web scrape of 1000 images of a variety of styles and subjects to help the model remember and avoid "artifacts" and "bleeding" while hammering in your 200 images of a new subject. If you train in a normal manner, you will be repeating the entire 1200 (1000+200) images at the same rate. But for preservation, you do not need to repeat the 1000 preservation images as often, since they are only there to help the model remember what it already knows. Repeating them at the same rate as your new training images is simply not necessary and will lengthen training time unnecessarily.
In this case, with 200 training images and 1000 preservation images, I would suggest placing `multiply.txt` in the subfolder with your preservation images, with a number in the range of `0.05` to `0.1`. This will cause training to randomly select 50-100 preservation images (`1000*0.05=50` or `1000*0.10=100`) per epoch, while the actual training images (200) all get trained once per epoch.
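To make the arithmetic concrete, here is a minimal sketch (not the trainer's actual code; the `pick_for_epoch` helper and the file paths are hypothetical) of how a fractional `multiply.txt` value could translate into the images a folder contributes to one epoch:

```python
# Illustrative only: how a folder multiplier could map to images per epoch.
import random

def pick_for_epoch(image_paths, multiplier):
    """Return the images a folder contributes to one epoch (hypothetical helper)."""
    whole, frac = divmod(multiplier, 1.0)
    picked = list(image_paths) * int(whole)        # e.g. 2.0 -> every image twice
    picked += random.sample(image_paths, int(round(len(image_paths) * frac)))
    return picked

preservation = [f"scrape/{i:04d}.jpg" for i in range(1000)]   # stand-in paths
print(len(pick_for_epoch(preservation, 0.05)))   # -> 50 images this epoch
print(len(pick_for_epoch(preservation, 0.10)))   # -> 100 images this epoch
```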
## Loss scaling
Another way to attempt to balance training is to use `loss_scale.txt`. This works similarly to `multiply.txt`: place a file called `loss_scale.txt` in the folder you want to adjust and type a decimal number in the file. A value of `1.0` means no change, `0.5` will effectively halve the learning step size for the images in that folder, and so forth. Negative values technically work, but use extreme caution, as my testing shows they can really screw up your model.
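As a rough mental model (a sketch under assumptions, not the actual trainer code; `read_loss_scale` and `scaled_loss` are hypothetical names), the value simply scales each image's loss before backpropagation, which shrinks or grows the effective learning step taken for that image:

```python
import os
import torch

def read_loss_scale(folder: str, default: float = 1.0) -> float:
    """Read a folder's loss_scale.txt if present (hypothetical helper)."""
    path = os.path.join(folder, "loss_scale.txt")
    if os.path.exists(path):
        with open(path) as f:
            return float(f.read().strip())
    return default

def scaled_loss(loss: torch.Tensor, image_folder: str) -> torch.Tensor:
    scale = read_loss_scale(image_folder)
    return loss * scale   # 1.0 = unchanged, 0.5 halves the step, negatives push the model away
```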
`loss_scale.txt` may be an alternative to `multiply.txt`, but `multiply.txt` at 0.5 will reduce step count because it only chooses 50% of the images per epoch, while `loss_scale.txt` at 0.5 will always use all the images but take smaller steps on them. Similarly, `multiply.txt` with 2.0 would use the images in that folder twice per epoch, which increases step count, whereas `loss_scale.txt` at 2.0 would only use them once but take *larger* learning steps instead of performing *more* steps.
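For a concrete comparison (illustrative numbers only, assuming a folder of 1000 images):

```python
images = 1000

# multiply.txt = 0.5 -> roughly half the images are picked each epoch, full-size steps
multiply_half_steps = int(images * 0.5)    # ~500 steps from this folder per epoch

# loss_scale.txt = 0.5 -> all images are used each epoch, but each step is ~half as strong
loss_scale_half_steps = images             # 1000 steps, each at ~0.5x effective size

# multiply.txt = 2.0  -> ~2000 full-size steps per epoch (more steps)
# loss_scale.txt = 2.0 -> 1000 steps per epoch, each ~2x as large (larger, not more, steps)
```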