From d2f435eaf128068413ca85c38ade1a542a36c3fb Mon Sep 17 00:00:00 2001
From: Victor Hall <victor.charles.hall@gmail.com>
Date: Sun, 3 Mar 2024 16:36:50 -0500
Subject: [PATCH] another minor caption doc update

---
 doc/CAPTION.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/doc/CAPTION.md b/doc/CAPTION.md
index 8b1896d..5e30497 100644
--- a/doc/CAPTION.md
+++ b/doc/CAPTION.md
@@ -12,6 +12,9 @@ It has the capability to output grounding bounding boxes.
 
 Run `python caption_kosmos2.py --help` to get a list of options. 
 
+You can use `--prompt` to provide a prompt.  The official suggested prompts are `An image of` or `Describe this image in detail:`.  The later is the default if you do not set a prompt.
+If you want to use Kosmos-2 as a VQA (visual question answering), format your prompt like so `Question: Is there watermark on this image? Answer:`.
+
 ### _Kosmos-2 grounding_
 
 Kosmos-2can generate bounding boxes for the "grounding" of the caption.  This is useful for identifying specific objects in the image in 2D space, which can be useful in later piplines. 
@@ -20,8 +23,8 @@ It's worth reading the documentation [here](https://huggingface.co/microsoft/kos
 
 `--save_entities` outputs a '.ent' file with bounding box information.  The entities identified will be based on what caption is produced.
 
-`--phrase_mode` This modifies how the model is called, wrapping phrases in \<phrase> tags.  This also interprets your prompt as a CSV, wrapping each item in a phrase tag. You might use it with `--prompt "dog,cat,tree"` for instance.  *This is not a gaurantee your phrases will be found and output into the grounding output file.*
+`--phrase_mode` This modifies how the model is called, wrapping phrases in \<phrase> tags to identify specific classes.  This also interprets your prompt as a CSV, wrapping each item in a phrase tag. You might use it with `--prompt "dog,cat,tree"` for instance.  *This is not a gaurantee your phrases will be found and output into the grounding output file.* Things like  `--phrase_mode --prompt "watermark"` might work as a poor man's watermark detector, but with mixed results so its best to test with your data.
 
-`--save_entities_only` This will not attempt to write the caption into the .txt file at all.  **This is recommended with `--phrase_mode`**. Using this option forces `--save_entities`.
+`--save_entities_only` This will not attempt to write the caption into the .txt file at all.  **This is recommended with `--phrase_mode` for object detection**. Using this option forces `--save_entities`.
 
 There is a trivial/dumb UI for viewing the grounding in the scripts folder.  Launch it with `python scripts/grounding_ui.py` and it will open a window allowing you to select a directory, and it will display the images and bounding boxes.