Merge pull request #16 from derfred/doc-fixes

Doc fixes
2022-09-14 12:18:28 -07:00 · 2022-09-14 12:18:28 -07:00 · bbcd956746
parent f6c06933ba 40640acc83
commit bbcd956746
1 changed file with 14 additions and 9 deletions
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -86,30 +86,35 @@ rsync rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
 ```
 Download the first batch of metadata, posts000000000000.json (800MB):
 ``` shell
-rsync rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
+rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
 ```
 You should now have two folders named: 512px and metadata.

 ## Organizing the dataset
-Although we have the dataset, the metadata that explains what the image is, is inside the JSON file. In order to extract the data into individual txt files, we are going to use the script inside `` /waifu-diffusion/scripts/danbooru21_extract.py``
+Although we have the dataset, the metadata that explains what the image is, is inside the JSON file. In order to extract the data into individual txt files, we are going to use the script inside ``danbooru_data/local/extractfromjson_danboo21.py``

 Assuming you are in the same directory as metadata and 512px folder:
 ````bash 
-python /waifu-diffusion/scripts/danbooru21_extract.py
+python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E danbooru-aesthetic
 ````
-Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
-This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "upoaded on Danbooru".

-Once the script has finished, you should have a "labeled_data" folder, whose insides look like this:
+Once the script has finished, you should have a "danbooru-aesthetic" folder, whose insides look like this:

 ![labeled_data-insides.png](./res/labeled_data-insides.png)

 ## Packaging the dataset
+Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
+``` shell
+mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
+mv danbooru-aesthetic/*.jpg danbooru-aesthetic/img
+mv danbooru-aesthetic/*.txt danbooru-aesthetic/txt
+```
+
 In order to reduce size, zip the contents of labeled_data:
 ``` shell
-zip -r labeled_data.zip labeled_data
+zip -r danbooru-aesthetic.zip danbooru-aesthetic
 ```
-This will package the entire labaled_data folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.
+This will package the entire danbooru-aesthetic folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.

 ## Finish
-You can now continue to Configure
+You can now continue to Configure
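
The "Packaging the dataset" steps in this hunk can be sketched as one small helper. This is an illustrative sketch, not part of the change: `organize_dataset` is a hypothetical name, and it assumes the extraction script wrote the `.jpg` and `.txt` files directly into the folder passed as its argument. It uses a single directory name throughout, so the `mv` targets are the `img` and `txt` subfolders created by the `mkdir` step.

```shell
#!/bin/sh
# Sketch only: organize_dataset is a hypothetical helper, not part of the repository.
# It moves the extracted *.jpg and *.txt files of a dataset folder into img/ and txt/.
organize_dataset() {
    dataset_dir="$1"

    # Create both target subfolders; -p keeps a second run from failing.
    mkdir -p "$dataset_dir/img" "$dataset_dir/txt" || return 1

    # Loop over the files instead of passing a bare glob to mv,
    # so a folder with no matching files is not treated as an error.
    for f in "$dataset_dir"/*.jpg; do
        if [ -e "$f" ]; then
            mv "$f" "$dataset_dir/img/"
        fi
    done
    for f in "$dataset_dir"/*.txt; do
        if [ -e "$f" ]; then
            mv "$f" "$dataset_dir/txt/"
        fi
    done
    return 0
}
```

Usage would be `organize_dataset danbooru-aesthetic`, followed by the `zip -r danbooru-aesthetic.zip danbooru-aesthetic` command from the diff.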