From b4cc0d82790b7aa79c5afb4995e186f469e15f3a Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Wed, 14 Sep 2022 19:59:42 +0200
Subject: [PATCH 1/4] download directories

---
 docs/en/training/dataset.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index f3813b7..8a525cd 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -86,7 +86,7 @@ rsync rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
 ```
 Download the first batch of metadata, posts000000000000.json (800MB):
 ``` shell
-rsync rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
+rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
 ```
 You should now have two folders named: 512px and metadata.
 
@@ -112,4 +112,4 @@ zip -r labeled_data.zip labeled_data
 This will package the entire labaled_data folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.
 
 ## Finish
-You can now continue to Configure
\ No newline at end of file
+You can now continue to Configure

From d6a3ca65cef6228a5bb6b4dcf6eba9d7430aa7ff Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Wed, 14 Sep 2022 20:09:08 +0200
Subject: [PATCH 2/4] fix extract command

---
 docs/en/training/dataset.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index 8a525cd..373c684 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -91,11 +91,11 @@ rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json .
 You should now have two folders named: 512px and metadata.
 
 ## Organizing the dataset
-Although we have the dataset, the metadata that explains what the image is, is inside the JSON file.
In order to extract the data into individual txt files, we are going to use the script inside `` /waifu-diffusion/scripts/danbooru21_extract.py``
+Although we have the dataset, the metadata that explains what the image is, is inside the JSON file. In order to extract the data into individual txt files, we are going to use the script inside ``danbooru_data/local/extractfromjson_danboo21.py``
 
 Assuming you are in the same directory as metadata and 512px folder:
 ````bash
-python /waifu-diffusion/scripts/danbooru21_extract.py
+python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E dataset
 ````
 Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
 This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "upoaded on Danbooru".

From 9113f9155af00a72eba320d87df851f6e6ff3c23 Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Wed, 14 Sep 2022 21:10:55 +0200
Subject: [PATCH 3/4] some missing info

---
 docs/en/training/dataset.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index 373c684..22a5d01 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -95,7 +95,7 @@ Although we have the dataset, the metadata that explains what the image is, is i
 
 Assuming you are in the same directory as metadata and 512px folder:
 ````bash
-python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E dataset
+python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E labeled_data
 ````
 Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
 This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "upoaded on Danbooru".
@@ -105,6 +105,13 @@ Once the script has finished, you should have a "labeled_data" folder, whose ins
 ![labeled_data-insides.png](./res/labeled_data-insides.png)
 
 ## Packaging the dataset
+Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
+``` shell
+mkdir labeled_data/img labeled_data/txt
+mv labeled_data/*.jpg labeled_data/img
+mv labeled_data/*.txt labeled_data/txt
+```
+
 In order to reduce size, zip the contents of labeled_data:
 ``` shell
 zip -r labeled_data.zip labeled_data

From 40640acc8302f73a05994d91dad55aedebc0c571 Mon Sep 17 00:00:00 2001
From: Frederik Fix
Date: Wed, 14 Sep 2022 21:13:48 +0200
Subject: [PATCH 4/4] use the directory hard coded

---
 docs/en/training/dataset.md | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/en/training/dataset.md b/docs/en/training/dataset.md
index 22a5d01..4544f35 100644
--- a/docs/en/training/dataset.md
+++ b/docs/en/training/dataset.md
@@ -95,28 +95,26 @@ Although we have the dataset, the metadata that explains what the image is, is i
 
 Assuming you are in the same directory as metadata and 512px folder:
 ````bash
-python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E labeled_data
+python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E danbooru-aesthetic
 ````
-Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
-This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "upoaded on Danbooru".
 
-Once the script has finished, you should have a "labeled_data" folder, whose insides look like this:
+Once the script has finished, you should have a "danbooru-aesthetic" folder, whose insides look like this:
 
 ![labeled_data-insides.png](./res/labeled_data-insides.png)
 
 ## Packaging the dataset
 Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
 ``` shell
-mkdir labeled_data/img labeled_data/txt
-mv labeled_data/*.jpg labeled_data/img
-mv labeled_data/*.txt labeled_data/txt
+mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
+mv danbooru-aesthetic/*.jpg labeled_data/img
+mv danbooru-aesthetic/*.txt labeled_data/txt
 ```
 
 In order to reduce size, zip the contents of labeled_data:
 ``` shell
-zip -r labeled_data.zip labeled_data
+zip -r danbooru-aesthetic.zip danbooru-aesthetic
 ```
-This will package the entire labaled_data folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.
+This will package the entire danbooru-aesthetic folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.
 
 ## Finish
 You can now continue to Configure
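
In PATCH 4/4 the `mkdir` line creates `danbooru-aesthetic/img` and `danbooru-aesthetic/txt`, while both `mv` lines still target `labeled_data/`, a folder the extract command no longer creates. Below is a minimal sketch of the packaging step with one consistent folder name. It assumes the extracted `.jpg` and `.txt` files sit directly inside the dataset folder; the function name `organize_dataset` is hypothetical and not part of the patch.

```shell
# Sketch only: organize_dataset is an illustrative helper, not part of the patch.
# It sorts the extracted files into the img/ and txt/ subfolders of one folder.
organize_dataset() {
    dataset_dir="$1"

    # -p avoids an error if the subfolders already exist from an earlier run.
    mkdir -p "$dataset_dir/img" "$dataset_dir/txt"

    # Move images and captions into the same folder tree that mkdir created.
    mv "$dataset_dir"/*.jpg "$dataset_dir/img/"
    mv "$dataset_dir"/*.txt "$dataset_dir/txt/"
}
```

With the folder name used in the patch, this would be called as `organize_dataset danbooru-aesthetic`, followed by the `zip -r danbooru-aesthetic.zip danbooru-aesthetic` command from the diff.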