Merge pull request #16 from derfred/doc-fixes

Doc fixes
harubaru 2022-09-14 12:18:28 -07:00 committed by GitHub
commit bbcd956746
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 14 additions and 9 deletions

@@ -86,30 +86,35 @@ rsync rsync://176.9.41.242:873/danbooru2021/512px/000* ./512px/
``` ```
Download the first batch of metadata, posts000000000000.json (800MB): Download the first batch of metadata, posts000000000000.json (800MB):
``` shell ``` shell
rsync rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/ rsync -r rsync://176.9.41.242:873/danbooru2021/metadata/posts000000000000.json ./metadata/
``` ```
You should now have two folders named: 512px and metadata. You should now have two folders named: 512px and metadata.
## Organizing the dataset ## Organizing the dataset
Although we have the dataset, the metadata that explains what the image is, is inside the JSON file. In order to extract the data into individual txt files, we are going to use the script inside `` /waifu-diffusion/scripts/danbooru21_extract.py`` Although we have the dataset, the metadata that explains what the image is, is inside the JSON file. In order to extract the data into individual txt files, we are going to use the script inside ``danbooru_data/local/extractfromjson_danboo21.py``
Assuming you are in the same directory as metadata and 512px folder: Assuming you are in the same directory as metadata and 512px folder:
````bash ````bash
python /waifu-diffusion/scripts/danbooru21_extract.py python danbooru_data/local/extractfromjson_danboo21.py -J metadata/posts000000000000.json -E danbooru-aesthetic
```` ````
Change "/waifu-diffusion" to the path of the cloned waifu-diffusion repository.
This script will also change some tags such as "1girl" to "one girl", "2boys" to "two boys", and so on. It will also add "uploaded on Danbooru".
Once the script has finished, you should have a "labeled_data" folder, whose insides look like this: Once the script has finished, you should have a "danbooru-aesthetic" folder, whose insides look like this:
![labeled_data-insides.png](./res/labeled_data-insides.png) ![labeled_data-insides.png](./res/labeled_data-insides.png)
## Packaging the dataset ## Packaging the dataset
Next we need to put the extracted data into the format required in the section "Dataset requirements". Run the following commands:
``` shell
mkdir danbooru-aesthetic/img danbooru-aesthetic/txt
mv danbooru-aesthetic/*.jpg danbooru-aesthetic/img
mv danbooru-aesthetic/*.txt danbooru-aesthetic/txt
```
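The sorting step can also be sketched in Python, which avoids shell glob limits on very large folders. This is a minimal sketch, not part of the repository: the function name `sort_into_subfolders` is hypothetical, and it assumes the intended layout is `img/` and `txt/` subfolders inside the extraction folder (`danbooru-aesthetic` here).

```python
import shutil
from pathlib import Path


def sort_into_subfolders(root: Path) -> dict:
    """Move *.jpg files into root/img and *.txt files into root/txt.

    Returns the number of files moved into each subfolder.
    Hypothetical helper; the folder names are assumptions taken from the
    shell commands in this section.
    """
    targets = {".jpg": root / "img", ".txt": root / "txt"}
    for folder in targets.values():
        folder.mkdir(parents=True, exist_ok=True)

    counts = {"img": 0, "txt": 0}
    # sorted() materializes the listing before any file is moved.
    for path in sorted(root.iterdir()):
        target = targets.get(path.suffix)
        if path.is_file() and target is not None:
            shutil.move(str(path), str(target / path.name))
            counts[target.name] += 1
    return counts


if __name__ == "__main__":
    root = Path("danbooru-aesthetic")  # assumed extraction folder
    if root.is_dir():
        print(sort_into_subfolders(root))
```

Files with any other extension are left where they are, so the function can be re-run safely after a partial move.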
In order to reduce size, zip the contents of labeled_data: In order to reduce size, zip the contents of labeled_data:
``` shell ``` shell
zip -r labeled_data.zip labeled_data zip -r danbooru-aesthetic.zip danbooru-aesthetic
``` ```
This will package the entire labeled_data folder into a zip file. This command DOES NOT output any information in the terminal, so be patient. This will package the entire danbooru-aesthetic folder into a zip file. This command DOES NOT output any information in the terminal, so be patient.
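If the `zip` command is not available, the same packaging step can be approximated with Python's standard `zipfile` module. This is a sketch under stated assumptions: the helper name `zip_folder` is hypothetical, the folder name `danbooru-aesthetic` is taken from the command above, and unlike `zip -r` it stores only file entries, not separate directory entries.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> int:
    """Write every file under `folder` into `archive`.

    Returns the number of files added. Hypothetical helper, not part of
    the repository.
    """
    added = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Keep the folder name as the top-level entry so the archive
                # unpacks into a single directory.
                arcname = path.relative_to(folder.parent).as_posix()
                zf.write(str(path), arcname)
                added += 1
    return added


if __name__ == "__main__":
    folder = Path("danbooru-aesthetic")  # assumed dataset folder
    if folder.is_dir():
        count = zip_folder(folder, Path("danbooru-aesthetic.zip"))
        print(f"added {count} files")
```

Returning the file count gives a quick sanity check against the number of images and text files expected in the dataset.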
## Finish ## Finish
You can now continue to Configure You can now continue to Configure