some final fixes

Cyberes 2023-03-14 22:42:23 -06:00
parent b6f52310a4
commit 1abc01ef2f
2 changed files with 14 additions and 5 deletions

@@ -2,20 +2,28 @@
_Ultra-high quality PDFs from VitalSource._
This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs. Features include:
This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs with no compromises. Features include:
- Automated download of pages.
- Automated OCR.
- Correct page numbering (including Roman numerals at the beginning).
- Table of contents creation.
- No funny stuff. No weird endpoints are used and no hacky scraping is preformed.
- Almost completly transparent. All actions are ones that a normal user would do.
- No funny stuff. No weird endpoints and no hacky scraping.
- Almost completely transparent. All actions are ones that a normal user would do.
The goal of this project is for this to "just work." There are many other VitalSource scrapers out there that are weird, poorly
The goal of this project is for it to "just work." There are many other VitalSource scrapers out there that are weird, poorly
designed, or broken. I designed my scraper to be simple while producing the highest-quality PDF possible.
**This only works with PDF books!** The URL must look something like this: https://bookshelf.vitalsource.com/reader/books/{isbn}/pageid/{page_id}
**This URL format won't work!** https://bookshelf.vitalsource.com/reader/books/{isbn}/epubcfi/6/22[%3Bvnd.vst.idref%3Dt{author}{isbn}c00_02]!/4
Maybe someday the scraper could be updated to work with more book formats...
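A quick way to tell the two URL formats apart before starting a scrape is sketched below. This is only an illustration: `is_pdf_book_url` and `PAGEID_URL` are hypothetical names that do not exist in the scraper, the ISBN in the example calls is a placeholder, and the pattern is built only from the two example URLs above.

```python
import re

# Hypothetical helper, not part of the scraper. The pattern is an assumption
# based on the supported URL shape: .../reader/books/{isbn}/pageid/{page_id}
PAGEID_URL = re.compile(
    r'^https://bookshelf\.vitalsource\.com/reader/books/'
    r'(?P<isbn>[^/]+)/pageid/(?P<page_id>[^/?#]+)'
)


def is_pdf_book_url(url: str) -> bool:
    """Return True if the URL uses the /pageid/ form that PDF books have."""
    return PAGEID_URL.match(url) is not None


# Placeholder ISBN, for illustration only.
print(is_pdf_book_url('https://bookshelf.vitalsource.com/reader/books/9780000000000/pageid/0'))
print(is_pdf_book_url('https://bookshelf.vitalsource.com/reader/books/9780000000000/epubcfi/6/22'))
```

The first call should report a supported URL and the second an unsupported one, since only the `/pageid/` form matches.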
## Install
This program only works on Linux. You can use WSL on Windows.
```bash
sudo apt install ocrmypdf jbig2dec
pip install -r requirements.txt

@@ -145,7 +145,8 @@ if not args.skip_scrape or args.only_scrape_metadata:
     if not args.only_scrape_metadata:
         _, total_pages = get_num_pages()
-        print('You specified a start page so ignore the very large page count.')
+        if args.start_page > 0:
+            print('You specified a start page so ignore the very large page count.')
         total_pages = 99999999999999999 if args.start_page > 0 else total_pages
         print('Total number of pages:', total_pages)