diff --git a/README.md b/README.md
index 5d1d4d7..6c548d4 100644
--- a/README.md
+++ b/README.md
@@ -2,20 +2,28 @@
 
 _Ultra-high quality PDFs from VitalSource._
 
-This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs. Features include:
+This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs with no compromises. Features include:
 
 - Automated download of pages.
 - Automated OCR.
 - Correct page numbering (including Roman numerals at the beginning).
 - Table of contents creation.
-- No funny stuff. No weird endpoints are used and no hacky scraping is preformed.
-- Almost completly transparent. All actions are ones that a normal user would do.
+- No funny stuff. No weird endpoints and no hacky scraping.
+- Almost completely transparent. All actions are ones that a normal user would do.
 
-The goal of this project is for this to "just work." There are many other VitalSource scrapers out there that are weird, poorly
+The goal of this project is for it to "just work." There are many other VitalSource scrapers out there that are weird, poorly
 designed, or broken. I designed my scraper to be simple while producing the highest-quality PDF possible.
 
+**This only works with PDF books!** The URL must look something like this: https://bookshelf.vitalsource.com/reader/books/{isbn}/pageid/{page_id}
+
+**This URL format won't work!** https://bookshelf.vitalsource.com/reader/books/{isbn}/epubcfi/6/22[%3Bvnd.vst.idref%3Dt{author}{isbn}c00_02]!/4
+
+Maybe someday the scraper could be updated to work with more book formats...
+
 ## Install
 
+This program only works on Linux. You can use WSL on Windows.
+
 ```bash
 sudo apt install ocrmypdf jbig2dec
 pip install -r requirements.txt
diff --git a/vitalsource2pdf.py b/vitalsource2pdf.py
index da859ad..a98b0f4 100755
--- a/vitalsource2pdf.py
+++ b/vitalsource2pdf.py
@@ -145,7 +145,8 @@ if not args.skip_scrape or args.only_scrape_metadata:
     if not args.only_scrape_metadata:
         _, total_pages = get_num_pages()
-        print('You specified a start page so ignore the very large page count.')
+        if args.start_page > 0:
+            print('You specified a start page so ignore the very large page count.')
         total_pages = 99999999999999999 if args.start_page > 0 else total_pages
         print('Total number of pages:', total_pages)
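
The README hunk above separates a supported URL shape (`.../pageid/{page_id}`) from an unsupported one (`.../epubcfi/...`). That rule could be checked before any scraping starts. The following is only a minimal sketch under stated assumptions: `parse_pdf_book_url` is a hypothetical helper that does not exist in `vitalsource2pdf.py`, and the path pattern is inferred solely from the two example URLs in the README text.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper, not part of vitalsource2pdf.py.
# Assumption: a PDF-format book URL has the path
# /reader/books/{isbn}/pageid/{page_id}, as in the README example.
_PAGEID_PATH = re.compile(
    r"^/reader/books/(?P<isbn>[^/]+)/pageid/(?P<page_id>[^/]+)/?$"
)


def parse_pdf_book_url(url):
    """Return (isbn, page_id) for a pageid-style URL, or None otherwise."""
    parsed = urlparse(url)
    if parsed.netloc != "bookshelf.vitalsource.com":
        return None
    match = _PAGEID_PATH.match(parsed.path)
    if match is None:
        # Covers the epubcfi-style URLs that the README says won't work.
        return None
    return match.group("isbn"), match.group("page_id")
```

A caller could exit early with a clear message when the function returns `None`, instead of failing partway through a scrape.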