some final fixes
parent
b6f52310a4
commit
1abc01ef2f
16
README.md
|
@@ -2,20 +2,28 @@
|
|||
|
||||
_Ultra-high quality PDFs from VitalSource._
|
||||
|
||||
This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs. Features include:
|
||||
This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs with no compromises. Features include:
|
||||
|
||||
- Automated download of pages.
|
||||
- Automated OCR.
|
||||
- Correct page numbering (including Roman numerals at the beginning).
|
||||
- Table of contents creation.
|
||||
- No funny stuff. No weird endpoints are used and no hacky scraping is preformed.
|
||||
- Almost completly transparent. All actions are ones that a normal user would do.
|
||||
- No funny stuff. No weird endpoints and no hacky scraping.
|
||||
- Almost completely transparent. All actions are ones that a normal user would do.
|
||||
|
||||
The goal of this project is for this to "just work." There are many other VitalSource scrapers out there that are weird, poorly
|
||||
The goal of this project is for it to "just work." There are many other VitalSource scrapers out there that are weird, poorly
|
||||
designed, or broken. I designed my scraper to be simple while producing the highest-quality PDF possible.
|
||||
|
||||
**This only works with PDF books!** The URL must look something like this: https://bookshelf.vitalsource.com/reader/books/{isbn}/pageid/{page_id}
|
||||
|
||||
**This URL format won't work!** https://bookshelf.vitalsource.com/reader/books/{isbn}/epubcfi/6/22[%3Bvnd.vst.idref%3Dt{author}{isbn}c00_02]!/4
|
||||
|
||||
Maybe someday the scraper could be updated to work with more book formats...
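The URL requirement above can be checked before a scrape starts. The following is a minimal sketch, not code from this repository: the function name `is_pdf_book_url`, the regular expression, and the sample ISBN are assumptions derived only from the two example URLs shown in the README.

```python
import re

# Assumed pattern: built only from the README's example of a working URL,
# https://bookshelf.vitalsource.com/reader/books/{isbn}/pageid/{page_id}
PDF_BOOK_URL = re.compile(
    r"^https://bookshelf\.vitalsource\.com/reader/books/"
    r"(?P<isbn>[^/]+)/pageid/(?P<page_id>[^/?#]+)"
)


def is_pdf_book_url(url: str) -> bool:
    """Return True if the URL has the /pageid/ form used by PDF books."""
    return PDF_BOOK_URL.match(url) is not None


if __name__ == "__main__":
    # Hypothetical ISBN, used purely for illustration.
    print(is_pdf_book_url(
        "https://bookshelf.vitalsource.com/reader/books/9781234567897/pageid/0"
    ))
    print(is_pdf_book_url(
        "https://bookshelf.vitalsource.com/reader/books/9781234567897/epubcfi/6/22"
    ))
```

A check like this only inspects the shape of the URL; it does not confirm that the book itself is actually a PDF book.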
|
||||
|
||||
## Install
|
||||
|
||||
This program only works on Linux. You can use WSL on Windows.
|
||||
|
||||
```bash
|
||||
sudo apt install ocrmypdf jbig2dec
|
||||
pip install -r requirements.txt
|
||||
|
|
|
@@ -145,7 +145,8 @@ if not args.skip_scrape or args.only_scrape_metadata:
|
|||
if not args.only_scrape_metadata:
|
||||
_, total_pages = get_num_pages()
|
||||
|
||||
print('You specified a start page so ignore the very large page count.')
|
||||
if args.start_page > 0:
|
||||
print('You specified a start page so ignore the very large page count.')
|
||||
total_pages = 99999999999999999 if args.start_page > 0 else total_pages
|
||||
|
||||
print('Total number of pages:', total_pages)
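The start-page handling in this hunk can be read as one small decision: when a start page is given, the detected page count is replaced by a very large sentinel and a notice is printed. Here is a minimal standalone sketch of that logic. The function name `resolve_total_pages` and the constant name `UNBOUNDED_PAGE_COUNT` are hypothetical; the sentinel value and the message text are taken from the diff.

```python
# Sentinel value copied from the diff; the constant name is an assumption.
UNBOUNDED_PAGE_COUNT = 99999999999999999


def resolve_total_pages(detected_total: int, start_page: int) -> int:
    """Return the page count to use, given the detected count and start page."""
    if start_page > 0:
        # Only warn when a start page was actually specified.
        print('You specified a start page so ignore the very large page count.')
        return UNBOUNDED_PAGE_COUNT
    return detected_total


if __name__ == "__main__":
    # 250 is an arbitrary illustrative page count.
    print('Total number of pages:', resolve_total_pages(250, 0))
```

Keeping the warning and the sentinel assignment behind the same condition avoids the situation the commit fixes, where the notice was printed even when no start page was given.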
|
||||
|
|