some final fixes
parent b6f52310a4
commit 1abc01ef2f
16
README.md
@@ -2,20 +2,28 @@
 
 _Ultra-high quality PDFs from VitalSource._
 
-This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs. Features include:
+This is an automated, all-in-one scraper to convert VitalSource textbooks into PDFs with no compromises. Features include:
 
 - Automated download of pages.
 - Automated OCR.
 - Correct page numbering (including Roman numerals at the beginning).
 - Table of contents creation.
-- No funny stuff. No weird endpoints are used and no hacky scraping is preformed.
-- Almost completly transparent. All actions are ones that a normal user would do.
+- No funny stuff. No weird endpoints and no hacky scraping.
+- Almost completely transparent. All actions are ones that a normal user would do.
 
-The goal of this project is for this to "just work." There are many other VitalSource scrapers out there that are weird, poorly
+The goal of this project is for it to "just work." There are many other VitalSource scrapers out there that are weird, poorly
 designed, or broken. I designed my scraper to be simple while producing the highest-quality PDF possible.
 
+**This only works with PDF books!** The URL must look something like this: https://bookshelf.vitalsource.com/reader/books/{isbn}/pageid/{page_id}
+
+**This URL format won't work!** https://bookshelf.vitalsource.com/reader/books/{isbn}/epubcfi/6/22[%3Bvnd.vst.idref%3Dt{author}{isbn}c00_02]!/4
+
+Maybe someday the scraper could be updated to work with more book formats...
+
 ## Install
 
+This program only works on Linux. You can use WSL on Windows.
+
 ```bash
 sudo apt install ocrmypdf jbig2dec
 pip install -r requirements.txt
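The README change above separates a supported `/pageid/` reader URL from an unsupported `/epubcfi/` one. A minimal sketch of how that check might look before a scrape starts is below. The function name `is_pdf_book_url` and the regular expression are hypothetical, inferred only from the two example URLs, and are not part of this commit; the ISBN in the usage lines is a made-up placeholder.

```python
import re

# Assumed pattern, derived only from the two example URLs in the README.
# The real reader may accept other forms; treat this as illustrative.
PDF_READER_URL = re.compile(
    r"^https://bookshelf\.vitalsource\.com/reader/books/"
    r"(?P<isbn>[^/]+)/pageid/(?P<page_id>[^/?#]+)"
)


def is_pdf_book_url(url: str) -> bool:
    """Return True if the URL has the /pageid/ form the README describes as supported."""
    return PDF_READER_URL.match(url) is not None


# Placeholder ISBN, for illustration only.
supported = "https://bookshelf.vitalsource.com/reader/books/9780000000000/pageid/0"
unsupported = "https://bookshelf.vitalsource.com/reader/books/9780000000000/epubcfi/6/22"

print(is_pdf_book_url(supported))
print(is_pdf_book_url(unsupported))
```

Rejecting the `/epubcfi/` form up front would let the scraper fail with a clear message instead of partway through a download.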
@@ -145,7 +145,8 @@ if not args.skip_scrape or args.only_scrape_metadata:
     if not args.only_scrape_metadata:
         _, total_pages = get_num_pages()
 
-        print('You specified a start page so ignore the very large page count.')
+        if args.start_page > 0:
+            print('You specified a start page so ignore the very large page count.')
         total_pages = 99999999999999999 if args.start_page > 0 else total_pages
 
         print('Total number of pages:', total_pages)
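The hunk above prints the start-page notice only when a start page was given, then swaps the detected page count for a very large sentinel. The same decision could be kept in one place, roughly as sketched here. `resolve_total_pages` and `UNKNOWN_PAGE_COUNT` are hypothetical names introduced for illustration; only the sentinel value and the message text come from the diff.

```python
from typing import Optional, Tuple

# Same sentinel value the diff assigns when a start page is specified.
UNKNOWN_PAGE_COUNT = 99999999999999999


def resolve_total_pages(detected_pages: int, start_page: int) -> Tuple[int, Optional[str]]:
    """Return the page count to use and an optional notice for the user.

    When start_page > 0 the detected count is replaced by the sentinel,
    mirroring the conditional expression in the diff.
    """
    if start_page > 0:
        notice = 'You specified a start page so ignore the very large page count.'
        return UNKNOWN_PAGE_COUNT, notice
    return detected_pages, None


total_pages, notice = resolve_total_pages(detected_pages=300, start_page=5)
if notice:
    print(notice)
print('Total number of pages:', total_pages)
```

Returning the notice alongside the count keeps the condition in a single branch, so the message and the sentinel cannot drift apart.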