
Script to download nested PDF books from a web page

Posted 4 years ago

This is a web-scraping and web-crawling function.

This short function starts from the main web page and downloads all the PDF book files, saving them locally one file at a time. The main page does not link to the PDFs directly; each PDF is linked from a second-level page for its book:

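getBooks[bookURL_String] :=
 URLDownload[
  (* keep only links that point to PDF files, each one once *)
  DeleteDuplicates@Select[
    Flatten[
     (* second level: collect the hyperlinks on every linked page *)
     Import[#, "Hyperlinks"] & /@
      (* first level: all hyperlinks on the main page *)
      Import[bookURL, "Hyperlinks"]],
    StringContainsQ[#, ".pdf"] &],
  (* target directory, created if it does not exist yet *)
  "~/Downloads/Books",
  CreateIntermediateDirectories -> True];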


getBooks["https://books.goalkicker.com/"]
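getBooks performs both crawl levels and the download in a single expression. Splitting these steps lets you inspect the list of PDF links before anything is saved. The following is a minimal sketch that assumes the same site layout as above. The names pdfLinks, getBookLinks, and downloadBooks are hypothetical helpers. The StringQ guard is also an addition: it skips entries where a second-level Import returned $Failed.

```wolfram
(* pdfLinks (hypothetical helper): keep only string URLs that contain ".pdf",
   dropping duplicates so each file is fetched once *)
pdfLinks[links_List] :=
  DeleteDuplicates@Select[links, StringQ[#] && StringContainsQ[#, ".pdf"] &];

(* getBookLinks (hypothetical helper): the same two-level crawl as getBooks,
   but it only returns the PDF URLs instead of downloading them *)
getBookLinks[bookURL_String] :=
  pdfLinks@Flatten[
    Import[#, "Hyperlinks"] & /@ Import[bookURL, "Hyperlinks"]];

(* downloadBooks (hypothetical helper): download an already inspected
   list of URLs into dir, creating the directory if needed *)
downloadBooks[urls_List, dir_String : "~/Downloads/Books"] :=
  URLDownload[urls, dir, CreateIntermediateDirectories -> True];
```

For example, after links = getBookLinks["https://books.goalkicker.com/"], evaluating Length[links] shows how many files would be fetched. Calling downloadBooks[links] then downloads them.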
POSTED BY: Daniel Carvalho
3 Replies

The 1st line should be:

getBooks[bookURL_String] :=

Just a copy/paste error.

POSTED BY: Gustavo Delfino

It's funny: in edit mode it is getBooks[bookUrl_]:=, and it gets this way when published...

POSTED BY: Daniel Carvalho

Better now!!

POSTED BY: Daniel Carvalho
