Sonsivri
 
Topic: Anyone know how to mine the Pop Sci PDFs here?
solutions
« on: October 25, 2013, 05:00:33 05:00 »

"In 2009, Popular Science worked with Google to digitize the magazine's archives back to its inception in 1872, transforming 1,563 issues into mineable data"

http://www.popsci.com/archives

Does anyone know how to grab these issues as a set of PDFs? The collector/packrat in me wants to be able to peruse these, especially the really old stuff, instead of just searching for terms, which is the way the site presents them.

thanks
CocaCola
« Reply #1 on: October 25, 2013, 08:12:18 08:12 »

OK, this is hardly automated, but hey, if you want them, sometimes you need to do a little work :)

Go here and download/install this: http://www.gbooksdownloader.com/

***NOTE: the above program tries to install a bunch of other software at the end of the install package. DECLINE all that additional stuff.***  PAY ATTENTION TO THE INSTALL!

Next, go to this link: http://books.google.com/books/serial/ISSN:01617370?rview=0&lr&sa=N&start=1

That will bring up the most recent issues, from 2009 and so on...

Right-click on each issue, copy the URL, and paste it into the program you downloaded above. Before you save, make sure to bump the resolution to the max in the capture software, and also select PDF output if you don't want a bunch of individual images of the pages...

You can then browse to more issues just like any Google search, using the page numbers at the bottom...

BUT DO NOTE that the Google search page numbers only let you browse up to page 100 (May 1924). From that point on you will need to fake the page number by editing the URL...

The URL for results page 100, which gets you to May 1924, is:

Code:
http://books.google.com/books/serial/ISSN:01617370?rview=0&lr=&sa=N&start=990

To get the next 10 issues, bump the last number (start) up by 10, so change it to:

Code:
http://books.google.com/books/serial/ISSN:01617370?rview=0&lr=&sa=N&start=1000

And so on until you get all 1563 issues...
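
If you would rather not edit the URL by hand each time, the stepping can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: that the listing shows 10 issues per page, and that start counts up from 0 in steps of 10 (which matches page 100 being start=990). The name listing_urls and its parameters are made up for illustration, and nothing here touches the network; it only builds the list of listing-page URLs to open.

```python
# URL pattern copied from the page-100 link in this post; only "start" changes.
BASE = "http://books.google.com/books/serial/ISSN:01617370?rview=0&lr=&sa=N&start={}"


def listing_urls(total_issues=1563, per_page=10, first_start=0):
    """Build one listing URL per results page.

    Assumptions (not confirmed by Google): 10 issues per page and a
    "start" offset that begins at first_start and grows by per_page.
    """
    starts = range(first_start, first_start + total_issues, per_page)
    return [BASE.format(s) for s in starts]


if __name__ == "__main__":
    urls = listing_urls()
    print(len(urls), "listing pages")
    # Index 99 is results page 100, the last one reachable by clicking.
    print(urls[99])
    print(urls[100])
```

Each URL in the list is a listing page, so you would still open it and copy the individual issue links from there.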

Now, this could be automated with a script, but it really won't take that long to do manually. OK, it will take some time, but hardly that much...  Also beware that if a script is written, it should only download a single page at a time. If Google detects the same IP downloading multiple pages at the same time, they will ban the IP as a bot harvester...
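
For anyone who does script it, the "one page at a time" rule could look roughly like the Python sketch below. It is an illustration, not a tested harvester: the names fetch_url and fetch_sequentially are made up, and the 5 second pause is a guess, since I don't know what rate Google actually tolerates. The point is just that each request finishes before the next one starts, with a pause in between and no threads or parallel connections.

```python
import time
import urllib.request


def fetch_url(url, timeout=30.0):
    """Download one URL and return its raw bytes (plain standard library)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_sequentially(urls, fetch=fetch_url, delay_seconds=5.0, sleep=time.sleep):
    """Yield (url, data) pairs strictly one at a time.

    The next request is not started until the previous one has returned
    and delay_seconds has passed. delay_seconds=5.0 is an assumed value.
    """
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause between requests, never before the first
        yield url, fetch(url)
```

Usage would be something like "for url, data in fetch_sequentially(list_of_urls): ..." and then saving data to disk inside the loop. The fetch and sleep parameters are only there so the loop can be tried out with dummy functions before pointing it at a real site.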

Good luck. Maybe it can be a joint effort, where a few people each volunteer to download a decade and share it with each other?