
Discussion Forum—A Way with Words, a fun radio show and podcast about language

Discussion Forum (Archived)

The forums are currently locked and only available for read only access
What it looks like to process 3.5 million books in Google’s cloud
Guest
1
2016/04/07 - 3:59pm

Moderately technical article about books and Big Data.

https://cloudplatform.googleblog.com/2016/02/what-it-looks-like-to-process-3.5-million-books-in-Googles-cloud.html
This past September I published into Google BigQuery a massive new public dataset of metadata from 3.5 million digitized English-language books dating back more than two centuries (1800-2015), along with the full text of 1 million of these books. The archive, which draws from the English-language public domain book collections of the Internet Archive and HathiTrust, includes full publication details for every book, along with a wide array of computed content-based data. The entire archive is available as two public BigQuery datasets, and there’s a growing collection of sample queries to help users get started with the collection. You can even map two centuries of books with a single line of SQL.

Guest
2
2016/04/09 - 1:48am

Can't wait to see the immoderate version!

I suppose it requires a subscription to query the data, no? Is it meant for use by the general public?

Guest
3
2016/04/09 - 6:52am

As far as I can tell, access to the author's data is free and open to public use.

Forum Timezone: America/Los_Angeles