Google and the historian

Dan Cohen recently gave an interesting talk at the American Historical Association meeting in which he discussed the benefits Google brings to historical research and offered some pointed criticisms.

By Kristopher A. Nelson

Compare Google with companies like ProQuest or Elsevier. These two, among others, charge “exorbitant” fees to libraries for access to research materials. I think anyone who has worked in a library would agree that these access costs are frustrating, increasingly unaffordable, and consume an ever-larger share of library resources even as library budgets shrink. Negotiating with these vendors is an ongoing challenge, and the tools they provide, while powerful, fall far short of what modern technology should allow. Contrast this with Google, which “has given us Google Scholar, Google Books, newspaper archives, and more, often besting commercial offerings while being freely accessible.”

Google Books has revolutionized the way many students and professors approach historical research. The size of one’s local library no longer limits the kind of research one can do. I am no longer exclusively dependent on interlibrary loan to get access to books my university lacks. Even if I eventually want the actual, physical book, Google Books lets me see whether it will be useful before I spend the time to obtain it (or the very limited funds I currently have to buy it myself).

Cohen also points out, however, that for all the utility of the service, Google “remains strangely closed when it comes to Google Books.” Cohen writes, “The real problem — especially for those in the digital humanities but increasingly for many others — is that Google Books is only open in the read-a-book-in-my-pajamas way.” Google has chosen not to maximize access to public-domain or abandoned books. Doing so could revolutionize the entire sphere of intellectual property and the publishing industry, the kind of revolution Google is famous for in other spheres but has chosen not to push here. The current Google Books settlement may indeed be problematic, but it is not revolutionary. Cohen notes:

We should remember that the reason we are in a settlement now is that Google didn’t have enough chutzpah to take the higher, tougher road — a direct challenge in the courts, the court of public opinion, or the Congress to the intellectual property regime that governs many books and makes them difficult to bring online, even though their authors and publishers are long gone. While Google regularly uses its power to alter markets radically, it has been uncharacteristically meek in attacking head-on this intellectual property tower and its powerful corporate defenders. Had Google taken a stronger stance, historians would have likely been fully behind their efforts, since we too face the annoyances that unbalanced copyright law places on our pedagogical and scholarly use of textual, visual, audio, and video evidence.

Much as I would have liked to see the IP regime change, with Google leading the effort, perhaps expecting such an attempt is unrealistic. Google understands Web data. Its engineers understand electronic sources, hyperlinks, software, and PDFs. Their approaches and algorithms have revolutionized Web searching. But the people at Google understand less about the kind of research and writing done in the humanities: the books historians write and the articles and research we produce. Cohen writes:

Because Google Books is the product of engineers, with tremendous talent in computer science but less sense of the history of the book or the book as an object rather than bits, it founders in many respects. Google still has no decent sense of how to rank search results in humanities corpora. Bibliometrics and text mining work poorly on these sources (as opposed to, say, the highly structured scientific papers Google Scholar specializes in). Studying how professional historians rank and sort primary and secondary sources might tell Google a lot, which it could use in turn to help scholars.

Google has moved into new areas before, for example expanding from search into hardware and software such as the Nexus One. Why couldn’t it learn from the humanities, not just from other engineers? Advertising, after all, already combines engineering, the humanities, and business, so why couldn’t Google’s developers learn from historians to improve the search algorithms behind Google Scholar and Google Books?