Uncharted: Big Data as a Lens on Human Culture by Erez Aiden & Jean-Baptiste Michel

Through the Looking Glass

  • At the heart of this book is a tool called the Ngram Viewer, which Erez and Jean-Baptiste created to analyze the contents of the 30 million books that Google has digitized so far. Because these books span centuries of recorded history, the tool makes it possible to study the past quantitatively. The authors see it as analogous to other breakthrough instruments such as the telescope and the microscope, since it can reveal historical patterns that would otherwise stay hidden. Think of it as looking at our human past through a digital lens. You can try the tool yourself on the Google Books site.
  • They lead with a graph that shows how often the phrases “the United States are” and “the United States is” appear in the digitized books over time. In other words, when did we start to see the US as a single country rather than a group of loosely connected states? Most historians assumed the answer was right after the Civil War, but the shift turns out to have come closer to 1880. This is an example of the kind of result the Ngram Viewer can produce.
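To make the “are” versus “is” comparison concrete, here is a minimal Python sketch of the arithmetic behind such a graph. It assumes yearly phrase counts and yearly totals of same-length phrases (here four-word phrases) are already in hand, divides one by the other to get a relative frequency, and reports the first year the “is” form overtakes the “are” form. The function names and every number below are hypothetical, invented purely for illustration; they are not real Ngram Viewer data, and this is not how Google’s tool is actually implemented.

```python
from typing import Dict, Optional


def relative_frequency(counts: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Divide each year's phrase count by that year's total count of same-length phrases.

    Years with a zero total are skipped to avoid dividing by zero.
    """
    return {
        year: counts.get(year, 0) / total
        for year, total in totals.items()
        if total > 0
    }


def first_crossover(old: Dict[int, float], new: Dict[int, float]) -> Optional[int]:
    """Return the earliest year in which `new` is more frequent than `old`, or None."""
    for year in sorted(old.keys() & new.keys()):
        if new[year] > old[year]:
            return year
    return None


# Invented toy numbers for illustration only; these are NOT real Ngram data.
totals = {1860: 1_000_000, 1870: 1_200_000, 1880: 1_500_000, 1890: 1_800_000}
are_counts = {1860: 50, 1870: 55, 1880: 40, 1890: 30}
is_counts = {1860: 10, 1870: 30, 1880: 60, 1890: 90}

are_freq = relative_frequency(are_counts, totals)
is_freq = relative_frequency(is_counts, totals)
print(first_crossover(are_freq, is_freq))
```

Real yearly counts are noisy, which is why the Ngram Viewer offers a smoothing option; a fuller version would smooth each series before looking for the crossover, so that one odd year does not decide the answer.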

Linguistic Fossil Hunters

  • Thanks to work done by G.K. Zipf at Harvard in the 1930s and 1940s, scientists know that many phenomena follow a power law rather than a normal distribution. While human heights are normally distributed, things like city sizes, word frequencies, and numbers of Twitter followers are not: a few items are enormous and most are small. For words, the most common word appears roughly twice as often as the second most common, three times as often as the third, and so on.
  • Children are a great source of ideas for scientists because they ask questions that seem simple but are often profound. Think of “Where does the sun go at night?” This chapter centers on regular and irregular verbs (like “walked” versus “went”) and how they change over time. The authors found that the most frequently used verbs tend to be irregular, and that it is this high frequency that has kept them from becoming regular. While you may think that the concordance (a list of the words in a book along with how often each appears) is obsolete in the age of computer search, search engines themselves are essentially digital concordances.
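The power-law point about word frequencies can be checked with a quick calculation. Zipf’s law says a word’s frequency is roughly proportional to 1/rank^s with s near 1, so on a log-log plot of frequency against rank the points fall near a straight line with slope -s. The hypothetical zipf_exponent function below is a minimal Python sketch, assuming a plain least-squares fit of that line is good enough for illustration; careful power-law studies generally prefer maximum-likelihood fitting, since log-log regression can be biased.

```python
import math
from typing import Sequence


def zipf_exponent(frequencies: Sequence[float]) -> float:
    """Estimate s in f(rank) ~ C / rank**s by least squares on log f versus log rank.

    Frequencies are sorted in descending order so that rank 1 is the most
    frequent item. Zero counts are dropped because log(0) is undefined.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]

    # Ordinary least-squares slope of ys against xs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var

    # The fitted slope is -s; Zipf's law predicts s close to 1 for words.
    return -slope


# An idealized Zipf list: the r-th item has frequency 1000 / r.
print(zipf_exponent([1000 / r for r in range(1, 101)]))
```

For that idealized list the estimate comes out at 1 up to floating-point rounding; a normally distributed quantity such as height would not produce a straight line on this kind of plot at all.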
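The concordance idea is easy to sketch. The hypothetical build_concordance function below, written in Python, produces two things: a word-frequency list sorted from most to least common (a concordance in the sense used in this chapter) and a map from each word to the positions where it occurs, which is the basic idea behind a search engine’s index. It assumes simple lowercase tokenization on letters and apostrophes; real tools handle punctuation, hyphenation, and spelling variants far more carefully. Run over a large corpus, a frequency list like this is the kind of data behind the observation that the most common verbs tend to be irregular.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_concordance(text: str) -> Tuple[List[Tuple[str, int]], Dict[str, List[int]]]:
    """Return (frequency list, word -> positions index) for a piece of text.

    Words are lowercased runs of letters and apostrophes; positions are
    0-based word offsets within the text.
    """
    words = re.findall(r"[a-z']+", text.lower())

    # Frequency list: Counter.most_common() sorts by descending count,
    # keeping first-seen order among words with equal counts.
    frequency_list = Counter(words).most_common()

    # Position index: where each word occurs, like a tiny search index.
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, word in enumerate(words):
        positions[word].append(i)

    return frequency_list, dict(positions)


freq, index = build_concordance("The cat sat. The cat ran. A dog sat.")
print(freq)
print(index["sat"])
```

On that sample sentence the frequency list starts with (“the”, 2), and the index records “sat” at word positions 2 and 8.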
DrDougGreen.com: If you like the summary, buy the book.