Google announced earlier today that version 2.0 of the popular Google Books Ngram Viewer is now available online. What’s an Ngram Viewer? In a nutshell, Ngram Viewer lets you chart how the use of words and phrases has changed over time, using as its dataset the 30 million print books Google has scanned in partnership with libraries around the world.
Ngram Viewer was developed as a research tool for linguists, lexicographers, historians and others, but it has proven popular with a much wider audience. Google says that more than 45 million word comparison graphs were created in Ngram Viewer’s first 22 months.
In a Google Research Blog post, Google Engineering Manager and Ngram Viewer co-creator Jon Orwant says that version 2.0 uses a new dataset with material from more books.
Orwant adds that along with more data, the optical character recognition (OCR) that Google uses when scanning books is better, and Google has also made improvements in how it deals with the metadata provided by both publisher and library partners.
The quality of Google’s scanning and metadata has been under scrutiny since the beginning of the project.
We covered some of the initial problems with Ngram Viewer when it launched in “When OCR Goes Bad: Google’s Ngram Viewer & The F-Word.” Note: Adult language used in the article and demo searches.
As an example, the “medial S” (the long s, “ſ”, which OCR often misreads as an “f”) still appears to be causing inaccurate results.
Here’s the current version of a search used in the story where you’ll see some of the same issues raised back in 2010.
Of course, no scanning method, metadata source or database is perfect, but that doesn’t mean you shouldn’t take advantage of what Ngram Viewer offers. Our only advice, as is the case with any database or reference resource, is to review and question what you find.
Ngram Viewer 2.0 can now also automatically identify parts of speech and compare how a word is used. For example, you can chart how the word “cheer” has been used as a verb and as a noun over time.
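For readers who want to try this themselves, here is a minimal sketch of how a part-of-speech comparison could be typed into the search box. It assumes the tag-suffix syntax described on the Ngram Viewer info page, where a word is followed by an underscore and a tag such as VERB or NOUN. The helper name pos_query is our own invention for illustration and is not part of any Google tool.

```python
def pos_query(word, tags):
    """Build a comma-separated query comparing one word under several tags.

    Hypothetical helper: it only assembles the text you would paste into
    the Ngram Viewer search box, assuming the word_TAG suffix syntax.
    """
    return ",".join(f"{word}_{tag}" for tag in tags)


# Compare "cheer" as a verb against "cheer" as a noun.
query = pos_query("cheer", ["VERB", "NOUN"])
print(query)
```

Pasting the printed text into the search box should, under that assumption, draw one line per part of speech.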
With the new version, you can also now add, subtract, multiply and divide Ngram counts. For instance, you can see how “record player” rose as the popularity of “Victrola” declined.
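To make the arithmetic concrete, here is a rough sketch of what combining two Ngram series amounts to: the same operation applied year by year. The numbers below are invented placeholders, not real Ngram data, and the combine function is a hypothetical helper of our own. In the viewer itself, the info page describes typing the operators directly into the search box, for example (record player + Victrola).

```python
import operator


def combine(series_a, series_b, op):
    """Apply op year by year to two {year: frequency} series.

    Only years present in both series are kept.
    """
    years = sorted(set(series_a) & set(series_b))
    return {year: op(series_a[year], series_b[year]) for year in years}


# Placeholder values for illustration only, not real Ngram data.
victrola = {1920: 4.0, 1950: 2.0, 1980: 1.0}
record_player = {1920: 0.0, 1950: 2.0, 1980: 3.0}

# Addition: combined use of both terms in each year.
total = combine(record_player, victrola, operator.add)

# Division: the share of that combined use held by "record player",
# skipping any year where the total is zero.
share = {year: record_player[year] / total[year] for year in total if total[year]}
```

With these placeholder values the combined total stays flat while the share for “record player” climbs, which is the kind of handoff between terms that the arithmetic operators are meant to expose.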
You can learn more about how Ngram Viewer works on this info page.
With a bit of understanding of what it can and can’t do, Ngram Viewer is, because of its size, a unique resource that can be educational, informative and even fun for just about anyone interested in how language evolves.