Google Updates Ngram Viewer, Showing How Words Have Evolved Over Time




Google announced earlier today that version 2.0 of the popular Google Books Ngram Viewer is now available online. What’s an Ngram Viewer? In a nutshell, it lets you find and visualize how words and phrases have been used over time, with the 30 million print books Google has scanned in partnership with libraries around the world as its dataset.
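To make the idea concrete, here is a minimal sketch of the kind of calculation behind such a graph: count how often a phrase appears among all phrases of the same length in each year's text. The function names and the tiny two-sentence corpus are invented for illustration; this is not Google's code or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(texts_by_year, phrase):
    """Map each year to the share of that year's n-grams that match phrase."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, text in texts_by_year.items():
        counts = Counter(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Invented toy corpus, purely for illustration (hypothetical data).
toy_corpus = {
    1950: "the record player sat beside the radio",
    2000: "the radio played and the radio hummed",
}

radio_share = relative_frequency(toy_corpus, "radio")
record_player_share = relative_frequency(toy_corpus, "record player")
print(radio_share)
print(record_player_share)
```

Plotting those per-year shares as a line gives the basic shape of an Ngram Viewer graph, although the real service works over a far larger corpus.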


The service debuted in December 2010, at the same time this research paper was published in Science.

Ngram Viewer was developed as a research tool for linguists, lexicographers, historians and others, but it has proven popular with a much wider audience. Google says that more than 45 million word comparison graphs were created in Ngram Viewer’s first 22 months.

In a Google Research Blog post, Google engineering manager and Ngram Viewer co-creator Jon Orwant says that version 2.0 uses a new dataset with material from more books.

Orwant adds that along with more data, the optical character recognition (OCR) that Google uses when scanning books is better, and Google has also made improvements in how it deals with the metadata provided by both publisher and library partners.

The quality of Google’s scanning and metadata has been under scrutiny since the beginning of the project.

When Ngram Viewer launched, we covered some of its initial problems in “When OCR Goes Bad: Google’s Ngram Viewer & The F-Word.”
Note: Adult language is used in the article and demo searches.

For example, the “medial S” (the long s, ſ, of older typography, which OCR often misreads as an “f”) still appears to be causing inaccurate results.

Here’s the current version of a search used in the story where you’ll see some of the same issues raised back in 2010.

Of course, no scanning method, metadata source or database is 100% perfect, but that doesn’t mean you shouldn’t take advantage of what Ngram Viewer offers. Our only advice, as is the case with any database or reference resource, is to review and question what you find.

Ngram Viewer 2.0 can now also automatically identify parts of speech and compare how a word is used. For example, here is how the word “cheer” is used as a verb and as a noun over time:

[Image: Ngram Viewer 2.0 parts-of-speech graph]
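For readers who want to try a comparison like this, the sketch below builds a search for a word tagged by part of speech. Ngram Viewer marks these searches with suffixes such as _VERB and _NOUN; the helper names, the URL and its parameter names are assumptions made for illustration, so check the info page before relying on them.

```python
from urllib.parse import urlencode

# A few of the part-of-speech suffixes; this subset is illustrative only.
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def tagged(word, pos):
    """Return a search term such as 'cheer_VERB' (hypothetical helper)."""
    if pos not in POS_TAGS:
        raise ValueError(f"unsupported part-of-speech tag: {pos}")
    return f"{word}_{pos}"


def ngram_url(terms, year_start=1800, year_end=2000):
    """Build a graph URL; the base URL and parameter names are assumptions."""
    query = urlencode({
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
    })
    return f"https://books.google.com/ngrams/graph?{query}"


terms = [tagged("cheer", "VERB"), tagged("cheer", "NOUN")]
url = ngram_url(terms)
print(url)
```

Typing cheer_VERB,cheer_NOUN straight into the search box should produce the same comparison without any code.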

With the new version, you can also now add, subtract, multiply and divide Ngram counts. For instance, you can see how “record player” rose as the popularity of “Victrola” declined:

[Image: Ngram Viewer 2.0 graph using addition and subtraction of Ngram counts]
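As a rough illustration of what arithmetic on Ngram counts means, the sketch below combines two per-year series one year at a time. The counts are invented placeholders, not real Ngram Viewer data, and the combine function is a hypothetical helper rather than anything Google provides.

```python
import operator


def combine(series_a, series_b, op):
    """Apply op year by year to two {year: count} series (hypothetical helper)."""
    shared_years = sorted(series_a.keys() & series_b.keys())
    return {year: op(series_a[year], series_b[year]) for year in shared_years}


# Invented placeholder counts, for illustration only.
victrola = {1920: 8, 1950: 4, 1980: 1}
record_player = {1920: 1, 1950: 6, 1980: 9}

# Adding treats the two terms as one combined series.
either_term = combine(record_player, victrola, operator.add)
# Subtracting shows by how much one term leads the other in each year.
lead = combine(record_player, victrola, operator.sub)

print(either_term)
print(lead)
```

Multiplication and division work the same way with operator.mul and operator.truediv, provided the divisor series contains no zero counts.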

You can learn more about how Ngram Viewer works on this info page.

With a bit of understanding of what Ngram Viewer can and can’t do, it’s a unique resource, thanks to its size, that can be educational, informative and even fun for just about anyone interested in the history of how language evolves.




About the author

Gary Price
Contributor
Gary Price is a librarian, author, and an online information analyst based in suburban Washington, DC. He is the co-founder and co-editor of INFOdocket and FullTextReports.com and prior to that was founder/editor of ResourceShelf and DocuTicker for 10 years. He has worked for Blekko, Ask.com, and at Search Engine Watch where he was news editor. In 2001, Price was the co-author (with Chris Sherman) of the best-selling book The Invisible Web.
