Search Engine Land

Google’s New Indexing Infrastructure ‘Caffeine’ Now Live

What you need to know about Google's indexing system.

Vanessa Fox on June 8, 2010 at 8:05 pm

Google first mentioned its new indexing infrastructure, Caffeine, in August 2009 in order to solicit feedback, then launched it at one data center in November. Now Caffeine is live everywhere. The Google blog calls it a “whole new web indexing system” that’s “more than 50 percent fresher than our last index and it’s the largest collection of web content we’ve offered.”

What is Google Caffeine and what does it mean for searchers and content owners?

Maile Ohye of Google’s Webmaster Central told me, “The entire web is expanding and evolving and Caffeine means that we can better evolve with it. As the ecosystem improves, we improve too and return more relevant content to searchers.”

Google’s Matt Cutts added that “Caffeine benefits both searchers and content owners because it means that all content (and not just content deemed ‘real time’) can be searchable within seconds after it’s crawled.”

Caffeine is a revamp of Google’s indexing infrastructure. It is not a change to Google’s ranking algorithms.  

It is now live across all data centers, regions, and languages.

Caffeine makes content available to searchers faster

Previously, Google’s crawling and indexing systems worked as batch processes.

Googlebot would crawl a set of pages, then process those pages (extracting content from them, associating data about them, such as anchor text and external links, determining what those pages were about), and finally add them to the index.

While this system ran continuously, every document in a batch had to wait until the entire batch was processed before it was pushed live. Now, when Google crawls a page, it processes that page through the entire indexing pipeline and pushes it live nearly instantly.
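The difference between the two models can be sketched with a toy simulation. This is only an illustration of batch versus per-page publishing, not Google's implementation: `ToyIndex`, `process`, `index_in_batch`, and `index_incrementally` are hypothetical names, and the URLs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a search index: URL -> processed content."""
    live: dict = field(default_factory=dict)


def process(url: str, raw: str) -> str:
    # Placeholder for the real work (content extraction, link analysis, etc.).
    return raw.strip().lower()


def index_in_batch(index: ToyIndex, crawled: list) -> list:
    """Batch model: no page goes live until the whole batch is processed.

    Returns how many pages were searchable after each page was processed.
    """
    staged = {}
    visible = []
    for url, raw in crawled:
        staged[url] = process(url, raw)
        visible.append(len(index.live))  # still waiting on the rest of the batch
    index.live.update(staged)  # the whole batch is pushed live at once
    return visible


def index_incrementally(index: ToyIndex, crawled: list) -> list:
    """Per-page model: each page goes live as soon as it is processed."""
    visible = []
    for url, raw in crawled:
        index.live[url] = process(url, raw)
        visible.append(len(index.live))
    return visible


pages = [
    ("a.example/1", " Hello "),
    ("a.example/2", "World"),
    ("a.example/3", "News"),
]

batch_index = ToyIndex()
print(index_in_batch(batch_index, pages))      # [0, 0, 0]

inc_index = ToyIndex()
print(index_incrementally(inc_index, pages))   # [1, 2, 3]
```

Both indexes end up with the same three pages. The only difference is when each page becomes searchable, which is the freshness gain described above.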

Caffeine has already resulted in an index that is 50 percent fresher than before, according to Google.

Note that the introduction of Caffeine doesn’t necessarily mean that pages will be crawled on a faster schedule than before. It simply means that once those pages are crawled, they are made available to searchers much more quickly.

Google’s storage capacity has greatly increased

Google’s index is not significantly larger than before, but the new indexing infrastructure makes a much larger index possible. That only makes sense: if Caffeine is intended to help Google better evolve as the web does, then it needs significant storage capacity.

The web grows by leaps and bounds every day, certainly much faster than anyone could have imagined back when Google first launched.

Google’s flexibility in storing information about documents has greatly increased

Google has always associated a variety of details with documents it stores. (In this context, a “document” refers to any piece of web content, such as a web page, image, or video.) For instance, when Google indexes a web page, it also stores information about what external pages link to that page and what anchor text is used in those links.

The Caffeine infrastructure provides more flexibility in the type of details that can be stored with a document. As the web changes and new valuable data about web content emerges, Google won’t have to build new code to take advantage of it. This means that while Caffeine itself is not a ranking algorithm change, it could impact ranking in the future (as new signals are associated with pages).

Matt Cutts told me, “It’s important to realize that Caffeine is only a change in our indexing architecture. What’s exciting about Caffeine, though, is that it allows easier annotation of the information stored with documents, and subsequently can unlock the potential of better ranking in the future with those additional signals.” (Bolding added.)

In Matt’s keynote at SMX Advanced, he gave an example of additional data that Google can now store for documents. He said, “You might imagine that before we could associate a page with only one country, whereas now, we could potentially associate that page with several countries”. (Note that he wasn’t saying this was something that Google does now; just that it was an example of what is possible with the new infrastructure.)
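One way to picture this kind of flexibility is a document record with a few fixed fields plus an open-ended set of annotations. The sketch below is purely illustrative and says nothing about how Google actually stores documents: `DocumentRecord`, `annotate`, and the `countries` key are hypothetical, and the multi-country annotation mirrors Matt's example of what is possible, not something Google is known to do.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentRecord:
    """Hypothetical record for one indexed document (page, image, or video)."""
    url: str
    # Details the article says Google has always stored with a page.
    inbound_links: list = field(default_factory=list)
    inbound_anchor_text: list = field(default_factory=list)
    # Open-ended annotations: new kinds of data can be attached later
    # without changing the fixed fields above.
    annotations: dict = field(default_factory=dict)

    def annotate(self, key: str, value: Any) -> None:
        self.annotations[key] = value


doc = DocumentRecord(url="https://example.com/page")
doc.inbound_links.append("https://example.org/referrer")
doc.inbound_anchor_text.append("example page")

# A page associated with several countries instead of only one (assumed example).
doc.annotate("countries", ["US", "CA", "GB"])

print(doc.annotations["countries"])  # ['US', 'CA', 'GB']
```

Because the annotations are keyed by name, a new signal can be added as one more key without changing the record's fixed fields.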

How can content owners best take advantage of Caffeine’s infrastructure?

Content owners will reap the benefits of Caffeine without doing anything at all. In fact, there’s really not much, if anything, content owners can do. Some may wonder if this change means that existing best practices around crawl efficiency matter more than before.

Is page speed, which Google has focused on more lately, more important? Nope. Google told me that this change doesn’t make any of the crawling, indexing, or ranking factors more or less important than before. It simply makes crawled content available in search results more quickly than before and paves the way for added flexibility in taking advantage of whatever may come as the web evolves.





About The Author

Vanessa Fox
Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.

© 2021 Third Door Media, Inc. All rights reserved.
