Google’s New Indexing Infrastructure “Caffeine” Now Live

Google first mentioned their new indexing infrastructure, Caffeine, back in August 2009 in order to solicit feedback, then launched it at one data center in November.  Finally, it’s live everywhere. The Google blog calls it a “whole new web indexing system” that’s “more than 50 percent fresher than our last index and it’s the largest collection of web content we’ve offered”.

So what is Caffeine and what does its launch mean for searchers and content owners?

Maile Ohye of Google’s Webmaster Central told me, “the entire web is expanding and evolving and Caffeine means that we can better evolve with it. As the ecosystem improves, we improve too and return more relevant content to searchers.” Google’s Matt Cutts added that “Caffeine benefits both searchers and content owners because it means that all content (and not just content deemed ‘real time’) can be searchable within seconds after it’s crawled.”

Caffeine is a revamp of Google’s indexing infrastructure. It is not a change to Google’s ranking algorithms.  It is live across all data centers, regions, and languages.

Content is available to searchers more quickly

Previously, Google’s crawling and indexing systems worked as batch processes. Googlebot would crawl a set of pages, then process those pages (extracting content from them, associating data about them, such as anchor text and external links, determining what those pages were about), and finally add them to the index. While this system was continuous, all the documents in the batch had to wait until the whole batch was processed to be pushed live. Now, when Google crawls a page, it processes that page through the entire indexing pipeline and pushes it live nearly instantly. This change has already resulted in a 50 percent fresher index than before.
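To make the difference concrete, here is a minimal sketch that contrasts the two approaches. It is only an illustration and not Google’s actual pipeline: the function names (process, batch_index, incremental_index) and the idea of measuring time in “crawl ticks” are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Page = Tuple[str, str]  # (url, raw html)


def process(url: str, html: str) -> Dict[str, str]:
    """Hypothetical stand-in for the indexing pipeline (extract content, attach link data)."""
    return {"url": url, "text": html.strip().lower()}


def batch_index(pages: List[Page], batch_size: int) -> Dict[str, int]:
    """Batch model: a page goes live only when its whole batch has been processed.

    Returns a map of url -> crawl tick at which the page became searchable.
    """
    live: Dict[str, int] = {}
    batch: List[Dict[str, str]] = []
    for tick, (url, html) in enumerate(pages, start=1):
        batch.append(process(url, html))
        if len(batch) == batch_size:
            for doc in batch:
                live[doc["url"]] = tick  # everything waits for the last page in the batch
            batch.clear()
    for doc in batch:  # a trailing partial batch is pushed when the crawl ends
        live[doc["url"]] = len(pages)
    return live


def incremental_index(pages: List[Page]) -> Dict[str, int]:
    """Incremental model: each page is processed and pushed live right after it is crawled."""
    live: Dict[str, int] = {}
    for tick, (url, html) in enumerate(pages, start=1):
        doc = process(url, html)
        live[doc["url"]] = tick
    return live
```

With five pages and a batch size of three, the first page goes live at tick 3 under the batch model but at tick 1 under the incremental model. The crawl order is identical in both cases; only the wait between crawl and availability changes.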

Note that the introduction of Caffeine doesn’t necessarily mean that pages will be crawled on a faster schedule than before. It simply means that once those pages are crawled, they are made available to searchers much more quickly. (Remember, you can estimate how often your pages are crawled by taking a look at your server logs or checking the cache dates in Google.)
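If you want to try the server log approach, the following rough sketch shows one way to do it. It assumes your server writes the common “combined” access log format and that matching “Googlebot” in the user agent is good enough for an estimate (user agents can be spoofed, so treat the result as approximate). The function names are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# Assumed layout: combined log format, e.g.
# 66.249.66.1 - - [08/Jun/2010:10:00:00 +0000] "GET /post HTTP/1.1" 200 5120 "-" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_crawls(lines: Iterable[str]) -> Dict[str, List[datetime]]:
    """Collect the timestamps of requests whose user agent mentions Googlebot, per path."""
    crawls: Dict[str, List[datetime]] = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        crawls[match.group("path")].append(when)
    return dict(crawls)


def average_gap_hours(times: List[datetime]) -> Optional[float]:
    """Average hours between consecutive crawls, or None with fewer than two crawls."""
    if len(times) < 2:
        return None
    ordered = sorted(times)
    span = ordered[-1] - ordered[0]
    return span.total_seconds() / 3600 / (len(ordered) - 1)
```

You would pass in the lines of your access log (for example, an open file object) and then call average_gap_hours on the list for any path you care about. Caffeine doesn’t change this number; it only shortens what happens after each of those crawls.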

Google’s storage capacity has greatly increased

While Google’s index is not significantly larger than before at the moment, the new indexing infrastructure makes a much larger index possible, which only makes sense. If Caffeine is intended to help Google better evolve as the web does, then it needs significant storage capacity. The web grows by leaps and bounds every day, certainly much faster than anyone could have imagined back when Google first launched.

Google’s flexibility in storing information about documents has greatly increased

Google has always associated a variety of details with documents it stores. (In this context, a “document” refers to any piece of web content, such as a web page, image, or video.) For instance, when Google indexes a web page, it also stores information about what external pages link to that page and what anchor text is used in those links. The Caffeine infrastructure provides more flexibility in the type of details that can be stored with a document. As the web changes and new valuable data about web content emerges, Google won’t have to build new code to take advantage of it. This means that while Caffeine itself is not a ranking algorithm change, it could impact ranking in the future (as new signals are associated with pages).

Matt Cutts told me, “It’s important to realize that Caffeine is only a change in our indexing architecture. What’s exciting about Caffeine though is that it allows easier annotation of the information stored with documents, and subsequently can unlock the potential of better ranking in the future with those additional signals.”

Update: In Matt’s keynote at SMX Advanced, he gave an example of additional data that Google can now store for documents. He said, “you might imagine that before we could associate a page with only one country, whereas now, we could potentially associate that page with several countries”. (Note that he wasn’t saying this was something that Google does now; just that it was an example of what is possible with the new infrastructure.)
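One way to picture that kind of flexibility is the difference between a record with a fixed set of fields and a record that accepts open-ended annotations. The sketch below is purely hypothetical and says nothing about how Google actually stores documents; the class names, field names, and the “countries” annotation are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FixedDocument:
    """Rigid record: each kind of detail needs its own field, defined in advance."""
    url: str
    anchor_texts: List[str] = field(default_factory=list)
    country: str = ""  # room for exactly one country


@dataclass
class FlexibleDocument:
    """Flexible record: new kinds of detail can be attached without changing the class."""
    url: str
    annotations: Dict[str, Any] = field(default_factory=dict)

    def annotate(self, name: str, value: Any) -> None:
        """Attach (or overwrite) a named detail on this document."""
        self.annotations[name] = value


# Hypothetical usage: the same page associated with several countries.
doc = FlexibleDocument(url="http://example.com/page")
doc.annotate("anchor_texts", ["example page"])
doc.annotate("countries", ["US", "CA", "GB"])
```

In the fixed version, associating a page with several countries would mean redefining the record. In the flexible version it is just another annotation, which is the general idea behind new signals becoming available later without new code.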

How can content owners best take advantage of the new infrastructure?

Content owners will reap the benefits of Caffeine without doing anything at all. In fact, there’s really not much, if anything, that content owners can do. Some may wonder if this change means that existing best practices around crawl efficiency matter more than before. Is page speed, which Google has focused on more lately, more important? Nope. Google told me that this change doesn’t make any of the crawling, indexing, or ranking factors more or less important than before. It simply makes crawled content available in search results more quickly than before and paves the way for added flexibility in taking advantage of whatever may come as the web evolves.

Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.


About The Author: Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.

  • http://cmil.in cmildotin

    I just wrote a post about a speed test I did last night, posted it a few hours ago… looked at Twitter and saw the Caffeine news. With a correctly set up set of social media profiles, you can get pretty close to instant content indexing already… but it would be nice to not have to “utilize” social media to make that happen.

    I’d love to see a day where everyone – including marketers – can focus 99% of their efforts on content.

  • http://www.lexolutionit.com maneetpuri

    Though this doesn’t have any direct SEO advantage for content publishers, it’s definitely good news to know that my content would be available to people immediately after publication. The sooner the better!

  • http://www.paygseo.co.uk James Hunt

    SERPs and indexation are, imo, interlinked. To say that this change won’t impact ranking doesn’t make sense.

    Caffeine makes ‘new’ content almost instantly available, right? So that new content has to rank somewhere. If QDF states that fresh content ranks better, does this change in indexation mean that blog posts will rank higher more quickly than they did previously?

    Perhaps I am completely missing the point here, but for me, the SERPs and the index are inextricably linked and a change in one has effects on the other.

  • http://ninebyblue.com/ Vanessa Fox

    Hi James,

    To clarify, Caffeine itself is not a ranking algorithm change. It’s an indexing infrastructure change. That doesn’t mean that it won’t impact rankings.

  • http://www.smoz.info Eric

    In my opinion, to index faster means a greater possibility of ranking higher. If you don’t get indexed, you don’t rank.

    so, google wants us to provide more “fresh” content.

  • webco

    Even though Caffeine is clearly an indexing algorithm change, it has potential to affect rankings (possibly in the not too distant future) as a result of its ability to associate different types of data with documents.

    And right now, it seems to me that using techniques to improve your crawl rate – i.e. increase the frequency and depth of Googlebot crawls to your site – particularly to new content, could be a pretty effective way to raise a site’s online profile.

    Perhaps “Crawl Rate Optimisation” could even become a whole new type of SEO activity.

  • http://ninebyblue.com/ Vanessa Fox

    Making Googlebot’s crawl as efficient as possible on a site isn’t a new type of SEO activity. It should be a fundamental part of any SEO efforts.

  • http://twitter.com/MichelleObama7 Clifford Bryan

    I’ve noticed a difference in the last two days but for some reason I don’t think it is all the way complete. Twitter propagation is still very slow for instance. And the serps are different on different browsers

  • http://www.webconsulting.com.au webco

    Thanks Vanessa, I think I worded that a little clumsily. Clearly making a site search engine friendly and easily crawlable is a critical part of SEO, but I guess that Caffeine may provide an incentive to put a bit more effort into activities that encourage Googlebot.

  • http://www.interactionmedia.co.uk/ SergePon

    The benefits of “Caffeine” sound very attractive. The main point is that the time between the bot indexing a site and that data being available to search users is going to decrease significantly. But all of this is theory; we will see how it works in real life.
