Google has just unveiled a “secret project” of “next-generation architecture for Google’s web search”. This new architecture appears to include crawling, indexing, and ranking changes. For the first time, Google isn’t simply incorporating these changes into its existing infrastructure or replacing it. Instead, the company is providing a developer preview and asking webmasters and power searchers to try it out and give feedback. Unlike Google’s now-defunct SearchMash, which was intended for search experiments that wouldn’t necessarily be incorporated into Google’s main web search, the Caffeine index seems to be an entirely new search infrastructure that will replace what exists now.
Based on the blog post, we can guess that this new infrastructure may include ways of crawling the web more comprehensively, determining reputation and authority (possibly beyond the link graph and what’s typically thought of as PageRank), and returning more relevant results more quickly. Google’s Matt Cutts told me, however, that the changes are “primarily in how we index”.
While the biggest visible changes in Microsoft’s relaunched search engine, Bing, are user-interface related, Google’s new search is only infrastructure related and includes no UI changes. At first glance, however, the underlying infrastructure changes do seem to have affected the user interface as it relates to universal search (likely because universal results are influenced by ranking and relevance signals). For the sample searches I did, the first ten results were fairly similar, but the presence and placement of images, video, news, and blog posts were notably different.
A search for [buffy the vampire slayer] on the new infrastructure, for instance, returns video and news results midway down the page.
A search on the existing infrastructure, however, returns news at the top, video in the middle, and images at the bottom of the page.
Undoubtedly, Google Caffeine will cause quite a kerfuffle in the web developer and search engine optimization world, and many will dive in to try to figure out the changes. We’ll likely see many a speculative blog post about how best to optimize for the new infrastructure, but my guess is that it does what Google search does, only better. And the foundational elements of having a crawlable infrastructure and compelling content remain.