Yahoo Search Index Now Supported By Open-Source Hadoop Architecture

As the Yahoo Search Blog explains, open-source Apache Hadoop is now at the center of Yahoo’s search index:

We are now using Hadoop to process the Webmap — the application which produces the index from the billions of pages crawled by Yahoo! Search … Our implementation of a Hadoop-based Webmap is part of a larger strategy of Yahoo! moving toward openness — both in our infrastructure and throughout the network…


There are more technical details here. Hadoop replaces the proprietary system Yahoo used previously. The benefits include cost savings and scalability.

The irony of this development, however, is that it comes just as Microsoft may take over Yahoo. Microsoft is all about proprietary technology, which is the opposite of what’s going on here. In the video below, Jeremy Zawodny interviews two of the engineers who worked on the project:




About the author

Greg Sterling
Contributor
Greg Sterling is a Contributing Editor to Search Engine Land, a member of the programming team for SMX events and the VP, Market Insights at Uberall.
