Yahoo Search Index Now Supported By Open-Source Hadoop Architecture
As the Yahoo Search Blog explains, open-source Apache Hadoop is now at the center of Yahoo’s search index:
We are now using Hadoop to process the Webmap — the application which produces the index from the billions of pages crawled by Yahoo! Search … Our implementation of a Hadoop-based Webmap is part of a larger strategy of Yahoo! moving toward openness — both in our infrastructure and throughout the network…
There are more technical details here. Hadoop replaces the proprietary system Yahoo used previously. The benefits include cost savings and scalability.
The irony of this development, however, is that it comes just as Microsoft may take over Yahoo. Microsoft is all about proprietary technology, the opposite of what’s going on here. In the video below, Jeremy Zawodny interviews two of the engineers who worked on the project: