Yahoo Search Index Now Supported By Open-Source Hadoop Architecture
As the Yahoo Search Blog explains, open-source Apache Hadoop is now at the center of Yahoo’s search index:
We are now using Hadoop to process the Webmap — the application which produces the index from the billions of pages crawled by Yahoo! Search … Our implementation of a Hadoop-based Webmap is part of a larger strategy of Yahoo! moving toward openness — both in our infrastructure and throughout the network…
There are more technical details here. Hadoop replaces the proprietary system Yahoo used previously. The benefits include cost savings and scalability.
The irony of this development, however, is that it comes just as Microsoft may take over Yahoo. Microsoft is all about proprietary technology, which is the opposite of what’s going on here. In the video below, Jeremy Zawodny interviews two of the engineers who worked on the project: