Making Sense Of All The Data: Google, Hadoop & Cloudera

The New York Times article “Hadoop, a Free Software Program, Finds Uses Beyond Search” explains the interesting history behind Hadoop. What is Hadoop? It’s distributed computing software that enables data mining and analysis on a huge scale. It is also, apparently, an open-source version of proprietary software that Google developed to process and analyze massive volumes of data for search. Here’s how the NY Times explains the problem Google was addressing:

By 2003, Google found it increasingly difficult to ingest and index the entire Internet on a regular basis. Adding to these woes, Google lacked a relatively easy to use means of analyzing its vast stores of information to figure out the quality of search results and how people behaved across its numerous online services.

To address those issues, a pair of Google engineers invented a technology called MapReduce that, when paired with the intricate file management technology the company uses to index and catalog the Web, solved the problem.

The MapReduce technology makes it possible to break large sets of data into little chunks, spread that information across thousands of computers, ask the computers questions and receive cohesive answers. Google rewrote its entire search index system to take advantage of MapReduce’s ability to analyze all of this information and its ability to keep complex jobs working even when lots of computers die.

MapReduce represented a couple of breakthroughs. The technology has allowed Google’s search software to run faster on cheaper, less-reliable computers, which means lower capital costs. In addition, it makes manipulating the data Google collects so much easier that more engineers can hunt for secrets about how people use the company’s technology instead of worrying about keeping computers up and running.
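To make the idea in the passage above concrete, here is a minimal Python sketch of the split, map and combine pattern, applied to counting words. It is an illustration only, not Google's or Hadoop's code: the function names and sample data are hypothetical, and a plain loop stands in for the thousands of computers a real cluster would use.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Break a large list of records into smaller chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of all others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: in a real cluster each chunk would be sent to a different
    # machine; here the "machines" are simulated by a simple loop.
    partial_results = [map_chunk(chunk) for chunk in chunks]
    # Fold the partial answers into one cohesive answer.
    return reduce(merge_counts, partial_results, Counter())


# Hypothetical sample data standing in for crawled pages.
pages = [
    "hadoop processes data",
    "google indexes the web",
    "hadoop indexes data",
]
totals = word_count(pages)
print(totals["hadoop"], totals["data"], totals["web"])
```

Because each chunk is processed without reference to any other, a chunk whose machine dies can simply be handed to another machine and recomputed, which is the property that lets this style of job run on cheap, less reliable hardware.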

Doug Cutting, who was later hired by Yahoo, developed Hadoop as something of an open-source response to MapReduce. Yahoo then spent millions, according to the article, to further develop Hadoop. Other internet giants and companies, such as Facebook, IBM, Microsoft and Autodesk, have used Hadoop extensively to analyze huge volumes of data in ways that extend far beyond search.

Now former employees of Google, Yahoo and Facebook have come together to launch Cloudera to deliver data analysis around Hadoop:

“What if Google decided to sell the ability to do amazing things with data instead of selling advertising?” Mr. Hammerbacher asked.

The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.

This is data mining on a gigantic scale, taking Google’s original techniques, as translated by Hadoop, and seeking to bring them to the masses (of enterprises, that is).

About The Author: is a Contributing Editor at Search Engine Land. He writes a personal blog Screenwerk, about SoLoMo issues and connecting the dots between online and offline. He also posts at Internet2Go, which is focused on the mobile Internet. Follow him @gsterling.