Making Sense Of All The Data: Google, Hadoop & Cloudera

The New York Times article "Hadoop, a Free Software Program, Finds Uses Beyond Search" explains the interesting history behind Hadoop. What is Hadoop? It’s distributed computing software that enables data mining and analysis on a huge scale. It is also, apparently, an open-source version of proprietary software that Google developed to process and analyze massive volumes of data for search. Here’s how the NY Times explains the problem Google was addressing:

By 2003, Google found it increasingly difficult to ingest and index the entire Internet on a regular basis. Adding to these woes, Google lacked a relatively easy to use means of analyzing its vast stores of information to figure out the quality of search results and how people behaved across its numerous online services.

To address those issues, a pair of Google engineers invented a technology called MapReduce that, when paired with the intricate file management technology the company uses to index and catalog the Web, solved the problem.

The MapReduce technology makes it possible to break large sets of data into little chunks, spread that information across thousands of computers, ask the computers questions and receive cohesive answers. Google rewrote its entire search index system to take advantage of MapReduce’s ability to analyze all of this information and its ability to keep complex jobs working even when lots of computers die.

MapReduce represented a couple of breakthroughs. The technology has allowed Google’s search software to run faster on cheaper, less-reliable computers, which means lower capital costs. In addition, it makes manipulating the data Google collects so much easier that more engineers can hunt for secrets about how people use the company’s technology instead of worrying about keeping computers up and running.
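The chunk, distribute and combine idea described in the quoted passage can be illustrated with a toy word count. This is only a minimal single-process sketch in Python, not Google's MapReduce or Hadoop's actual API; the function names (split_into_chunks, map_chunk, shuffle, reduce_counts, word_count) and the sample input are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Break the input into small chunks, as a cluster would before handing out work."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine the values for each key into a single answer."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: on a real cluster each chunk would be mapped on a different
    # machine; here the chunks are processed one after another in one process.
    mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    lines = ["hadoop processes data", "google processes data", "hadoop scales"]
    print(word_count(lines))
```

Because each chunk is mapped independently, the final counts do not depend on how the input is split. That independence is what lets a real system spread chunks across many machines and rerun a chunk elsewhere when one of them fails.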

Hadoop was developed as something of an open-source response to MapReduce by Doug Cutting, who was later hired by Yahoo. Yahoo then spent millions, according to the article, to further develop Hadoop. Other internet giants and companies, such as Facebook, IBM, Microsoft and Autodesk, have used Hadoop extensively to analyze huge volumes of data in ways that extend far beyond search.

Now, former employees of Google, Yahoo and Facebook have come together to launch Cloudera, which aims to deliver data analysis built around Hadoop:

“What if Google decided to sell the ability to do amazing things with data instead of selling advertising?” Mr. Hammerbacher asked.

The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.

This is data mining on a gigantic scale, taking Google’s original techniques, as translated by Hadoop, and seeking to bring them to the masses (of enterprises, that is).

About The Author: is a Contributing Editor at Search Engine Land. He writes a personal blog Screenwerk, about SoLoMo issues and connecting the dots between online and offline. He also posts at Internet2Go, which is focused on the mobile Internet. Follow him @gsterling.
