Common Crawl Makes New Web Data Available, Launches Coding Contest

Looking to do research based on data gathered from across the web? That’s one of the purposes of Common Crawl, and the group has just released new data and launched a contest to encourage use of that data.

The 2012 data, which contains 3.8 billion web documents, yields statistics such as .com accounting for 63% of top-level domains and a total of 61 million domains overall.

Common Crawl is also running its first-ever Common Crawl Code Contest, which challenges developers to do something innovative with the data that relates to job trends or social impact analysis. Three winners will each get $1,000 in cash, an O’Reilly Data Science Starter Kit, one year of GitHub’s Small Plan and more. Submissions are accepted through August 29.

FYI, I’m on the advisory board for the non-profit group. There’s no compensation for that involvement. I and others just offer free advice to the group.

You can learn more about Common Crawl on its FAQ page, on its Get Started page and in its video on YouTube.

Common Crawl’s data from 2011 was recently used by Zyxt Labs to show how far Facebook has spread across the open web. See our Marketing Land article for more on that.



About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and he microblogs on Twitter as @dannysullivan.

  • http://www.garysieling.com/blog/ Gary

    It’d be nice if they had a full text search UI built on top of it, just to be able to test what’s included in it.
