Fun Stats: 28% Of Sites Use Google Analytics; 5% Have Facebook Or Twitter Links

Factual has analyzed data from 4 million web sites and provided a holiday gift for stats junkies. Did you know 5% of sites have either a Twitter or Facebook link? Or that 28% of sites run Google Analytics? Or that 12% of them run Google AdSense? Now you do!

The core data comes from CommonCrawl, a non-profit group designed to crawl the web and provide data for anyone to use. Gil Elbaz is a founder of both CommonCrawl and Factual, a start-up that creates tables of structured information from data found on the open web (see Factual: Parting The Curtains Of The Invisible Web).

Factual found stats such as those I cited above after examining 4 million web sites. In particular:

  • 28% of sites have Google Analytics on them
  • 12% of sites have AdSense
  • 5% of sites have EITHER a Twitter or Facebook link but…
  • 2% of sites have BOTH a Twitter and a Facebook link
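The EITHER and BOTH figures can be combined with a bit of arithmetic. Here's a minimal sketch, assuming "either" means at least one of the two links and that the rounded 5% and 2% figures are close enough to work with. The function name is mine, not Factual's:

```python
def link_breakdown(either_pct, both_pct):
    """Derive two extra figures from the 'either' and 'both' percentages.

    Assumes 'either' means at least one of the two links is present.
    Returns (sites with exactly one link, Twitter share plus Facebook share).
    """
    # Sites with exactly one of the two links.
    exactly_one = either_pct - both_pct
    # Inclusion-exclusion: Twitter + Facebook = either + both.
    combined = either_pct + both_pct
    return exactly_one, combined


# Figures reported in the article, in whole percentage points.
exactly_one, combined = link_breakdown(5, 2)
print(f"exactly one link: {exactly_one}%")
print(f"Twitter share + Facebook share: {combined}%")
```

That doesn't tell us how the combined figure splits between Twitter and Facebook, only what the two shares add up to.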

There’s also a chart that shows other interesting stats but without precise percentages. I’ll estimate as best I can:

  • About 20% of sites have Flash
  • About 19% of sites have an RSS feed
  • About 6% of sites have a sitemaps file
  • About 1% of sites have a Google Webmaster Central verification code
  • About 1% of sites have Quantcast tracking code
  • About 0.5% of sites have a Creative Commons attribution

One thing that's unclear is how the stats break down on a page versus web site basis. A web site might have multiple pages. So when a “web site” is said to have AdSense on it, does that mean each page within the site has AdSense code? Or only some of them? It appears a decision was made on a site-by-site basis, with “site” being defined as all the pages within a set domain or subdomain.
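To show why the page-versus-site question matters, here's a rough sketch of how the same crawl data gives different percentages depending on the rule used. This is not Factual's method, which isn't spelled out. The sample URLs, the `site_shares` function and the choice to treat each hostname as one site are all my own assumptions for illustration:

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical sample data: (page URL, does the page contain AdSense code?)
PAGES = [
    ("http://example.com/", True),
    ("http://example.com/about", False),
    ("http://blog.example.com/post-1", False),
    ("http://example.org/", True),
    ("http://example.org/contact", True),
]


def site_shares(pages):
    """Return (share of sites where ANY page has the feature,
    share of sites where EVERY page has it)."""
    by_site = defaultdict(list)
    for url, has_feature in pages:
        # Assumption: each hostname (domain or subdomain) counts as one site.
        by_site[urlsplit(url).hostname].append(has_feature)
    total = len(by_site)
    any_share = sum(any(flags) for flags in by_site.values()) / total
    all_share = sum(all(flags) for flags in by_site.values()) / total
    return any_share, all_share


any_share, all_share = site_shares(PAGES)
page_share = sum(flag for _, flag in PAGES) / len(PAGES)
print(f"pages with the code: {page_share:.0%}")
print(f"sites where any page has it: {any_share:.0%}")
print(f"sites where every page has it: {all_share:.0%}")
```

With this made-up sample, the three rules give three different answers, which is why it helps to know which one sits behind a stat like "12% of sites have AdSense."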

Those interested can play with the data themselves. It’s summarized in this very large table at Factual.

CommonCrawl also gets a bit of publicity from this at an interesting time. Earlier this week, Google released a long internal memo talking about how important it was to the company to be open — except in the areas of search and ads:

In many cases, most notably our search and ads products, opening up the code would not contribute to these goals and would actually hurt users. The search and advertising markets are already highly competitive with very low switching costs, so users and advertisers already have plenty of choice and are not locked in.

I’ll likely do my own follow-up post to that memo in the near future. In the meantime, a post I wrote back in 2007 — Google: As Open As It Wants To Be (i.e., When It’s Convenient) — looks at how Google’s claims of being open tend to ring false when open isn’t something it seems to pursue in areas where it is ahead. In part from my post:

That large index gives Google a huge advantage over rivals. It knows more about what’s on the web than anyone else. So why not share? Why not start an Open Index Alliance where there’s a coordinated effort to crawl and index all the documents in the world, allowing anyone to tap into the raw data?

That’s the idea behind CommonCrawl. Maybe as part of being open, Google could get behind the project?

See also Chris Dixon’s post from this week, Google should open source what actually matters: their search ranking algorithm, for related thoughts about Google, search and openness, along with comments from me and others, including the head of Google’s spam fighting team, Matt Cutts.

As for ads, see Schmidt: Someday, AdSense Publishers May Know Google’s Cut Of Ad Revenues, from me earlier this year, which looks at how most AdSense publishers have no idea how much money Google keeps back for itself. It’s hard to find an argument that supports not being open about this, in the face of Google’s declared love of open.

Related Topics: Channel: Strategy | Factual | Google: Business Issues | Google: Critics | Google: General | Search Engines | Search Engines: Experimental | Stats: General | Top News


About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook, Google+ and microblogs on Twitter as @dannysullivan.
