Search Engine Land


Google Study On Sitemaps


Barry Schwartz on April 28, 2009 at 10:05 am

The Google Webmaster Central blog notified us that Googlers have presented a new study on Sitemaps at the WWW’09 conference in Madrid. The study is well worth reading, and I recommend printing out the ten-page PDF document and going through it. For those of you who don’t have time for that, I highlight the most interesting findings from the study below.

The purpose of the study was to measure the past few years of Sitemaps usage at Google to determine how Sitemap files improve coverage and freshness of the Google web index. By coverage, I mean how Google crawls the web more deeply and finds content that it might not otherwise have found. By freshness, I mean how Google crawls new or updated content faster than it would through the normal crawl.

Interesting facts from the study:

  • ~35 million Sitemaps were published, as of October 2008.
  • The 35 million Sitemaps include “several billion” URLs.
  • Most popular Sitemap formats include XML (77%), Unknown (17.5%), URL list (3.5%), Atom (1.6%) and RSS (0.11%).
  • 58% of URLs in Sitemaps contain the last modification date.
  • 7% of URLs contain the change frequency field.
  • 61% of URLs contain the priority field.
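The last modification date, change frequency and priority in the list above correspond to the optional `lastmod`, `changefreq` and `priority` elements of an XML Sitemap. As a rough sketch of how such usage shares could be counted, the Python below parses a small Sitemap and reports the fraction of URLs carrying each field. The sample Sitemap, the `example.com` URLs and the `field_usage` helper are made up for illustration; this is not the method used in the study.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical three-URL Sitemap, used only for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2009-04-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
    <lastmod>2009-03-15</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/contact</loc>
  </url>
</urlset>"""

OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def field_usage(sitemap_xml):
    """Return the share of <url> entries that include each optional field."""
    root = ET.fromstring(sitemap_xml)
    urls = root.findall("sm:url", NS)
    counts = {name: 0 for name in OPTIONAL_FIELDS}
    for url in urls:
        for name in OPTIONAL_FIELDS:
            if url.find(f"sm:{name}", NS) is not None:
                counts[name] += 1
    total = len(urls)
    return {name: (counts[name] / total if total else 0.0)
            for name in OPTIONAL_FIELDS}


usage = field_usage(SAMPLE_SITEMAP)
for name, share in usage.items():
    print(f"{name}: {share:.0%}")
```

Run over a real Sitemap file, the same loop would give per-site figures comparable in kind to the study's aggregate percentages.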

The paper discusses the process used by Google for Sitemaps. Here is a flow diagram that explains it quickly.

[Figure: Google Sitemaps crawl process]

Coverage:
The dataset used to measure the “coverage” of Sitemaps was approximately 3 million URLs: 1.7 million URLs specifically from Sitemaps and the remainder from the normal discovery process. The discovery crawl turned up close to one million duplicate URLs, as opposed to only 100 duplicate URLs in the Sitemaps files. In short, the study found that discovery was 63% “efficient” and Sitemaps were 99% efficient in crawling the domain, at the cost of missing a small fraction of content.

  • The percent of duplicates inside Sitemaps is mostly similar to the overall percent of duplicates.
  • 46% of the domains have UniqueCoverage above 50%, and more than 12% have UniqueCoverage above 90%.
  • For most domains, Sitemaps achieves a higher percent of URLs in the index with fewer unique pages.
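This summary does not define “efficient”, so the sketch below assumes one plausible reading: the number of unique pages divided by the number of URLs crawled. The URLs, the content fingerprints and the `crawl_efficiency` helper are all invented for illustration and are not data or code from the study.

```python
def crawl_efficiency(crawled):
    """Assumed definition: unique pages / URLs crawled.

    `crawled` is a list of (url, content_fingerprint) pairs; two URLs with
    the same fingerprint are treated as duplicates of one page.
    """
    if not crawled:
        return 0.0
    unique_pages = {fingerprint for _, fingerprint in crawled}
    return len(unique_pages) / len(crawled)


# Toy discovery crawl: parameter variants lead to the same content.
discovery = [
    ("http://example.com/item?id=1", "A"),
    ("http://example.com/item?id=1&ref=home", "A"),
    ("http://example.com/item?id=2", "B"),
    ("http://example.com/item?id=2&sort=asc", "B"),
    ("http://example.com/item?id=3", "C"),
]

# Toy Sitemap: one clean URL per page, but page "C" is not listed.
sitemap = [
    ("http://example.com/item?id=1", "A"),
    ("http://example.com/item?id=2", "B"),
]

discovery_efficiency = crawl_efficiency(discovery)
sitemap_efficiency = crawl_efficiency(sitemap)

# Pages the Sitemap crawl never reaches in this toy example.
missed = {fp for _, fp in discovery} - {fp for _, fp in sitemap}

print(f"discovery: {discovery_efficiency:.0%}")
print(f"sitemap:   {sitemap_efficiency:.0%}")
print(f"missed by sitemap: {sorted(missed)}")
```

In this toy data the Sitemap crawl wastes no fetches on duplicates but misses one page, which mirrors the trade-off described above.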

Freshness:
How fresh can Google get with Sitemaps?

  • 78% of URLs were seen by Sitemaps first, compared to 22% that were seen through discovery first.
  • 14.2% of URLs are submitted through ping.
  • The probability of seeing a URL through Sitemaps before seeing it through discovery is independent of whether the Sitemap was submitted using pings or using robots.txt.
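For readers unfamiliar with the two submission routes mentioned above, here is a minimal sketch of what each looks like under the sitemaps.org protocol: a ping is an HTTP request of the form `<searchengine_URL>/ping?sitemap=<sitemap_url>`, and the robots.txt route is a single `Sitemap:` line. The host `searchengine.example`, the Sitemap URL and both helper names are placeholders; the code only builds the strings and sends nothing.

```python
from urllib.parse import urlencode


def build_ping_url(search_engine_base, sitemap_url):
    """Build a ping URL of the form <base>/ping?sitemap=<encoded sitemap URL>."""
    query = urlencode({"sitemap": sitemap_url})  # URL-encodes the Sitemap location
    return f"{search_engine_base.rstrip('/')}/ping?{query}"


def robots_txt_sitemap_line(sitemap_url):
    """Build the robots.txt directive that advertises a Sitemap."""
    return f"Sitemap: {sitemap_url}"


# Placeholder values, for illustration only.
SITEMAP_URL = "http://www.example.com/sitemap.xml"

ping_url = build_ping_url("http://searchengine.example/", SITEMAP_URL)
robots_line = robots_txt_sitemap_line(SITEMAP_URL)

print(ping_url)
print(robots_line)
```

Which search engines accept pings, and at which address, depends on the engine, so check its current documentation before relying on this route.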

The paper then goes on to discuss ways to determine the crawl order, that is, whether to crawl a URL found via Sitemaps or via Discovery first. It introduces concepts such as SitemapScore and DiscoveryScore and outlines possible methods for using them.

The study seems like a great read for most SEOs interested in understanding how Google Sitemaps work and how they can benefit your sites.



About The Author

Barry Schwartz
Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry's personal blog is named Cartoon Barry.
