Apr 28, 2009 at 10:05am ET by Barry Schwartz
The Google Webmaster Central blog notified us that Googlers presented a new study on Sitemaps at the WWW’09 conference in Madrid. The study is well worth reading, and I recommend printing out the ten-page PDF and going through it. For those of you who don’t have time for that, I highlight the most interesting findings below.
The purpose of the study was to measure Sitemaps usage at Google over the past few years and determine how Sitemap files improve the coverage and freshness of the Google web index. By coverage, I mean how deeply Google crawls the web and how much content it finds that it might otherwise have missed. By freshness, I mean how quickly Google crawls new or updated content compared to the normal crawl.
Interesting facts from the study:
The paper discusses the process used by Google for Sitemaps. Here is a flow diagram that explains it quickly.
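Whatever happens downstream, a Sitemaps pipeline begins by reading URLs out of a Sitemap file. The sketch below is not Google's code; it is a minimal Python parser that assumes the standard sitemaps.org 0.9 XML format and extracts each required `<loc>` and optional `<lastmod>`. The function name `parse_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples from a Sitemap XML string.

    lastmod is None when the optional <lastmod> element is absent.
    Raises ET.ParseError if the document is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        if loc is None:
            continue  # <loc> is required; skip entries that lack it
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return entries


# Hypothetical two-entry Sitemap used only for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2009-04-28</lastmod></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

print(parse_sitemap(sample))
```

A file with a typo that breaks the XML raises `ET.ParseError` instead of returning entries, which is one way a submitted Sitemap can end up in an "unknown" bucket.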
Coverage: The dataset used to measure the “coverage” of Sitemaps was approximately 3 million URLs, 1.7 million of them from Sitemaps and the remainder from the normal discovery process. The discovery crawl turned up close to one million duplicate URLs, compared with only about 100 duplicate URLs in the Sitemaps files. In short, the study found that discovery was 63% “efficient” while Sitemaps were 99% efficient in crawling the domain, at the cost of missing a small fraction of content.
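To make the idea of crawl "efficiency" concrete, here is a small Python sketch. It assumes efficiency means the share of fetched URLs whose content had not already been seen (so a URL that merely repeats another page's content counts as a wasted fetch); the paper's exact definition may differ. The function `crawl_efficiency` and the example URLs are hypothetical.

```python
import hashlib


def crawl_efficiency(pages):
    """Fraction of fetches that returned content not seen before.

    pages: iterable of (url, content_bytes) pairs in crawl order.
    Assumption: two URLs are duplicates when their content hashes match.
    """
    seen = set()
    unique = 0
    total = 0
    for _url, body in pages:
        total += 1
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique += 1
    return unique / total if total else 0.0


# Hypothetical crawl: session and tracking parameters create duplicate URLs.
pages = [
    ("http://www.example.com/a", b"page A"),
    ("http://www.example.com/a?sessionid=1", b"page A"),
    ("http://www.example.com/b", b"page B"),
    ("http://www.example.com/b?ref=x", b"page B"),
]
print(crawl_efficiency(pages))  # 0.5
```

Under this definition, a crawl that fetches many parameter variants of the same pages scores low, which is the kind of waste a clean Sitemap listing avoids.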
Freshness: How fresh can Google get with Sitemaps?
The paper then discusses ways to decide the crawl order between URLs found via Sitemaps and URLs found via discovery. Concepts such as SitemapScore and DiscoveryScore are introduced as possible methods.
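One simple way to picture score-based ordering is a single ranked list that draws from both sources. The Python sketch below is an illustration only, not the paper's SitemapScore or DiscoveryScore formulas: it assumes each URL already has a numeric score from each source (the values are made up), keeps the higher score when a URL appears in both, and crawls the top `budget` URLs first. The function `crawl_order` and its parameters are hypothetical.

```python
import heapq


def crawl_order(sitemap_scores, discovery_scores, budget):
    """Return up to `budget` (url, source) pairs, highest score first.

    sitemap_scores, discovery_scores: dicts mapping url -> score
    (hypothetical values). A URL present in both keeps its higher score.
    """
    best = {}
    for source, scores in (("sitemap", sitemap_scores), ("discovery", discovery_scores)):
        for url, score in scores.items():
            if url not in best or score > best[url][0]:
                best[url] = (score, source)
    # nlargest avoids sorting every candidate when the budget is small.
    top = heapq.nlargest(budget, best.items(), key=lambda item: item[1][0])
    return [(url, source) for url, (_score, source) in top]


order = crawl_order(
    sitemap_scores={"/new-post": 0.9, "/archive": 0.2},
    discovery_scores={"/home": 0.7, "/archive": 0.4},
    budget=3,
)
print(order)
# [('/new-post', 'sitemap'), ('/home', 'discovery'), ('/archive', 'discovery')]
```

A real crawler would recompute scores as pages change and would add politeness limits per host; the sketch only shows how two scored sources can be merged into one crawl order.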
The study is a great read for SEOs who want to understand how Google uses Sitemaps and how Sitemaps can benefit their sites.
Great summary, Barry!
I like the part about formats, “Unknown (17.5%)”. My guess is these aren’t weird formats, just attempts at regular Sitemap XML that have significant typos.