Yahoo!, Google, Microsoft Clarify Robots.txt Support

Today, Google, Yahoo!, and Microsoft came together to post details of how each of them supports robots.txt and the robots meta tag. While their posts use terms like “collaboration” and “working together,” they haven’t joined to implement a new standard (as they did with sitemaps.org). Rather, they are jointly reinforcing the message that robots.txt is the standard way of blocking search engine robot access to web sites, and they have identified a core set of robots.txt and robots meta tag directives that all three engines support.

Google and Yahoo! already supported and documented each of the core directives, and Microsoft supported most of them before this announcement. In their posts, they also list the directives they support that may not be supported by the other engines.

For robots.txt, they all support:

  • Disallow
  • Allow
  • Use of wildcards
  • Sitemap location
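
As a sketch, a robots.txt combining all four core directives might look like the following (the example.com paths are hypothetical, and wildcard syntax is the extension discussed below, not part of the original 1994 protocol):

```
# Applies to all crawlers
User-agent: *
# Block the entire /private/ directory...
Disallow: /private/
# ...but allow one specific file inside it
Allow: /private/overview.html
# Wildcard: block any URL containing a session-id parameter
Disallow: /*?sessionid=
# Sitemap location (an absolute URL; can appear anywhere in the file)
Sitemap: http://www.example.com/sitemap.xml
```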

For robots meta tags, they all support:

  • noindex
  • nofollow
  • noarchive
  • nosnippet
  • noodp
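
For illustration, these directives go in a meta tag in a page’s head section. The hypothetical page below tells all crawlers not to index it, not to follow its links, and not to show a cached copy or snippet:

```html
<head>
  <!-- Keep this page out of the index and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
  <!-- Don't show a "Cached" link or a snippet in search results -->
  <meta name="robots" content="noarchive, nosnippet">
</head>
```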

With this announcement, Microsoft appears to be adding support for * wildcards (going live later this month) and for the Allow directive. The biggest discrepancy is the crawl-delay directive: Yahoo! and Microsoft support it, while Google does not (although Google does let site owners control crawl speed via Webmaster Tools).
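
As a rough illustration of how Allow and Disallow interact, Python’s standard-library urllib.robotparser can evaluate a file. Note that this parser applies rules in their order of appearance and does not understand * wildcards in paths, so this sketch sticks to plain path prefixes; the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt using the core Allow/Disallow directives.
# This parser applies the first matching rule, so the more specific
# Allow line is listed before the broader Disallow.
rules = [
    "User-agent: *",
    "Allow: /private/overview.html",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# Blocked: falls under "Disallow: /private/"
print(parser.can_fetch("*", "http://example.com/private/notes.html"))     # False
# Allowed: the explicit Allow rule matches first
print(parser.can_fetch("*", "http://example.com/private/overview.html"))  # True
# Allowed: no rule matches, so access defaults to allowed
print(parser.can_fetch("*", "http://example.com/index.html"))             # True
```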

This isn’t the first time the major search engines have come together for an announcement regarding how they support publishers. In late 2006, all three joined together to support XML Sitemaps and launched sitemaps.org, followed in April 2007 with support for Sitemaps autodiscovery in robots.txt, and in February 2008 with support for more flexible storage locations of Sitemap files. In early 2005, the engines declared support for the nofollow attribute on links (in an effort to combat comment spam).

Why are the search engines coming together to talk about their varied support for traditional methods for blocking access to web content? A Microsoft spokesperson told me that while robots.txt has been the de facto standard for some time, the search engines had never come together to detail how they support it and said the aim is to “make REP more intuitive and friendly to even more publishers on the web.” Google similarly said that “doing a joint post allows webmasters to see how we all honor REP directives, the majority of which are identical, but we also call out those that are not used by all of us.”

Yahoo! told me:

Our goal is to come out with clear information about the actual support around REP for all engines. We have all separately at different times reported our support and this creates a long trail hard for anyone to put together. Posting the same spec at the same time provides a sync point for everyone as to the actual similarities or differences between our implementations for all engines. We are trying to address the latent concerns around differences across the engines.

Of course, each engine has provided documentation in their respective help centers for some time, and Google and Microsoft provide robots.txt analysis tools that detail how they interpret a file in their webmaster tools, so while they haven’t documented their support jointly, the documentation itself isn’t new.

This move may be an effort to show a consolidated front in light of publishers’ ongoing attempts to create new search engine access standards with ACAP. It is consistent with the engines’ messaging about ACAP to date. For instance, Rob Jonas, Google’s head of media and publishing partnerships in Europe, said in March that “the general view is that the robots.txt protocol provides everything that most publishers need to do.”

For more information, see each engine’s blog post (updated as their posts go live).

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all sizes. Follow her on Twitter at @vanessafox.
