Yahoo!, Google, Microsoft Clarify Robots.txt Support
Today, Google, Yahoo!, and Microsoft have come together to post details of how each of them support robots.txt and the robots meta tag. While their posts use terms like “collaboration” and “working together,” they haven’t joined together to implement a new standard (as they did with sitemaps.org). Rather, they are simply making a joint stand in messaging that robots.txt is the standard way of blocking search engine robot access to web sites. They have identified a core set of robots.txt and robots meta tag directives that all three engines support:
Google and Yahoo! already supported and documented each of the core directives, and Microsoft supported most of them before this announcement. In their posts, they also list the directives they support that may not be supported by the other engines.
For robots.txt, they all support:
- Use of wildcards
- Sitemap location
For robots meta tags, they all support:
With this announcement, Microsoft appears to be adding support for the use of * wildcards (which will go live later this month) and the Allow directive. The biggest discrepancy is with the crawl-delay directive. Yahoo! and Microsoft support it, while Google does not (although Google does support control of crawl speed via Webmaster Tools ).
This isn’t the first time the major search engines have come together for an announcement regarding how they support publishers. In late 2006, all three joined together to support XML Sitemaps and launched sitemaps.org, followed in April 2007 with support for Sitemaps autodiscovery in robots.txt, and in February 2008 with more support for more flexible storage locations of Sitemap files. In early 2005, the engines declared support for the nofollow attribute on links (in an effort to combat comment spam).
Why are the search engines coming together to talk about their varied support for traditional methods for blocking access to web content? A Microsoft spokesperson told me that while robots.txt has been the de facto standard for some time, the search engines had never come together to detail how they support it and said the aim is to “make REP more intuitive and friendly to even more publishers on the web.” Google similarly said that “doing a joint post allows webmasters to see how we all honor REP directives, the majority of which are identical, but we also call out those that are not used by all of us.”
Yahoo! told me:
Our goal is to come out with clear information about the actual support around REP for all engines. We have all separately at different times reported our support and this creates a long trail hard for anyone to put together. Posting the same spec at the same time provides a sync point for everyone as to the actual similarities or differences between our implementations for all engines. We are trying to address the latent concerns around differences across the engines.
Of course, each engine has provided documentation in their respective help centers for some time, and Google and Microsoft provide robots.txt analysis tools that detail how they interpret a file in their webmaster tools, so while they haven’t documented their support jointly, the documentation itself isn’t new.
This move may be an effort to show a consolidated front in light of the ongoing publisher attempts to create new search engine access standards with ACAP. This direction reflects the ongoing direction of the messaging the search engines have had about ACAP. For instance, Rob Jonas, Google’s head of media and publishing partnerships in Europe, said in March that “the general view is that the robots.txt protocol provides everything that most publishers need to do.”
For more information, see each engine’s blog posts (updated as their posts go live):
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.