Robots.txt Study Shows Webmasters Favor Google; BotSeer Robots.txt Search Engine Released

Researchers at The Pennsylvania State University conducted a study showing that webmasters favor Google over other search engines when granting crawlers access to their web sites. An associated search engine, BotSeer, which searches across a collection of robots.txt files, was also released.

The study looked at which robots or crawlers were listed in each web site’s robots.txt file and found that Google was listed more often than any other search engine. The paper, Determining Bias to Search Engines from Robots.txt (PDF), showed some interesting details.

The most commonly used user agent is the “universal robot”: 93.8 percent of sites with robots.txt files have a rule allowing any crawler to access the site. In addition, 72.4 percent of the robots.txt files mentioned specific robots by name.
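To make those percentages concrete, here is a minimal Python sketch of how one might tally which user agents are named across a set of robots.txt files. It is not the method used in the paper; the function names and the three sample files are made up for illustration.

```python
from collections import Counter


def named_user_agents(robots_txt: str) -> set[str]:
    """Return the lower-cased User-agent values named in one robots.txt file."""
    agents = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents


def tally(files: list[str]) -> Counter:
    """Count, per user agent, how many files name it at least once."""
    counts = Counter()
    for text in files:
        counts.update(named_user_agents(text))
    return counts


# Hypothetical sample files, not data from the study.
SAMPLE_FILES = [
    "User-agent: *\nDisallow: /private/\n",
    "User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /\n",
    "User-agent: googlebot\nAllow: /\n\nUser-agent: msnbot\nDisallow: /\n",
]

counts = tally(SAMPLE_FILES)
for agent, n in counts.most_common():
    print(f"{agent}: named in {n} of {len(SAMPLE_FILES)} files")
```

Here "*" is the universal robot, and any other value counts as a robot named specifically. A real survey would also need to fetch the files and cope with malformed ones, which this sketch skips.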

The chart below shows that Google’s robot, Googlebot, is named more often than any other search engine’s robot:

[Chart: Robots.txt Study]

The chart below compares search engine market share to robot bias:

[Chart: Robots.txt Study]

The study also collects historical data on the increased usage of the robots.txt file by webmasters. It is definitely worth downloading and reading.

One more note: I mentioned this morning a quote from Eytan of Live Search:

“One thing that we noticed for example while mining our logs is that there are still a fair number of sites that specifically only allow Googlebot and do not allow MSNBot.”

This study confirms Eytan’s statement.
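For readers who have not seen such a file, here is a small Python sketch of what "only allow Googlebot" can look like and how a crawler would read it. It uses the standard library's urllib.robotparser; the robots.txt content and the example.com URL are hypothetical and not taken from the study or from Live Search's logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether each crawler may fetch the same (made-up) page.
url = "http://example.com/some-page.html"
allowed = {agent: parser.can_fetch(agent, url) for agent in ("Googlebot", "msnbot")}

for agent, ok in allowed.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

An empty Disallow line means nothing is off limits for that robot, while "Disallow: /" under the universal robot shuts out every crawler that is not named elsewhere in the file.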

Postscript From Danny: I skimmed the report and hope to look more later. However, saying Google is most favored by seeing if Googlebot is named with allow statements isn’t conclusive. For example, Googlebot might include things like the Google AdSense crawler — and allowing that while banning other spiders still might be banning Google itself. That said, I have no doubt site owners think more about Google than other search engines when crafting their files.


About The Author: Barry is Search Engine Land's News Editor and owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry's personal blog is named Cartoon Barry, and he can be followed on Twitter.
