Everything You Wanted To Know About Blocking Search Engines

Last week, the three major search engines came together to spell out where they agree, and disagree, on the Robots Exclusion Protocol. It’s an important standard that every webmaster should understand. To help, Vanessa Fox has compiled an extensive and outstanding overview of it at Jane & Robot in her Managing Robot’s Access To Your Website post.

The tutorial takes you through key areas such as:

  • A nice chart showing what you can block using either robots.txt or the meta robots tag for each major search engine. It also covers other things like reverse DNS lookup to verify a crawler’s identity.
     
  • Types of content you want kept private from search engines versus public. Rather than private versus public, "not listed" versus "listed" might be better terms. Anything that really should be private ought to be kept behind a password barrier. The tutorial does cover this, but it’s worth stressing that no one should think robots exclusion is a method to keep private or personally identifiable information out of search engines. But there’s other info that you might want "private" in terms of not being listed, such as printer-friendly pages, as the tutorial also explains.
     
  • How to block search engines, such as on a site-wide basis using robots.txt, along with tips like using wildcards and specifying particular search engines by crawler name. Page-level blocking (with meta tags) is also covered. There are lots of examples.
     
  • Common mistakes and myths are addressed, such as the idea that using nofollow alone will keep pages from being indexed. Methods of testing implementation are also covered.
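
To make the site-wide versus page-level distinction concrete, here is a minimal sketch in Python using only the standard library. It is not taken from the tutorial: the robots.txt rules, the example.com URLs, the "OtherBot" crawler name and the helper names are all made up for illustration. It also assumes the standard library parser's simple prefix matching, which as far as I know does not implement the wildcard extensions the search engines support.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and groups are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/

User-agent: Googlebot
Disallow: /drafts/
"""

# Hypothetical page that opts out of indexing at the page level.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body>Printer-friendly copy</body></html>"
)


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def meta_robots_directives(html: str) -> set[str]:
    """Return the set of meta robots directives found in an HTML string."""
    meta_parser = MetaRobotsParser()
    meta_parser.feed(html)
    return meta_parser.directives


parser = build_parser(ROBOTS_TXT)

# A crawler named in its own group follows that group, not the "*" group.
print(parser.can_fetch("Googlebot", "https://example.com/drafts/post.html"))
print(parser.can_fetch("Googlebot", "https://example.com/print/page.html"))
print(parser.can_fetch("OtherBot", "https://example.com/print/page.html"))

# Page-level blocking: check whether the page asks not to be indexed.
print("noindex" in meta_robots_directives(PAGE_HTML))
```

One thing the sketch surfaces: because Googlebot has its own group here, the /print/ rule in the "*" group no longer applies to it. If you name a crawler specifically, repeat any general rules you still want it to obey.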

Bookmark the guide — it’s one you’ll want to come back to time and again.



About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and microblogs on Twitter as @dannysullivan.
