A Deeper Look At Robots.txt

The Robots Exclusion Protocol (REP) is not a complicated protocol, and its uses are fairly limited, so SEOs usually give it short shrift. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and that the latter can show up in the search results (albeit with less information, since the spiders can’t see the page content)? That disallowed page [...]

Filed in: 100% Organic, How To: SEO, SEO: Blocking Spiders
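
As a rough illustration of the disallow directive mentioned in the excerpt above, the sketch below uses Python's standard urllib.robotparser module to check whether a URL may be crawled. The robots.txt content, the example.com URLs and the is_crawlable helper are assumptions made up for this example. The sketch covers crawl permission only; it says nothing about whether a blocked URL can still be listed in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration; the path is made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/report.html", "/public/index.html"):
        url = "https://example.com" + path
        print(path, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

A disallowed URL is one the crawler should not fetch, which is why, as the excerpt notes, such a page can still appear in results with less information.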


Google’s Advice On Using The New Canonical Tag

A month ago, Google, Yahoo and Microsoft announced that they will support a new canonical tag that lets you tell search engines that page X is a duplicate of page Z. In a way, it is a 301 redirect without the physical redirect. The tag is incredibly powerful, as are 301 redirects, so it should be rolled out slowly and with caution. Because the tag is so new, Matt Cutts posted a video explaining how to go about using it. Here is the video: [...]

Filed in: Google: SEO, Google: Webmaster Central, SEO: Blocking Spiders, SEO: Duplicate Content, SEO: Redirects & Moving Sites, SEO: Submitting & Sitemaps, SEO: Tagging, SEO: Titles & Descriptions
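
To make the canonical tag described above concrete, here is a minimal sketch that reads the canonical URL out of a page's HTML with Python's standard html.parser module. The tag takes the form of a link element with rel="canonical" in the page head. The sample markup, the example.com URL and the names CanonicalFinder and find_canonical are assumptions for illustration only.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical markup for "page X", declaring "page Z" as its canonical URL.
PAGE_X = """\
<html><head>
<title>Page X</title>
<link rel="canonical" href="https://example.com/page-z">
</head><body>Duplicate content</body></html>
"""


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    print(find_canonical(PAGE_X))
```

A check like this can help when auditing a site before rolling the tag out, in line with the advice to proceed slowly.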


Live Search Testing New Crawler; MSNBot/2.0b

The Live Search Blog announced that it is letting a new robot loose. The new search engine crawler is named msnbot/2.0b and will join the current MSN spiders, which identify as msnbot/1.1. The new spider is still being tested but will ultimately replace the old one. It will respect the robots.txt rules already set up for MSNBot, so there is no need to add anything new to your robots.txt file. In addition, Microsoft promised to crawl slowly during its msnbot/2.0b tests. MSNBot/1.1 is not that old. It was added back in February of this year and introduced HTTP [...]

Filed in: Microsoft: Bing, SEO: Blocking Spiders


Irony: If Google Can’t Reach Your Robots.txt File, It Might Not List Your Site

I reported at the Search Engine Roundtable this morning that Google said if your robots.txt is unreachable, your site might not make it into the Google index. By unreachable, Google means that your server simply times out and returns no response when Googlebot attempts to access your robots.txt file; in that case, Google might not include any of your pages in its index. Googler John Mueller explained that Google tends to lean on the "safe" side when this situation pops up. When I showed this to Danny, he felt it was ironic that if Google can't read what you want to block, it might bl [...]

Filed in: Google: SEO, SEO: Blocking Spiders
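
The "safe" side behavior described above can be sketched as a policy for a hypothetical crawler. This is not Google's implementation; the function name may_index_site and the two stand-in fetchers are assumptions for illustration. The idea is simply that when the robots.txt request times out or gets no response, the crawler cannot know what is blocked, so it skips the whole site.

```python
from typing import Callable


def may_index_site(fetch_robots_txt: Callable[[], str]) -> bool:
    """Return False when robots.txt cannot be reached at all.

    fetch_robots_txt is any callable that returns the file's text, or
    raises TimeoutError / ConnectionError when the server gives no response.
    A real fetcher would need to translate its HTTP library's errors
    into these exceptions.
    """
    try:
        fetch_robots_txt()
    except (TimeoutError, ConnectionError):
        # Unknown blocking rules: lean on the safe side and skip the site.
        return False
    return True


def reachable() -> str:
    """Stand-in fetcher: the server answers with an allow-all file."""
    return "User-agent: *\nDisallow:\n"


def timing_out() -> str:
    """Stand-in fetcher: the server never responds."""
    raise TimeoutError("no response from server")


if __name__ == "__main__":
    print(may_index_site(reachable))
    print(may_index_site(timing_out))
```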


Everything You Wanted To Know About Blocking Search Engines

Last week, the three major search engines came together to say how they agree -- and disagree -- over the Robots Exclusion Protocol. It's an important standard, one every webmaster should understand. To help, Vanessa Fox has compiled an extensive and outstanding overview of it at Jane & Robot in her Managing Robot's Access To Your Website post. The tutorial takes you through key areas, such as a chart showing what you can block using either robots.txt or the meta robots tag for each major search engine. It also covers other things like reverse DNS lookup to verify a crawler's [...]

Filed in: SEO: Blocking Spiders
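
The reverse DNS lookup mentioned above is commonly done as a two-step check: resolve the visiting IP address to a hostname, confirm the hostname belongs to the search engine's domain, then resolve the hostname forward and confirm it maps back to the same IP. The sketch below is one way to write that in Python, not the tutorial's own code. The function name is_verified_crawler, the injectable lookup parameters and the sample IP and hostnames are assumptions for illustration.

```python
import socket
from typing import Callable, Optional, Sequence


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Sequence[str],
    reverse_lookup: Optional[Callable[[str], str]] = None,
    forward_lookup: Optional[Callable[[str], str]] = None,
) -> bool:
    """Forward-confirmed reverse DNS check for a visiting IP address."""
    # Default to real DNS; callers may inject fakes for offline testing.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        # Suffixes carry a leading dot so "evilgooglebot.com" cannot match.
        if not any(host.endswith(suffix) for suffix in allowed_suffixes):
            return False
        # The hostname must resolve back to the address we started from.
        return forward_lookup(host) == ip
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False


if __name__ == "__main__":
    # Offline demo with fake resolvers; 192.0.2.10 is a documentation address.
    print(is_verified_crawler(
        "192.0.2.10",
        [".googlebot.com"],
        reverse_lookup=lambda addr: "crawl-1.googlebot.com",
        forward_lookup=lambda host: "192.0.2.10",
    ))
```

The forward lookup matters because anyone who controls the reverse DNS for their own IP range can make it claim any hostname.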

