May. 23, 2007 at 3:25am Eastern by Elliance
Search Illustrated: Blocking Search Engines With Robots.txt
While most of the time we want search engine crawlers to grab and index as much content from our web sites as possible, there are situations where we want to prevent crawlers from accessing certain pages or parts of a web site. For example, you don't want crawlers poking around on non-public parts of your web site. Nor do you want them trying to index scripts, utilities or other types of code. And finally, you may have duplicate content on your web site, and want to ensure that a crawler only gets one copy (the "canonical" version, in search engine parlance).
Today's Search Illustrated shows how you can use the "robots.txt" file as a "keep out" notice for search engine crawlers:

Graphic by Elliance, an eMarketing firm specializing in results-driven search engine marketing, web site design, and outbound eMarketing campaigns. The firm is the creator of the ennect online marketing toolkit. The Search Illustrated column appears Tuesdays at Search Engine Land (and today only, on Wednesday... :-).
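As a rough sketch of the three cases mentioned above (non-public areas, scripts, and duplicate copies of content), the Python snippet below parses a small robots.txt and checks which URLs a well-behaved crawler would be allowed to fetch. The directory names (`/private/`, `/cgi-bin/`, `/print/`), the `ExampleBot` user agent, and the `www.example.com` host are all illustrative assumptions, not recommendations taken from the graphic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. Every path here is an assumption made for
# illustration; substitute the directories used on your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a (hypothetical) crawler may fetch each sample URL.
for path in ("/index.html", "/private/report.html",
             "/cgi-bin/form.pl", "/print/article.html"):
    allowed = parser.can_fetch("ExampleBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Keep in mind that robots.txt is only a request that compliant crawlers honor. It does not password-protect anything, so truly private content still needs real access controls.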
Reader Comments
Yes, Daniel, I think that was the point of the illustration -- to be a general guide of examples as to what someone might block rather than explicit instructions saying you must do this.
As for search pages, you are correct. Google warns that these should be blocked.
Is it possible to block a portion of a page, or only the entire page?
Thanks in advance for any insight you can provide.
Yahoo supports blocking of parts of pages. See Yahoo Supports New Robots-Nocontent Tag To Block Indexing Within A Page for more about this.



Should I understand from this post that the only kinds of pages you recommend blocking from crawlers are private pages, scripts, and duplicate pages? Or are these just examples?
Do you also believe that internal search pages should be blocked? I understand that there is some controversy on the subject...
Thank you.
Daniel Waisberg