

May 23, 2007 at 3:25am Eastern by Elliance

Search Illustrated: Blocking Search Engines With Robots.txt


Search Illustrated - A Column From Search Engine Land
While most of the time we want search engine crawlers to grab and index as much content from our web sites as possible, there are situations where we want to prevent crawlers from accessing certain pages or parts of a web site. For example, you don't want crawlers poking around on non-public parts of your web site. Nor do you want them trying to index scripts, utilities or other types of code. And finally, you may have duplicate content on your web site, and want to ensure that a crawler only gets one copy (the "canonical" version, in search engine parlance).

Today's Search Illustrated shows how you can use the "robots.txt" file as a "keep out" notice for search engine crawlers:

[Graphic: Robots.txt explained]

Graphic by Elliance, an eMarketing firm specializing in results-driven search engine marketing, web site design, and outbound eMarketing campaigns. The firm is the creator of the ennect online marketing toolkit. The Search Illustrated column appears Tuesdays at Search Engine Land (and today only, on Wednesday... :-).
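As a rough illustration of the three situations described at the start of this column, here is a minimal Python sketch. The robots.txt content and its directory names (/private/ for non-public pages, /cgi-bin/ for scripts, /print/ for printer-friendly duplicates) are hypothetical assumptions, not rules taken from the graphic. The check uses Python's standard urllib.robotparser module to show which URLs a crawler that honors the file would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the three cases above. The directory names
# are illustrative assumptions, not paths from the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    paths = [
        "/index.html",             # public page
        "/private/report.html",    # non-public area
        "/cgi-bin/search.pl",      # script
        "/print/article-42.html",  # printer-friendly duplicate
    ]
    for path in paths:
        url = "http://www.example.com" + path
        # No Googlebot-specific group exists, so the "*" group applies.
        verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
        print(f"{path}: {verdict}")
```

Each Disallow value is treated as a path prefix, so one line covers everything under that directory.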

See related stories in: SEO: Blocking Spiders, Search Illustrated



Reader Comments

Should I understand from this post that the only kinds of pages you recommend blocking from crawlers are private pages, scripts, and duplicate pages? Or are these just examples?

Do you also believe that internal search pages should be blocked? I understand that there is some controversy on the subject...

Thank you.
Daniel Waisberg

Yes, Daniel, I think that was the point of the illustration: to give general examples of what someone might block, rather than explicit instructions saying you must block them.

As for search pages, you are correct. Google's webmaster guidelines recommend using robots.txt to prevent crawling of search results pages.
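For the internal search pages discussed in this thread, a single Disallow line is often enough. The sketch below assumes a hypothetical site whose search results are served from /search (for example /search?q=widgets) and uses Python's standard urllib.robotparser to check the rule; the path and the blocked() helper are illustrative, not from the original post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: internal search results are assumed to live at
# /search, with the query in the URL (e.g. /search?q=widgets).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked(path: str) -> bool:
    """Return True if a compliant crawler is asked not to fetch this path."""
    return not parser.can_fetch("Googlebot", "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/search?q=widgets", "/search?q=widgets&page=2",
                 "/products/widgets.html"):
        print(path, "blocked" if blocked(path) else "allowed")
```

Because Disallow values are prefixes, the same rule also blocks any URL that merely starts with /search, such as /search-tips.html, so choose the path with that in mind.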

Is it possible to block a portion of a page, or only the entire page?

Thanks in advance for any insight you can provide.

Yahoo supports blocking parts of pages. See "Yahoo Supports New Robots-Nocontent Tag To Block Indexing Within A Page" for more about this.
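Yahoo's tag works by marking page elements with class="robots-nocontent". The Python sketch below is a hypothetical illustration, not Yahoo's implementation: it uses the standard html.parser module to collect a page's text while skipping everything inside elements carrying that class, which approximates what the markup asks an indexer to ignore. The NoContentFilter and indexable_text names are made up for this example, and the sketch assumes well-formed markup in which every non-void element has an explicit end tag.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Collect page text, skipping anything inside class="robots-nocontent"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a robots-nocontent element
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Track nested elements so we know when the marked element ends.
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "robots-nocontent" in classes:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def indexable_text(html: str) -> str:
    """Return the text left over after dropping robots-nocontent sections."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept)


if __name__ == "__main__":
    page = ('<div>Main article text.</div>'
            '<div class="sidebar robots-nocontent">Subscribe to our feed!</div>'
            '<p>More article text.</p>')
    print(indexable_text(page))
```

The markup only affects search engines that choose to honor it; a crawler that ignores the class will index the marked text like any other.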
