<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine Land &#187; SEO: Blocking Spiders</title>
	<atom:link href="http://searchengineland.com/library/seo/seo-blocking-spiders/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: News On Search Engines, Search Engine Optimization (SEO) &#38; Search Engine Marketing (SEM)</description>
	<lastBuildDate>Fri, 10 Feb 2012 01:45:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Google Slows Web Crawlers To Help Blackouts Sites</title>
		<link>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477</link>
		<comments>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477#comments</comments>
		<pubDate>Wed, 18 Jan 2012 14:10:04 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=108477</guid>
		<description><![CDATA[As you know, there are many sites going black to protest SOPA and PIPA. Google has already offered blackout SEO advice but they decided to take it one step further by slowing down their spiders today. Pierre Far from Google posted on his Google+ that Google is slowing down GoogleBot&#8217;s crawl activity to reduce the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-95628" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Google-Webmaster-SEO-Rep-1304428070.gif" alt="Google-Webmaster-SEO-Rep-1304428070" width="167" height="141" />As you know, there are many sites <a href="http://marketingland.com/why-the-web-is-going-dark-over-sopa-pipa-3608">going black to protest SOPA and PIPA</a>. Google has already offered <a href="http://searchengineland.com/blackout-your-site-without-hurting-seo-108302">blackout SEO advice</a> but they decided to take it one step further by slowing down their spiders today.</p>
<p>Pierre Far from Google posted on <a href="https://plus.google.com/u/0/115984868678744352358/posts/iUN5MGJxEh9">Google+</a> that Google is slowing down GoogleBot&#8217;s crawl activity to reduce the effect of the blackout on the search rankings of participating sites that did not follow the Google SEO advice from yesterday.</p>
<p>Pierre Far said:</p>
<blockquote>Hello webmasters! We realize many webmasters are concerned about the medium-term effects of today&#8217;s blackout. As a precaution, the crawl team at Google has configured Googlebot to crawl at a much lower rate for today only so that the Google results of websites participating in the blackout are less likely to be affected.</blockquote>
<h2>Related Articles</h2>
<ul>
<li><a href="http://marketingland.com/what-all-marketers-need-to-know-about-sopa-1677">What All Marketers Need To Know About SOPA – The Stop Online Piracy Act</a></li>
<li><a title="#BlackoutSOPA: A Look At The Social Media Movement That Helped Stall The SOPA Legislation" href="http://marketingland.com/blackoutsopa-a-look-at-the-social-media-movement-that-helped-stall-the-sopa-legislation-3453" rel="bookmark">#BlackoutSOPA: A Look At The Social Media Movement That Helped Stall The SOPA Legislation</a></li>
<li><a href="http://searchengineland.com/blackout-your-site-without-hurting-seo-108302">How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO</a></li>
<li><a href="http://searchengineland.com/google-blackens-logo-to-protest-sopa-pipa-108436">Google Blackens Its Logo To Protest SOPA/PIPA, While Bing &amp; Yahoo Carry On As Usual</a></li>
<li><a href="http://marketingland.com/why-the-web-is-going-dark-over-sopa-pipa-3608">Why The Web Is Going Dark Over SOPA &amp; PIPA</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO</title>
		<link>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302</link>
		<comments>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302#comments</comments>
		<pubDate>Mon, 16 Jan 2012 19:39:18 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=108302</guid>
		<description><![CDATA[A number of websites are (or were) planning to &#8220;go black&#8221; this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn wrote about this morning [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-95628" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Google-Webmaster-SEO-Rep-1304428070.gif" alt="Google-Webmaster-SEO-Rep-1304428070" width="167" height="141" />A number of websites are (or were) planning to &#8220;go black&#8221; this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn <a href="http://marketingland.com/blackoutsopa-a-look-at-the-social-media-movement-that-helped-stall-the-sopa-legislation-3453">wrote about this morning</a> on Marketing Land.</p>
<p>You may be thinking about joining the website blackout movement, but yikes &#8230; what about the SEO implications? How do you take your site offline in protest without messing up your visibility in Google&#8217;s search results?</p>
<p>Well, Google&#8217;s Pierre Far <a href="https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB">shared several tips</a> earlier today on Google+ in a post called &#8220;Website outages and blackouts the right way.&#8221;</p>
<p>In short, the advice is to use a 503 HTTP status code to tell spiders that the website is temporarily unavailable. With a 503 status, Google won&#8217;t index the content (or lack thereof if you&#8217;re blacking out your site) and it won&#8217;t consider the site as having duplicate content issues (when all of the pages are blacked out).</p>
<p>But Far adds a couple of important caveats to this advice regarding the robots.txt file and what will happen in Webmaster Tools if Google finds your site blacked out. Another Googler, John Mueller, adds more information in the comments, so you&#8217;ll want to <a href="https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB">read the original Google+ post</a> if you&#8217;re thinking about blacking out your website this week for SOPA, or in the future for any other reason.</p>
<p>Of course, also keep in mind that Bing may not handle things the same way if you do black out your site.</p>
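<p>For readers who want to see what a 503 blackout might look like in practice, here is a minimal sketch using Python&#8217;s standard library. It is only an illustration under stated assumptions: the names blackout_response, BlackoutHandler and make_server, the port and the one-day Retry-After value are all hypothetical, and the Retry-After header itself is a standard HTTP header that is an addition here rather than part of the advice summarized above. The sketch also does not account for the robots.txt caveat in the original Google+ post.</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical value: suggest that clients retry in one day.
RETRY_AFTER_SECONDS = 86400


def blackout_response():
    """Return (status, headers, body) for a temporary-outage reply."""
    body = b"This site is temporarily unavailable.\n"
    headers = {
        "Retry-After": str(RETRY_AFTER_SECONDS),
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 503, headers, body


class BlackoutHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with the 503 reply."""

    def _reply(self, send_body):
        status, headers, body = blackout_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_GET(self):
        self._reply(send_body=True)

    def do_HEAD(self):
        self._reply(send_body=False)


def make_server(host="127.0.0.1", port=8503):
    """Build (but do not start) a server that blacks out every URL."""
    return HTTPServer((host, port), BlackoutHandler)
```

<p>Calling make_server().serve_forever() would start the blackout server locally. On a real site the same status code would more likely be set in the web server or application configuration.</p>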
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google Can Now Execute AJAX &amp; JavaScript For Indexing</title>
		<link>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518</link>
		<comments>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518#comments</comments>
		<pubDate>Tue, 01 Nov 2011 17:57:26 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Flash]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=99518</guid>
		<description><![CDATA[This morning we reported that the comments on Facebook are being indexed by Google. Google&#8217;s Matt Cutts just confirmed on Twitter that Google is now able to &#8220;execute AJAX/JS to index some dynamic comments.&#8221; This gives Google&#8217;s spider, GoogleBot, the ability to read comments in AJAX or JavaScript, such as Facebook comments or Disqus comments [...]]]></description>
			<content:encoded><![CDATA[<p>This morning we reported that the <a href="http://searchengineland.com/many-facebook-comments-now-being-indexed-by-google-99399">comments on Facebook are being indexed</a> by Google. Google&#8217;s Matt Cutts just <a href="https://twitter.com/#!/mattcutts/status/131425949597179904">confirmed on Twitter</a> that Google is now able to &#8220;execute AJAX/JS to index some dynamic comments.&#8221;</p>
<p>This gives Google&#8217;s spider, GoogleBot, the ability to read comments in AJAX or JavaScript, such as Facebook comments, Disqus comments and others that are dynamically loaded via AJAX or JavaScript. It also means Google is better at seeing the content behind more of your JavaScript or AJAX.</p>
<p><img class="alignnone size-full wp-image-99520" title="cutts-google-index-ajax" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/cutts-google-index-ajax.png" alt="" width="545" height="228" /></p>
<p><strong>Postscript:</strong> Google now has an official blog <a href="http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html">post</a> up with more details.</p>
<h3>Related Stories:</h3>
<ul>
<li><a href="http://searchengineland.com/google-proposes-to-make-ajax-crawlable-27408">Google Offers A Proposal To Make AJAX Crawlable</a></li>
<li><a href="http://searchengineland.com/google-offers-seo-advice-on-ajax-coding-12637">Google Offers SEO Advice On AJAX Coding</a></li>
<li><a href="http://searchengineland.com/googles-proposal-for-crawling-ajax-may-be-live-34411">Google May Be Crawling AJAX Now – How To Best Take Advantage Of It</a></li>
<li><a href="http://searchengineland.com/its-official-googles-proposal-for-crawling-ajax-urls-is-live-37298">It’s Official: Google’s Proposal For Crawling AJAX URLs is Live</a></li>
<li><a href="http://searchengineland.com/an-update-on-javascript-menus-and-seo-16060">An Update On Javascript Menus And SEO</a></li>
<li><a href="http://searchengineland.com/google-now-crawling-and-indexing-flash-content-14299">Google Now Crawling And Indexing Flash Content</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Disables URL Removals After Bug Allows Anyone To Remove Any Site</title>
		<link>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352</link>
		<comments>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352#comments</comments>
		<pubDate>Tue, 19 Jul 2011 19:10:28 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Redirects & Moving Sites]]></category>
		<category><![CDATA[SEO: Spamming]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=86352</guid>
		<description><![CDATA[This morning, James Breckenridge discovered a loophole within Google&#8217;s Webmaster Tools that allowed anyone to remove any site from Google. Both James and I sent this information to Google as soon as we heard of it. After several hours, Google has told us, &#8220;we&#8217;re still investigating this report, and to be cautious we disabled all [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/07/Google-Webmaster.gif" alt="" title="Google Webmaster" width="167" height="141" class="alignright size-full wp-image-86354" />This morning, James Breckenridge <a href="http://www.jamesbreckenridge.co.uk/remove-any-site-from-google-even-if-you-dont-control-it.html">discovered</a> a loophole within Google&#8217;s Webmaster Tools that allowed anyone to remove any site from Google.</p>
<p>Both James and I sent this information to Google as soon as we heard of it. After several hours, Google has told us, &#8220;we&#8217;re still investigating this report, and to be cautious we disabled all URL removals earlier this morning.&#8221; So now, even if you own a site, you won&#8217;t be able to remove the site or pages from the site using Google&#8217;s <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=164734">URL removal tool</a>.</p>
<p>How did this loophole work? It was pretty simple, as James described. You use the following URL when logged into Google Webmaster Tools:</p>
<blockquote>https://www.google.com/webmasters/tools/removals-request?hl=en&#038;siteUrl=http://{YOUR_URL}/&#038;urlt={URL_TO_BLOCK}</blockquote>
<p>Then replace {YOUR_URL} with a URL you control within Webmaster Tools, and replace {URL_TO_BLOCK} with the URL of the site you want to block.</p>
<p>You could block a whole site, a section or a single page this way, based on how you entered the URL. To block a site, use the site&#8217;s root URL (e.g. http://www.someurl.com/); to block a section (subfolder), use a subfolder URL (e.g. http://www.someurl.com/somefolder/); and to block a page, use the specific page URL (e.g. http://www.someurl.com/somefolder/somepage.html).</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/07/Screen-shot-2011-07-19-at-3.02.29-PM-600x205.png" alt="" title="Screen shot 2011-07-19 at 3.02.29 PM" width="600" height="205" class="alignnone size-large wp-image-86355" /></p>
<p>I am awaiting an update from Google on why this happened, whether sites were impacted and how long this was an issue.</p>
<p><strong>Postscript:</strong> Google sent us a statement that they have fixed the issue. A Google spokesperson said:</p>
<blockquote><p>We&#8217;ve confirmed that there was an issue within the URL removal feature in our Webmaster Tools and have already pushed out a fix and re-enabled URL removals.</p>
<p>The URL removal feature keeps detailed records, so we&#8217;re currently reprocessing earlier removal requests to ensure their validity. Our initial examination has shown only a limited impact.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Google Webmaster Tools Remove URL With Blocking Not Required</title>
		<link>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515</link>
		<comments>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515#comments</comments>
		<pubDate>Tue, 17 May 2011 16:37:47 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=77515</guid>
		<description><![CDATA[Google announced on the Webmaster blog that they have removed a requirement for removing URLs via Google Webmaster Tools. Google no longer requires you to block access to the URL you want to remove from Google prior to submitting the URL removal request. Google said they have eliminated &#8220;the requirement that the webpage&#8217;s URL must [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/05/url-removal-step-1-300x187.png" alt="" title="url-removal-step-1" width="300" height="187" class="alignright size-medium wp-image-77517" />Google <a href="http://googlewebmastercentral.blogspot.com/2011/05/easier-url-removals-for-site-owners.html">announced</a> on the Webmaster blog that they have removed a requirement for <a href="http://searchengineland.com/removing-pages-from-google-53086">removing URLs</a> via <a href="https://www.google.com/webmasters/tools/">Google Webmaster Tools</a>.</p>
<p>Google no longer requires you to block access to the URL you want to remove from Google prior to submitting the URL removal request. Google said they have eliminated &#8220;the requirement that the webpage&#8217;s URL must first be blocked by a site owner before the page can be removed from Google&#8217;s search results.&#8221; Why did Google drop this requirement? Simply because you have already verified that you are the owner of the site, so Google felt it was redundant to require you to block the URL to prove ownership again. Google explained:</p>
<blockquote>You&#8217;ve already verified ownership of the site, we can eliminate this requirement to make it easier for you, as the site owner, to remove unwanted pages (e.g. pages accidentally made public) from Google&#8217;s search results.</blockquote>
<p>Please note, these URL removals are only temporary and last 90 days. If you want the URLs and pages to never show in Google, you need to permanently remove them by 404ing the page or blocking them via a robots.txt file or a noindex meta tag.</p>
<p><strong>Related Stories:</strong></p>
<ul>
<li><a href="http://searchengineland.com/removing-pages-from-google-53086">Removing Pages From Google: A Comprehensive Guide For Content Owners</a></li>
<li><a href="http://searchengineland.com/google-releases-improved-content-removal-tools-10989">Google Releases Improved Content Removal Tools</a></li>
<li><a href="http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925">Google Lets You Tell Them Which URL Parameters To Ignore</a></li>
<li><a href="http://searchengineland.com/up-close-personal-with-robotstxt-10978">Up Close &amp; Personal With Robots.txt</a></li>
<li><a href="http://searchengineland.com/removing-your-personal-information-from-google-55014">Removing Your Personal Information From Google</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results</title>
		<link>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529</link>
		<comments>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529#comments</comments>
		<pubDate>Wed, 26 Jan 2011 22:37:24 +0000</pubDate>
		<dc:creator>Vanessa Fox</dc:creator>
				<category><![CDATA[Features: Analysis]]></category>
		<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=62529</guid>
		<description><![CDATA[Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google&#8217;s Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers [...]]]></description>
			<content:encoded><![CDATA[<p>Last year, <a href="http://googlewebmastercentral.blogspot.com/2010/03/googles-seo-report-card.html">Google published an SEO Report Card</a> of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google&#8217;s Matt Cutts <a href="http://www.youtube.com/watch?v=x1iuqHdNfGo">presented the results</a> at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don&#8217;t work in search don&#8217;t get preferential treatment from those who do, and just like on any site on the internet, sometimes things aren&#8217;t implemented correctly. Just because a site is owned by Google doesn&#8217;t mean it&#8217;s the best example of what to do in terms of SEO.</p>
<p>This morning Rishi Lakhani <a href="http://twitter.com/rishil/status/30259383208251392">tweeted about Google Translate pages</a> appearing in Google search results. As you can see in the example below, pages with individual translation requests have been indexed.</p>
<p style="text-align: center;"><a rel="attachment wp-att-62544" href="http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/translate1"><img class="aligncenter size-large wp-image-62544" style="border: 1px solid black;" title="Google Translate Search Results" src="http://searchengineland.com/figz/wp-content/seloads/2011/01/translate1-500x265.png" alt="Google Translate Search Results" width="500" height="265" /></a></p>
<p>All of the URLs that include a parameter seem to be individual translations. For instance, http://translate.google.com/?q=ART# displays as follows:</p>
<p style="text-align: center;"><a rel="attachment wp-att-62546" href="http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/translate2"><img class="aligncenter size-large wp-image-62546" style="border: 1px solid black;" title="Google Translate Example" src="http://searchengineland.com/figz/wp-content/seloads/2011/01/translate2-500x200.png" alt="Google Translate Example" width="500" height="200" /></a></p>
<p>The problem with these types of pages being indexed in search engines is twofold:</p>
<ul>
<li>The <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">Google webmaster guidelines</a> say that Google <a href="http://www.mattcutts.com/blog/search-results-in-search-results/">doesn&#8217;t want to show search results in its search results</a> and recommend that content owners block search results on their site from being indexed using robots.txt or a meta robots tag.</li>
<li>Those same guidelines recommend blocking autogenerated pages from being indexed, and a Google Webmaster Central blog post a few months ago <a href="http://googlewebmastercentral.blogspot.com/2010/09/unifying-content-under-multilingual.html">provided recommendations for handling machine-translated text</a> so that it doesn&#8217;t appear in search results.</li>
</ul>
<p>A site owner might also want to block these types of pages from being crawled and indexed to increase crawl efficiency and ensure the most valuable pages on the site are being crawled and indexed instead.</p>
<p>I asked Google about this and they confirmed that indeed it was simply a matter of the Google Translate team not being aware of the issue and said they would resolve it.</p>
<h2>Blocking Autogenerated Search Pages From Being Indexed</h2>
<p>In the case of Google Translate, the ideal scenario is that the <a href="http://translate.google.com/#">main page</a> and any secondary pages (such as <a href="http://translate.google.com/translate_tools">this tools page</a>) be indexed, but that any pages from translation requests not be indexed.</p>
<h3>Using robots.txt</h3>
<p>The best way to do this would be to add a <a href="http://code.google.com/web/controlcrawlindex/docs/robots_txt.html">disallow line in the robots.txt file</a> for the site that blocks indexing based on a pattern match of the URL query parameter. For instance:</p>
<pre>Disallow: /*q=</pre>
<p>This pattern would prevent search engines from indexing any URLs containing q=. (The * before the q= means that the q= can appear anywhere in the URL.)</p>
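<p>To make the wildcard behavior concrete, here is a rough sketch of how a rule like this could be matched against a URL path and query string. It is an illustration only, not how any search engine actually implements matching; the function names rule_to_regex and is_disallowed are hypothetical, and only the * wildcard described above is handled.</p>

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern with * wildcards into a regex."""
    # Escape the literal pieces, then join them with ".*" where each * appeared.
    pieces = [re.escape(piece) for piece in rule_path.split("*")]
    return re.compile(".*".join(pieces))


def is_disallowed(url_path, rule_path="/*q="):
    """Return True if url_path (path plus query string) matches the rule.

    re.match anchors at the start of the string, which mirrors how a
    Disallow rule is compared against the beginning of the URL path.
    """
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    for candidate in ["/?q=ART", "/translate_tools", "/"]:
        print(candidate, is_disallowed(candidate))
```

<p>With the rule /*q=, a request such as /?q=ART matches, while /translate_tools and the bare / do not. Because the wildcard matches anything, the rule also catches any other URL that happens to contain q=, which is worth checking before deploying a pattern like this.</p>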
<p>In the case of translate.google.com (and all related TLDs), the robots.txt file that exists for the subdomains seems to be copied from www.google.com. Remember that search engines obey the robots.txt file for each subdomain separately. Using the same robots.txt file for a subdomain that&#8217;s used for the www variation of the domain could have unintended consequences because the subdomain likely has an entirely different folder and URL structure. (You can always check the behavior of your robots.txt file using Google Webmaster Tools.)</p>
<p>Adding the disallow pattern shown above to the www.google.com/robots.txt file would not work, as search engines wouldn&#8217;t check that file when crawling the translate subdomain, and it would instead cause search engines not to index URLs that match the pattern on www.google.com.</p>
<p>translate.google.com (and all google.com subdomains) should have their own robots.txt file that&#8217;s customized for that subdomain.</p>
<h3>Using the meta robots tag</h3>
<p>If Google isn&#8217;t able to create a separate robots.txt file for the translate subdomain, they should first remove the file that&#8217;s there (and from other subdomains as well, as it could be causing unexpected indexing results for those subdomains). Then, they should use the meta robots tag on the individual pages they want blocked. Since the pages in question are dynamically generated, the way to do this would be to add logic to the code that generates these pages that writes the robots meta tag to the page as it&#8217;s created. This tag belongs in the &lt;head&gt; section of the page and looks as follows:</p>
<pre>&lt;meta name="robots" content="noindex"&gt;</pre>
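<p>As a sketch of what that page-generation logic could look like, the following writes a noindex robots meta tag only for URLs that carry a q parameter, which is how individual translation requests appear in the examples earlier in this article. The function names robots_meta_for and render_head and the page title are hypothetical, and nothing here reflects how Google Translate is actually built.</p>

```python
from urllib.parse import parse_qs, urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url):
    """Return the noindex tag for translation-request URLs, else an empty string."""
    # Assumption: a "q" query parameter marks an individual translation request.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return NOINDEX_TAG if "q" in query else ""


def render_head(url, title="Translate"):
    """Build a minimal head section, adding the robots tag only when needed."""
    parts = ["<head>", "<title>" + title + "</title>"]
    tag = robots_meta_for(url)
    if tag:
        parts.append(tag)
    parts.append("</head>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_head("http://translate.google.com/?q=ART#"))
    print(render_head("http://translate.google.com/translate_tools"))
```

<p>The main page and the tools page get a head section without the tag, so they stay indexable, while any URL with a q parameter is marked noindex.</p>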
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Releases Details On Controlling GoogleBot &amp; Google&#8217;s Crawl</title>
		<link>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946</link>
		<comments>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946#comments</comments>
		<pubDate>Fri, 26 Nov 2010 13:47:09 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=56946</guid>
		<description><![CDATA[The Google Webmaster Central blog announced Google has published a new set of documents in the Google Code section on how to control Google&#8217;s crawling and indexing of your site. You can read the set of documents over here. The technical documents are broken down into five sections: Getting Started Robots.txt specification Robots meta tag [...]]]></description>
			<content:encoded><![CDATA[<p>The Google Webmaster Central blog <a href="http://googlewebmastercentral.blogspot.com/2010/11/controlling-crawling-and-indexing-now.html">announced</a> Google has published a new set of documents in the Google Code section on how to control Google&#8217;s crawling and indexing of your site. You can read the set of documents <a href="http://code.google.com/web/controlcrawlindex/">over here</a>.</p>
<p>The technical documents are broken down into five sections:</p>
<ul>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/getting_started.html">Getting Started</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/robots_txt.html">Robots.txt specification</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html">Robots meta tag and X-Robots-Tag specification</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/crawlers.html">Google&#8217;s crawlers</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/references.html">References</a></li>
</ul>
<p>I personally printed them out as my weekend reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Robots.txt Recruiter: Daily Mail Uses Robots.txt File To Find SEO</title>
		<link>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191</link>
		<comments>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191#comments</comments>
		<pubDate>Tue, 24 Aug 2010 13:20:11 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[SEM Industry: General]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=49191</guid>
		<description><![CDATA[Malcolm Coles spotted that the Daily Mail, one of the UK&#8217;s largest papers, changed their robots.txt file to include a line which reads: # August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you&#8217;re the kind of techie we need!# Send your CV to holly dot ward at [...]]]></description>
			<content:encoded><![CDATA[<p>Malcolm Coles <a href="http://www.malcolmcoles.co.uk/blog/seo-job-mail-robots/">spotted</a> that the <a href="http://www.dailymail.co.uk/">Daily Mail</a>, one of the UK&#8217;s largest papers, changed their <a href="http://www.dailymail.co.uk/robots.txt">robots.txt file</a> to include a line which reads:</p>
<blockquote># August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you&#8217;re the kind of techie we need!<br /># Send your CV to holly dot ward at mailonline dot co dot uk</blockquote>
<p>How clever! They suspect some of the best SEOs out there would be sniffing around their robots.txt file and used it to recruit a new SEO manager. If anything, it is getting the word out there via the press that they are looking for a new SEO.</p>
<p>This reminds me of when Google added the <a href="http://searchengineland.com/spooky-search-engines-on-halloween-15321">User-agent: zombies</a> Disallow: /brains to their robots.txt file on Halloween. Also, in 2006, Brett Tabke used his <a href="http://www.seroundtable.com/archives/006720.html">robots.txt file as his blog</a> for a period of time.</p>
<p>I do not believe anyone has ever used it to hire an SEO before.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Facebook: No Plans To Give Search Engines Access To Facebook Questions</title>
		<link>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671</link>
		<comments>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671#comments</comments>
		<pubDate>Thu, 29 Jul 2010 21:21:55 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Search Engines: Answer Search Engines]]></category>
		<category><![CDATA[Search Engines: Help Engines]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[Top News]]></category>
		<category><![CDATA[Yahoo: Answers]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=47671</guid>
		<description><![CDATA[That&#8217;s one of the big questions people are asking after yesterday&#8217;s launch of Facebook Questions. While many have assumed the answer would be &#8220;yes,&#8221; a Facebook spokesperson tells us that assumption is wrong. Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/07/facebook-questions.png" alt="facebook-questions" width="545" height="92" class="alignnone size-full wp-image-47672" /></p>
<p>That&#8217;s one of the big questions people are asking after <a href="http://searchengineland.com/facebook-questions-opens-to-limited-public-release-47523">yesterday&#8217;s launch</a> of <a href="http://searchengineland.com/up-close-with-facebook-questions-47567">Facebook Questions</a>. While many have assumed the answer would be &#8220;yes,&#8221; a Facebook spokesperson tells us that assumption is wrong.</p>
<blockquote>Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but have no current plans to allow it.</blockquote>
<p>Facebook is blocking search engines by only showing Questions to logged-in users. Sure enough, a <a href="http://www.google.com/search?q=site%3Afacebook.com%2Fquestions%2F">site:facebook.com/questions/</a> search on Google shows only a handful of results, none of which are actually Q&#038;A from the first 24 hours that the beta has been open.</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/07/goog-fb.png" alt="goog-fb" width="550" height="494" class="alignnone size-full wp-image-47673" /></p>
<p>The same search produces zero results on both <a href="http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Ffacebook.com%2Fquestions%2F">Yahoo</a> and <a href="http://www.bing.com/search?q=site%3Afacebook.com%2Fquestions%2F">Bing</a>.</p>
<p>Facebook&#8217;s move is unusual. As far back as 2007, the company was starting to <a href="http://searchengineland.com/facebook-opens-profiles-to-tap-into-google-traffic-while-google-grabs-facebooks-news-feed-idea-12096">open up some content to search engines</a> &#8212; a trend that&#8217;s continued more recently with <a href="http://searchengineland.com/liveblogging-googles-web-search-evolution-event-31317">various agreements</a> to let search engines access certain user content. For years, Yahoo Answers has been a pretty formidable <a href="http://www.smallbusinesssem.com/part-two-why-use-yahoo-answers/1063/">rankings powerhouse</a>, and no doubt gets a substantial amount of traffic via search engines. </p>
<p>It&#8217;s odd that Facebook would want to ignore that traffic source entirely. Don&#8217;t be surprised if this policy changes.</p>
<p><strong>Postscript, July 30:</strong> Experian Hitwise has responded to my claim near the end of this article about how much traffic Yahoo Answers gets via search engines. In a <a href="http://twitter.com/Hitwise_US/status/19919086878">tweet this morning</a>, Hitwise reports that &#8220;62% of upstream visits to Yahoo Answers came via Google last week.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>OpenOffice.org Is MIA In Bing, But It&#8217;s Not Censorship</title>
		<link>http://searchengineland.com/openofficeorg-mia-in-bing-but-not-censorship-39004</link>
		<comments>http://searchengineland.com/openofficeorg-mia-in-bing-but-not-censorship-39004#comments</comments>
		<pubDate>Tue, 30 Mar 2010 00:08:21 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Microsoft: Bing]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=39004</guid>
		<description><![CDATA[The home page of OpenOffice.org, the well-known Microsoft Office competitor, is missing from Microsoft&#8217;s Bing search engine. While it sounds suspicious, the problem has nothing to do with Bing itself &#8212; it&#8217;s a technical problem on OpenOffice.org&#8217;s end. Ian McAnerin noticed earlier today that OpenOffice.org doesn&#8217;t show up in Bing on searches for [open office] [...]]]></description>
			<content:encoded><![CDATA[<p>The home page of <a href="http://www.openoffice.org/">OpenOffice.org</a>, the well-known Microsoft Office competitor, is missing from Microsoft&#8217;s Bing search engine. While it sounds suspicious, the problem has nothing to do with Bing itself &#8212; it&#8217;s a technical problem on OpenOffice.org&#8217;s end.</p>
<p>Ian McAnerin <a href="http://mcanerin.blogspot.com/2010/03/open-office-vs-bing.html">noticed earlier today</a> that OpenOffice.org doesn&#8217;t show up in Bing on searches for [<a href="http://www.bing.com/search?q=open+office&#038;go=&#038;form=QBLH&#038;filt=all&#038;qs=n&#038;sc=8-11">open office</a>] and [<a href="http://www.bing.com/search?q=openoffice.org&#038;go=&#038;form=QBRE&#038;filt=all">openoffice.org</a>]. He wonders if Bing is &#8220;allowing its results to be unduly influenced by either money or corporate policy.&#8221; But further digging, with some help from SEL&#8217;s Vanessa Fox, shows that&#8217;s not the case.</p>
<p>To be clear: Pages from the openoffice.org domain <em>do</em> show up in Bing &#8212; a [<a href="http://www.bing.com/search?q=site%3Aopenoffice.org">site:openoffice.org</a>] search proves that. But the home page itself is nowhere to be found.</p>
<p>It seems as though the problem is simply due to a technical misconfiguration on the openoffice.org servers. This issue is impacting Yahoo&#8217;s index as well as Bing&#8217;s. When you navigate to openoffice.org as a user, you see the home page as you should. If you change the user agent to Googlebot (Vanessa used the <a href="https://addons.mozilla.org/en-US/firefox/addon/59">User Agent Switcher Firefox plugin</a>), you see the same nicely rendered home page.</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/03/openoffice-google.jpg" alt="openoffice-google" width="550" height="388" /></p>
<p>However, if you change the user agent to either MSNbot or Yahoo Slurp, you see a 403 access denied error.</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/03/openoffice-bing.jpg" alt="openoffice-bing" width="550" height="228" /></p>
<p>You can see this more clearly in the HTTP response from the server (using a tool such as the <a href="https://addons.mozilla.org/en-US/firefox/addon/3829">Live HTTP Headers Firefox plugin</a>). Accessing the page as Googlebot returns the following (shortened for space; note the 304 rather than 200 response simply because Vanessa had visited the page before as Googlebot):</p>
<blockquote>
<pre>Host: www.openoffice.org</pre>
<pre>GET / HTTP/1.1</pre>
<pre>Host: openoffice.org</pre>
<pre>User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</pre>
<pre>If-Modified-Since: Mon, 29 Mar 2010 12:58:51 GMT</pre>
<pre>HTTP/1.0 304 Not Modified</pre>
</blockquote>
<p>Whereas accessing the page as MSNbot looks like this:</p>
<blockquote>
<pre>http://www.openoffice.org/</pre>
<pre>GET / HTTP/1.1</pre>
<pre>Host: openoffice.org</pre>
<pre>User-Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)</pre>
<pre>HTTP/1.0 403 Forbidden</pre>
</blockquote>
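<p>The same comparison can be scripted instead of done by hand in the browser. What follows is a minimal sketch using only Python's standard library, not the method Vanessa used. The helper names <code>fetch_status</code> and <code>compare_user_agents</code> are hypothetical, and only the two user-agent strings quoted above are included, since the Yahoo Slurp string is not given here. Note that spoofing a user agent only reveals blocks keyed on the User-Agent header; it would not detect a block based on a crawler's IP address.</p>

```python
import urllib.error
import urllib.request

# User-agent strings exactly as quoted in the HTTP headers above.
USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "MSNbot": "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
}


def fetch_status(url, user_agent, timeout=10):
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status code is what we want.
        return error.code


def compare_user_agents(url, user_agents=USER_AGENTS):
    """Return a {crawler name: status code} mapping for one URL."""
    return {name: fetch_status(url, agent) for name, agent in user_agents.items()}
```

<p>Calling <code>compare_user_agents("http://www.openoffice.org/")</code> returns one status code per crawler name. Differing codes for the same URL, such as a 403 for one crawler only, point to a server-side rule keyed on the user agent.</p>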
<p>How did the server get set up this way? Any number of explanations is possible. Sometimes a host notices overactive crawling from particular bots and blocks them. Site owners on shared hosting should always watch out for this, because a blocked crawler means the site gets dropped from that search engine&#8217;s index. In this case, OpenOffice.org likely manages its own servers, but it may not be blocking Microsoft and Yahoo on purpose. A piece of server software could easily have been misconfigured by accident.</p>
<p>A Microsoft spokesperson tells us: &#8220;We&#8217;re reaching out to them now to try and resolve the issue.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/openofficeorg-mia-in-bing-but-not-censorship-39004/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

