<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine Land &#187; SEO: Blocking Spiders</title>
	<atom:link href="http://searchengineland.com/library/seo/seo-blocking-spiders/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: News On Search Engines, Search Engine Optimization (SEO) &#38; Search Engine Marketing (SEM)</description>
	<lastBuildDate>Fri, 25 May 2012 23:34:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>The Latest &amp; Greatest On SEO Pagination</title>
		<link>http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284</link>
		<comments>http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284#comments</comments>
		<pubDate>Mon, 19 Mar 2012 13:30:02 +0000</pubDate>
		<dc:creator>Adam Audette</dc:creator>
				<category><![CDATA[Enterprise SEO]]></category>
		<category><![CDATA[SEO - Search Engine Optimization]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[SEO: Redirects & Moving Sites]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=114284</guid>
		<description><![CDATA[Technical SEO topics such as pagination are near and dear to my heart. This article will build upon and update my previous treatment of pagination and SEO. I&#8217;ve written and presented often on pagination for SEO. Why so much attention on this subject? The reason is simple: it can be a big, hairy deal for [...]]]></description>
			<content:encoded><![CDATA[<p>Technical SEO topics such as pagination are near and dear to my heart. This article will build upon and update my <a href="http://searchengineland.com/five-step-strategy-for-solving-seo-pagination-problems-95494">previous treatment of pagination and SEO</a>.</p>
<p>I&#8217;ve written and presented often on pagination for SEO. Why so much attention on this subject?</p>
<p>The reason is simple: it can be a big, hairy deal for sites. It&#8217;s right up there with faceted navigation as one of the most problematic crawling and indexing issues for large-scale SEO. It&#8217;s a tactic (actually a set of tactics) that our teams are continually evolving, testing, and refining.</p>
<p>So it was &#8220;<a href="http://www.youtube.com/watch?v=m4BKvNlnPQM">double prizes</a>&#8221; when Google announced support for the HTML link attributes <a href="http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html">rel next/prev for pagination</a>.</p>
<h2>3 Overall Tactics For SEO Pagination</h2>
<p>There are three primary tactics that we use for SEO pagination:</p>
<ul>
<li>Classic Method (using noindex)</li>
<li>View All Method</li>
<li>Rel Prev/Next Method</li>
</ul>
<p>Each of these is detailed below.</p>
<h2>Classic Pagination for SEO: Using noindex</h2>
<p>I&#8217;ve already <a href="http://searchengineland.com/five-step-strategy-for-solving-seo-pagination-problems-95494">detailed this technique in full</a>, so I&#8217;ll skip the nitty gritty. The important thing to realize is that using this method does not directly transfer any equity from a series of component pages to the primary, canonical page. Rather, as component pages get crawled and link back to the canonical page, that equity is (hopefully) transferred as a second-order effect.</p>
<p>We would generally not recommend using this method for pagination today, except for fringe cases. It&#8217;s perfectly fine and will not hurt a site; on the contrary, it will greatly help a site that has SEO pagination problems. But, there are now even better methods as we&#8217;ll discover.</p>
<div id="attachment_114294" class="wp-caption aligncenter" style="width: 310px"><a href="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_10.54.44-CapturFiles.png"><img class="size-medium wp-image-114294  " src="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_10.54.44-CapturFiles-300x224.png" alt="Classic SEO pagination using noindex" width="300" height="224" /></a><p class="wp-caption-text">The classic SEO pagination method uses noindex but does not directly consolidate equity.</p></div>
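<p>As a rough illustration, and assuming the classic method is implemented with a robots meta tag on every component page after the first, the head of a component page might carry something like the following. The noindex, follow value is an assumption for this sketch, chosen so that links on the page can still be crawled.</p>
<pre><code>&lt;!-- Hypothetical component page (page 2 and beyond) of a paginated series --&gt;
&lt;!-- noindex keeps the page out of the index; follow leaves its links crawlable --&gt;
&lt;meta name="robots" content="noindex, follow" /&gt;
</code></pre>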
<h2>View All Method</h2>
<p>The most elegant method is to utilize a View All page. In this approach, all component pages rel canonical back to the View All.</p>
<p>There are a few requirements for this approach:</p>
<ul>
<li>The View All must load quickly; ideally within 3 seconds end-to-end. <a href="http://maileohye.com/">Maile Ohye</a> pointed out at SMX West that even when load times are excessive, a page that loads progressively will not hurt the user experience as much (since content will be viewable on the page immediately).</li>
</ul>
<p>At SMX West, a few folks complained when I mentioned 3.5 seconds as the maximum load time tolerable for View All pages. The truth is, this is a &#8220;real world&#8221; goal and while not ideal, reflects the actual load times that we see on large sites.</p>
<p>Just take a look at these &#8216;last mile&#8217; <a href="http://www.gomez.com/us-retail-last-mile">load times on US retail sites</a> to get an idea of what latency looks like out there. It&#8217;s not particularly pretty, but more than anything demonstrates the opportunity these sites have.</p>
<ul>
<li>
<div id="attachment_114296" class="wp-caption alignright" style="width: 310px"><img class="size-medium wp-image-114296  " style="margin: 10px;" src="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_10.59.25-CapturFiles-300x145.png" alt="Site latency reports from Google Webmaster Tools" width="300" height="145" /><p class="wp-caption-text">Site latency reports from Google Webmaster Tools</p></div>
<p>Our analysis of 20 top ecommerce clients showed an average load time of just over 4 seconds. The fastest site was averaging 2 second load times, an exceptional result in this set. But it was more common to see load times above 3 seconds and well into the 4 second range. While the average load time was 4.2 seconds, the slowest site loaded in over 9 seconds!</p></li>
</ul>
<p>Another requirement for the View All method is to ensure all products, or items, that are included on the component pages are featured on the View All itself.</p>
<p>This ensures that there won&#8217;t be anything left out of the crawl, as pages annotated with rel canonical tags will not necessarily have links within their HTML crawled. It will also ensure there is a relevant match between what is being folded together in the paginated series.</p>
<div id="attachment_114297" class="wp-caption aligncenter" style="width: 310px"><a href="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_10.55.22-CapturFiles.png"><img class="size-medium wp-image-114297 " src="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_10.55.22-CapturFiles-300x223.png" alt="The View All method passes equity well" width="300" height="223" /></a><p class="wp-caption-text">View All method elegantly passes link equity to the canonical</p></div>
<p>The benefits of this approach are two-fold:</p>
<ol>
<li>Users tend to love View All pages. In our experience and testing, pages with a lot of products or items all featured at once convert at a much higher rate than landing pages with a smaller selection of products. But the pages need to be fast.</li>
<li>All component pages in the series transfer their equity to the View All in a fairly direct fashion.</li>
</ol>
<p>One more thing to be aware of: all things being equal, Google will attempt to use your View All page by default when there are no other proactive signals in place. Take steps to control the SEO experience proactively yourself.</p>
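<p>To make the View All method concrete, here is a minimal sketch of the tag each component page would carry. The example.com URLs and the view-all path are hypothetical; substitute whatever your own View All URL is.</p>
<pre><code>&lt;!-- In the head of every component page, e.g. http://www.example.com/widgets?page=2 --&gt;
&lt;!-- Points the component page at the (hypothetical) View All URL --&gt;
&lt;link rel="canonical" href="http://www.example.com/widgets/view-all" /&gt;
</code></pre>
<p>The same tag, with the same href, would appear on every page in the series.</p>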
<h2>The Rel Next/Prev Method</h2>
<p>The most current technique for SEO pagination makes use of the <a href="http://www.w3.org/TR/html4/struct/links.html">HTML 4/5 link element</a> rel=&#8221;next&#8221; and rel=&#8221;prev&#8221;. The specifics of this implementation are well detailed in <a href="http://support.google.com/webmasters/bin/answer.py?hl=en&amp;answer=1663744">this Google support page</a>, so let&#8217;s focus on the benefits and results.</p>
<p>It&#8217;s been our experience (especially with e-commerce clients) that it can be difficult to get a View All implemented as the canonical and default page. Merchandising teams don&#8217;t always like them; they don&#8217;t make holiday or seasonal specials as easy to manage; advanced landing pages can be better looking and UX and content teams often prefer them; they can make spotlighting certain products more difficult; and many other reasons.</p>
<p>Because of these challenges, rel next/prev is often an excellent method for handling pagination.</p>
<p>The benefits of this approach are as follows:</p>
<ol>
<li>All component pages share their equity with the series. What does this mean? Basically, when page 9 of a series gets a link with rich anchor text, that equity is shared across the series with all the other pages. That&#8217;s a good thing.</li>
<li>However, using rel next/prev doesn&#8217;t prevent a component page from displaying in search results. So while these pages will &#8220;roll up&#8221; to the canonical (or default) page 1, they could still fire at search time if the query was relevant for that specific page. At SMX West, Maile assured us that it would be a very rare thing for that situation to occur. But it could occur.</li>
<li>Because of this, an additional recommendation (strictly as an optional step) is to add a robots noindex, follow to the rel prev/next component pages. This would ensure that component pages would never fire at search time.</li>
<li>Finally, all rel next/prev pages should also have a self-referencing rel canonical tag. In cases where tracking IDs are appended to a URL, these rel canonical tags will ensure that no duplication or equity leak occurs.</li>
</ol>
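<p>Pulling the points above together, the head of a page in the middle of a series might look roughly like this. This is a sketch under assumptions: the example.com URLs and the page parameter are hypothetical, and the robots meta tag is the optional step described above.</p>
<pre><code>&lt;!-- Hypothetical page 2 of a series at http://www.example.com/widgets --&gt;
&lt;link rel="prev" href="http://www.example.com/widgets?page=1" /&gt;
&lt;link rel="next" href="http://www.example.com/widgets?page=3" /&gt;

&lt;!-- Self-referencing canonical, so URLs with tracking IDs fold back to this page --&gt;
&lt;link rel="canonical" href="http://www.example.com/widgets?page=2" /&gt;

&lt;!-- Optional: keep this component page from appearing in search results --&gt;
&lt;meta name="robots" content="noindex, follow" /&gt;
</code></pre>
<p>The first page in a series carries only rel=&#8221;next&#8221; and the last page carries only rel=&#8221;prev&#8221;.</p>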
<div id="attachment_114300" class="wp-caption aligncenter" style="width: 310px"><a href="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_11.14.51-CapturFiles.png"><img class="size-medium wp-image-114300 " src="http://searchengineland.com/figz/wp-content/seloads/2012/03/Mar-09-2012_11.14.51-CapturFiles-300x224.png" alt="Requirements for rel next/prev pagination" width="300" height="224" /></a><p class="wp-caption-text">Ensure implementation of rel next/prev is thorough</p></div>
<h2>Conclusion &amp; Pagination Recommendations</h2>
<p>SEO pagination needs to be recommended situationally (like so much of SEO). Here are my recommendations:</p>
<ol>
<li>If you have a fast loading View All page, and that page contains all the products and/or items included across the component pages, use this method. All component pages rel canonical to the View All, and it becomes your default ranking page in SERPs. It&#8217;s elegant, simple, and efficient. It will also best pass equity from each page to a single, canonical URL.</li>
<li>If you don&#8217;t have a quality View All, or your company doesn&#8217;t want to use that as the canonical URL, implement the rel next/prev methodology instead. This method will consolidate signals across the series, rather than concentrate them on a single URL; however, the end result should be the same, if implemented well: the canonical, ranking URL (normally page 1) will be given the equity. There is a substantial benefit in using this method over the classic noindex approach: equity is actually transferred to the series itself. <BR><BR>Remember, the classic method does not directly pass any equity &#8211; there are no signals to do so &#8211; rather it achieves the same ends by opening up the crawl of component pages and keeping them out of the index and from competing with the ranking URL. Be aware that with rel next/prev, component pages can still fire at search time (although unlikely). You can optionally use a noindex, follow as well to avoid this. Ensure all pages have self-referencing rel canonical tags.</li>
<li>There are edge cases where the classic noindex method of SEO pagination is still viable. These include situations where it&#8217;s important to address Bing consistently along with Google (Bing does not yet support rel next/prev), or where HTML 4/5 elements are not yet ready to be deployed at an organization. In cases like these, the classic noindex method is still a good option.</li>
</ol>
<p>No doubt this will change again, but here&#8217;s the latest for your SEO campaigns. Best of luck, and please share your experiences and insights in the comments.</p>
<p><strong>Updates</strong>: Google&#8217;s Maile Ohye has recently published a <a href="http://www.youtube.com/watch?v=njn8uXTWiGg">video on pagination and SEO</a>. Be sure to check it out. Vanessa Fox also covers the details in her thorough treatment of the topic, <a href="http://searchengineland.com/implementing-pagination-attributes-correctly-for-google-114970">Implementing Pagination Attributes Correctly for Google</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Google Slows Web Crawlers To Help Blackout Sites</title>
		<link>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477</link>
		<comments>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477#comments</comments>
		<pubDate>Wed, 18 Jan 2012 14:10:04 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=108477</guid>
		<description><![CDATA[As you know, there are many sites going black to protest SOPA and PIPA. Google has already offered blackout SEO advice but they decided to take it one step further by slowing down their spiders today. Pierre Far from Google posted on his Google+ that Google is slowing down GoogleBot&#8217;s crawl activity to reduce the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-95628" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Google-Webmaster-SEO-Rep-1304428070.gif" alt="Google-Webmaster-SEO-Rep-1304428070" width="167" height="141" />As you know, there are many sites <a href="http://marketingland.com/why-the-web-is-going-dark-over-sopa-pipa-3608">going black to protest SOPA and PIPA</a>. Google has already offered <a href="http://searchengineland.com/blackout-your-site-without-hurting-seo-108302">blackout SEO advice</a> but they decided to take it one step further by slowing down their spiders today.</p>
<p>Pierre Far from Google posted on his <a href="https://plus.google.com/u/0/115984868678744352358/posts/iUN5MGJxEh9">Google+</a> that Google is slowing down Googlebot&#8217;s crawl activity to reduce the effect on the search rankings of sites that black out without following the Google SEO advice from yesterday.</p>
<p>Pierre Far said:</p>
<blockquote>Hello webmasters! We realize many webmasters are concerned about the medium-term effects of today&#8217;s blackout. As a precaution, the crawl team at Google has configured Googlebot to crawl at a much lower rate for today only so that the Google results of websites participating in the blackout are less likely to be affected.</blockquote>
<h2>Related Articles</h2>
<ul>
<li><a href="http://marketingland.com/what-all-marketers-need-to-know-about-sopa-1677">What All Marketers Need To Know About SOPA – The Stop Online Piracy Act</a></li>
<li><a title="#BlackoutSOPA: A Look At The Social Media Movement That Helped Stall The SOPA Legislation" href="http://marketingland.com/blackoutsopa-a-look-at-the-social-media-movement-that-helped-stall-the-sopa-legislation-3453" rel="bookmark">#BlackoutSOPA: A Look At The Social Media Movement That Helped Stall The SOPA Legislation</a></li>
<li><a href="http://searchengineland.com/blackout-your-site-without-hurting-seo-108302">How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO</a></li>
<li><a href="http://searchengineland.com/google-blackens-logo-to-protest-sopa-pipa-108436">Google Blackens Its Logo To Protest SOPA/PIPA, While Bing &amp; Yahoo Carry On As Usual</a></li>
<li><a href="http://marketingland.com/why-the-web-is-going-dark-over-sopa-pipa-3608">Why The Web Is Going Dark Over SOPA &amp; PIPA</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-slows-web-crawlers-to-help-blackouts-sites-108477/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO</title>
		<link>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302</link>
		<comments>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302#comments</comments>
		<pubDate>Mon, 16 Jan 2012 19:39:18 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=108302</guid>
		<description><![CDATA[A number of websites are (or were) planning to &#8220;go black&#8221; this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn wrote about this morning [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-95628" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Google-Webmaster-SEO-Rep-1304428070.gif" alt="Google-Webmaster-SEO-Rep-1304428070" width="167" height="141" />A number of websites are (or were) planning to &#8220;go black&#8221; this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn <a href="http://marketingland.com/blackoutsopa-a-look-at-the-social-media-movement-that-helped-stall-the-sopa-legislation-3453">wrote about this morning</a> on Marketing Land.</p>
<p>You may be thinking about joining the website blackout movement, but yikes &#8230; what about the SEO implications? How do you take your site offline in protest without messing up your visibility in Google&#8217;s search results?</p>
<p>Well, Google&#8217;s Pierre Far <a href="https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB">shared several tips</a> earlier today on Google+ in a post called &#8220;Website outages and blackouts the right way.&#8221;</p>
<p>In short, the advice is to use a 503 HTTP status code to tell spiders that the website is temporarily unavailable. With a 503 status, Google won&#8217;t index the content (or lack thereof if you&#8217;re blacking out your site) and it won&#8217;t consider the site as having duplicate content issues (when all of the pages are blacked out).</p>
<p>But Far adds a couple of important caveats to this advice regarding the robots.txt file and what will happen in Webmaster Tools if Google finds your site blacked out. Another Googler, John Mueller, provides additional information in the comments, so you&#8217;ll want to <a href="https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB">read the original Google+ post</a> if you&#8217;re thinking about blacking out your website this week for SOPA, or in the future for any other reason.</p>
<p>Of course, also keep in mind that Bing may not handle things the same way if you do black out your site.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/blackout-your-site-without-hurting-seo-108302/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google Can Now Execute AJAX &amp; JavaScript For Indexing</title>
		<link>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518</link>
		<comments>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518#comments</comments>
		<pubDate>Tue, 01 Nov 2011 17:57:26 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Flash]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=99518</guid>
		<description><![CDATA[This morning we reported that the comments on Facebook are being indexed by Google. Google&#8217;s Matt Cutts just confirmed on Twitter that Google is now able to &#8220;execute AJAX/JS to index some dynamic comments.&#8221; This gives Google&#8217;s spider, GoogleBot, the ability to read comments in AJAX or JavaScript, such as Facebook comments or Disqus comments [...]]]></description>
			<content:encoded><![CDATA[<p>This morning we reported that the <a href="http://searchengineland.com/many-facebook-comments-now-being-indexed-by-google-99399">comments on Facebook are being indexed</a> by Google. Google&#8217;s Matt Cutts just <a href="https://twitter.com/#!/mattcutts/status/131425949597179904">confirmed on Twitter</a> that Google is now able to &#8220;execute AJAX/JS to index some dynamic comments.&#8221;</p>
<p>This gives Google&#8217;s spider, Googlebot, the ability to read comments that are dynamically loaded via AJAX or JavaScript, such as Facebook comments or Disqus comments. It also means Google is better at seeing the content behind more of your JavaScript or AJAX.</p>
<p><img class="alignnone size-full wp-image-99520" title="cutts-google-index-ajax" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/cutts-google-index-ajax.png" alt="" width="545" height="228" /></p>
<p><strong>Postscript:</strong> Google now has an official blog <a href="http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html">post</a> up with more details.</p>
<h3>Related Stories:</h3>
<ul>
<li><a href="http://searchengineland.com/google-proposes-to-make-ajax-crawlable-27408">Google Offers A Proposal To Make AJAX Crawlable</a></li>
<li><a href="http://searchengineland.com/google-offers-seo-advice-on-ajax-coding-12637">Google Offers SEO Advice On AJAX Coding</a></li>
<li><a href="http://searchengineland.com/googles-proposal-for-crawling-ajax-may-be-live-34411">Google May Be Crawling AJAX Now – How To Best Take Advantage Of It</a></li>
<li><a href="http://searchengineland.com/its-official-googles-proposal-for-crawling-ajax-urls-is-live-37298">It’s Official: Google’s Proposal For Crawling AJAX URLs is Live</a></li>
<li><a href="http://searchengineland.com/an-update-on-javascript-menus-and-seo-16060">An Update On Javascript Menus And SEO</a></li>
<li><a href="http://searchengineland.com/google-now-crawling-and-indexing-flash-content-14299">Google Now Crawling And Indexing Flash Content</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-can-now-execute-ajax-javascript-for-indexing-99518/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Disables URL Removals After Bug Allows Anyone To Remove Any Site</title>
		<link>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352</link>
		<comments>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352#comments</comments>
		<pubDate>Tue, 19 Jul 2011 19:10:28 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Redirects & Moving Sites]]></category>
		<category><![CDATA[SEO: Spamming]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=86352</guid>
		<description><![CDATA[This morning, James Breckenridge discovered a loophole within Google&#8217;s Webmaster Tools that allowed anyone to remove any site from Google. Both James and I sent this information to Google as soon as we heard of it. After several hours, Google has told us, &#8220;we&#8217;re still investigating this report, and to be cautious we disabled all [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/07/Google-Webmaster.gif" alt="" title="Google Webmaster" width="167" height="141" class="alignright size-full wp-image-86354" />This morning, James Breckenridge <a href="http://www.jamesbreckenridge.co.uk/remove-any-site-from-google-even-if-you-dont-control-it.html">discovered</a> a loophole within Google&#8217;s Webmaster Tools that allowed anyone to remove any site from Google.</p>
<p>Both James and I sent this information to Google as soon as we heard of it. After several hours, Google has told us, &#8220;we&#8217;re still investigating this report, and to be cautious we disabled all URL removals earlier this morning.&#8221; So now, even if you own a site, you won&#8217;t be able to remove the site or pages from the site using Google&#8217;s <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=164734">URL removal tool</a>.</p>
<p>How did this loophole work?  Pretty simple as James described. You use the following URL when logged into Google Webmaster Tools:</p>
<blockquote>https://www.google.com/webmasters/tools/removals-request?hl=en&#038;siteUrl=http://{YOUR_URL}/&#038;urlt={URL_TO_BLOCK}</blockquote>
<p>Then replace {YOUR_URL} with a URL you control within Webmaster Tools, and replace {URL_TO_BLOCK} with the URL of the site you want to block.  </p>
<p>You could block a whole site, a section, or a single page this way, depending on the URL you entered. To block a site, use the site&#8217;s root URL (e.g. http://www.someurl.com/); to block a section, use a subfolder URL (e.g. http://www.someurl.com/somefolder/); and to block a page, use the specific page URL (e.g. http://www.someurl.com/somefolder/somepage.html).</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/07/Screen-shot-2011-07-19-at-3.02.29-PM-600x205.png" alt="" title="Screen shot 2011-07-19 at 3.02.29 PM" width="600" height="205" class="alignnone size-large wp-image-86355" /></p>
<p>I am awaiting an update from Google on why this happened, whether sites were impacted, and how long this was an issue.</p>
<p><strong>Postscript:</strong> Google sent us a statement that they have fixed the issue. A Google spokesperson said:</p>
<blockquote><p>We&#8217;ve confirmed that there was an issue within the URL removal feature in our Webmaster Tools and have already pushed out a fix and re-enabled URL removals.</p>
<p>The URL removal feature keeps detailed records, so we&#8217;re currently reprocessing earlier removal requests to ensure their validity. Our initial examination has shown only a limited impact.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-disables-url-removals-after-bug-allows-anyone-to-remove-any-site-86352/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Google Webmaster Tools Remove URL With Blocking Not Required</title>
		<link>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515</link>
		<comments>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515#comments</comments>
		<pubDate>Tue, 17 May 2011 16:37:47 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=77515</guid>
		<description><![CDATA[Google announced on the Webmaster blog that they have removed a requirement for removing URLs via Google Webmaster Tools. Google no longer requires you to block access to the URL you want to remove from Google prior to submitted the URL removal request. Google said they have eliminated &#8220;the requirement that the webpage&#8217;s URL must [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2011/05/url-removal-step-1-300x187.png" alt="" title="url-removal-step-1" width="300" height="187" class="alignright size-medium wp-image-77517" />Google <A href="http://googlewebmastercentral.blogspot.com/2011/05/easier-url-removals-for-site-owners.html">announced</a> on the Webmaster blog that they have removed a requirement for <a href="http://searchengineland.com/removing-pages-from-google-53086">removing URLs</a> via <a href="https://www.google.com/webmasters/tools/">Google Webmaster Tools</a>.</p>
<p>Google no longer requires you to block access to the URL you want to remove from Google prior to submitting the URL removal request. Google said they have eliminated &#8220;the requirement that the webpage&#8217;s URL must first be blocked by a site owner before the page can be removed from Google&#8217;s search results.&#8221; Why did Google drop this requirement? Because you have already verified that you own the site, Google felt it was redundant to require you to block the URL to prove ownership again. Google explained:</p>
<blockquote>You&#8217;ve already verified ownership of the site, we can eliminate this requirement to make it easier for you, as the site owner, to remove unwanted pages (e.g. pages accidentally made public) from Google&#8217;s search results.</blockquote>
<p>Please note, these URL removals are only temporary and last 90 days. If you want the URLs and pages to never show in Google, you need to permanently remove them by 404ing the page or blocking them via a robots.txt file or a noindex meta tag.</p>
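<p>For illustration, the noindex meta tag option is a single line in the head of the page you want kept out of the index. This is a minimal sketch; the page it sits on is hypothetical.</p>
<pre><code>&lt;!-- In the head of the page that should never show in search results --&gt;
&lt;meta name="robots" content="noindex" /&gt;
</code></pre>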
<p><strong>Related Stories:</strong></p>
<ul>
<li><a href="http://searchengineland.com/removing-pages-from-google-53086">Removing Pages From Google: A Comprehensive Guide For Content Owners</a></li>
<li><a href="http://searchengineland.com/google-releases-improved-content-removal-tools-10989">Google Releases Improved Content Removal Tools</a></li>
<li><a href="http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925">Google Lets You Tell Them Which URL Parameters To Ignore</a></li>
<li><a href="http://searchengineland.com/up-close-personal-with-robotstxt-10978">Up Close &amp; Personal With Robots.txt</a></li>
<li><a href="http://searchengineland.com/removing-your-personal-information-from-google-55014">Removing Your Personal Information From Google</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-webmaster-tools-remove-url-with-blocking-not-required-77515/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results</title>
		<link>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529</link>
		<comments>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529#comments</comments>
		<pubDate>Wed, 26 Jan 2011 22:37:24 +0000</pubDate>
		<dc:creator>Vanessa Fox</dc:creator>
				<category><![CDATA[Features: Analysis]]></category>
		<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=62529</guid>
		<description><![CDATA[Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google&#8217;s Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers [...]]]></description>
			<content:encoded><![CDATA[<p>Last year, <a href="http://googlewebmastercentral.blogspot.com/2010/03/googles-seo-report-card.html">Google published an SEO Report Card</a> of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google&#8217;s Matt Cutts <a href="http://www.youtube.com/watch?v=x1iuqHdNfGo">presented the results</a> at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don&#8217;t work in search don&#8217;t get preferential treatment from those who do, and just like any site on the internet, sometimes things aren&#8217;t implemented correctly. Just because a site is owned by Google doesn&#8217;t mean it&#8217;s the best example of what to do in terms of SEO.</p>
<p>This morning Rishi Lakhani <a href="http://twitter.com/rishil/status/30259383208251392">tweeted about Google Translate pages</a> appearing in Google search results. As you can see in the example below, pages with individual translation requests have been indexed.</p>
<p style="text-align: center;"><a rel="attachment wp-att-62544" href="http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/translate1"><img class="aligncenter size-large wp-image-62544" style="border: 1px solid black;" title="Google Translate Search Results" src="http://searchengineland.com/figz/wp-content/seloads/2011/01/translate1-500x265.png" alt="Google Translate Search Results" width="500" height="265" /></a></p>
<p>All of the URLs that include a parameter seem to be individual translations. For instance, http://translate.google.com/?q=ART# displays as follows:</p>
<p style="text-align: center;"><a rel="attachment wp-att-62546" href="http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/translate2"><img class="aligncenter size-large wp-image-62546" style="border: 1px solid black;" title="Google Translate Example" src="http://searchengineland.com/figz/wp-content/seloads/2011/01/translate2-500x200.png" alt="Google Translate Example" width="500" height="200" /></a></p>
<p>The problem with these types of pages being indexed in search engines is twofold:</p>
<ul>
<li>The <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769">Google webmaster guidelines</a> say that Google <a href="http://www.mattcutts.com/blog/search-results-in-search-results/">doesn&#8217;t want to show search results in its search results</a> and recommend that content owners block search results on their site from being indexed using robots.txt or a meta robots tag.</li>
<li>Those same guidelines recommend blocking autogenerated pages from being indexed, and a Google Webmaster Central blog post a few months ago <a href="http://googlewebmastercentral.blogspot.com/2010/09/unifying-content-under-multilingual.html">provided recommendations for handling machine-translated text</a> so that it doesn&#8217;t appear in search results.</li>
</ul>
<p>A site owner might also want to block these types of pages from being crawled and indexed to increase crawl efficiency and ensure the most valuable pages on the site are being crawled and indexed instead.</p>
<p>I asked Google about this and they confirmed that indeed it was simply a matter of the Google Translate team not being aware of the issue and said they would resolve it.</p>
<h2>Blocking Autogenerated Search Pages From Being Indexed</h2>
<p>In the case of Google Translate, the ideal scenario is that the <a href="http://translate.google.com/#">main page</a> and any secondary pages (such as <a href="http://translate.google.com/translate_tools">this tools page</a>) be indexed, but that any pages from translation requests not be indexed.</p>
<h3>Using robots.txt</h3>
<p>The best way to do this would be to add a <a href="http://code.google.com/web/controlcrawlindex/docs/robots_txt.html">disallow line in the robots.txt file</a> for the site that blocks indexing based on a pattern match of the URL query parameter. For instance:</p>
<pre>Disallow: /*q=</pre>
<p>This pattern would prevent search engines from indexing any URLs containing q=. (The * before the q= means that the q= can appear anywhere in the URL.)</p>
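<p>As a rough sketch, a complete rule group built around that line might look like the following. The /help?faq=1 URL in the comment is hypothetical, and the * wildcard is an extension honored by Google and other major engines rather than part of the original robots.txt convention. Because /*q= matches any URL containing q=, including a parameter whose name merely ends in q, a tighter variant anchors the match to the ? or &amp; that precedes the parameter:</p>
<pre># Broad form: blocks any URL whose path or query string contains "q="
# (this would also catch a hypothetical /help?faq=1)
User-agent: *
Disallow: /*q=

# Tighter alternative, to use in place of the rule above:
# matches only a parameter named exactly "q"
# User-agent: *
# Disallow: /*?q=
# Disallow: /*&amp;q=</pre>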
<p>In the case of translate.google.com (and all related TLDs), the robots.txt file that exists for the subdomains seems to be copied from www.google.com. Remember that search engines obey the robots.txt file for each subdomain separately. Using the same robots.txt file for a subdomain that&#8217;s used for the www variation of the domain could have unintended consequences because the subdomain likely has an entirely different folder and URL structure. (You can always check the behavior of your robots.txt file using Google Webmaster Tools.)</p>
<p>Adding the disallow pattern shown above to the www.google.com/robots.txt file would not work, as search engines wouldn&#8217;t check that file when crawling the translate subdomain; it would instead cause search engines not to index URLs that match the pattern on www.google.com.</p>
<p>translate.google.com (and every other google.com subdomain) should have its own robots.txt file that&#8217;s customized for that subdomain.</p>
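<p>To illustrate, here is a minimal sketch of what a dedicated file for the translate subdomain could contain. It assumes the file is served at http://translate.google.com/robots.txt and that translation requests are the only URLs on that host containing q=. It is not Google&#8217;s actual file:</p>
<pre># Served at http://translate.google.com/robots.txt
# Applies only to translate.google.com; www.google.com keeps its own file
User-agent: *
# Blocks translation requests such as /?q=ART
# Leaves the main page (/) and /translate_tools crawlable
Disallow: /*q=</pre>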
<h3>Using the meta robots tag</h3>
<p>If Google isn&#8217;t able to create a separate robots.txt file for the translate subdomain, they should first remove the file that&#8217;s there (and from other subdomains as well, as it could be causing unexpected indexing results for those subdomains). Then, they should use the meta robots tag on the individual pages they want blocked. Since the pages in question are dynamically generated, the way to do this would be to add logic to the code that generates these pages that writes the robots meta tag to the page as it&#8217;s created. This tag belongs in the &lt;head&gt; section of the page and looks as follows:</p>
<pre>&lt;meta name="robots" content="noindex"&gt;</pre>
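<p>As a sketch of where the tag sits, the generated page for a translation request might begin like this (the title text is a placeholder). Keep in mind that a crawler can only see the tag if it is allowed to fetch the page, so a URL that relies on noindex should not also be disallowed in robots.txt:</p>
<pre>&lt;head&gt;
&lt;title&gt;Example translation result&lt;/title&gt;
&lt;meta name="robots" content="noindex"&gt;
&lt;/head&gt;</pre>
<p>Where editing the page markup is impractical, the same directive can be sent as an HTTP response header instead: X-Robots-Tag: noindex.</p>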
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/a-lesson-from-the-indexing-of-google-translate-blocking-search-results-from-search-results-62529/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Releases Details On Controlling GoogleBot &amp; Google&#8217;s Crawl</title>
		<link>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946</link>
		<comments>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946#comments</comments>
		<pubDate>Fri, 26 Nov 2010 13:47:09 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=56946</guid>
		<description><![CDATA[The Google Webmaster Central blog announced Google has published a new set of documents in the Google Code section on how to control how Google crawls and indexes your site. You can read the set of documents over here. The technical documents are broken down into five sections: Getting Started Robots.txt specification Robots meta tag [...]]]></description>
			<content:encoded><![CDATA[<p>The Google Webmaster Central blog <a href="http://googlewebmastercentral.blogspot.com/2010/11/controlling-crawling-and-indexing-now.html">announced</a> Google has published a new set of documents in the Google Code section on how to control how Google crawls and indexes your site. You can read the set of documents <a href="http://code.google.com/web/controlcrawlindex/">over here</a>.</p>
<p>The technical documents are broken down into five sections:</p>
<ul>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/getting_started.html">Getting Started</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/robots_txt.html">Robots.txt specification</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html">Robots meta tag and X-Robots-Tag specification</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/crawlers.html">Google&#8217;s crawlers</a></li>
<li><a href="http://code.google.com/web/controlcrawlindex/docs/references.html">References</a></li>
</ul>
<p>I personally printed them out as my weekend reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-releases-details-on-controlling-googlebot-googles-crawl-56946/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Robots.txt Recruiter: Daily Mail Uses Robots.txt File To Find SEO</title>
		<link>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191</link>
		<comments>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191#comments</comments>
		<pubDate>Tue, 24 Aug 2010 13:20:11 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[SEM Industry: General]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=49191</guid>
		<description><![CDATA[Malcolm Coles spotted that the Daily Mail, one of the UK&#8217;s largest papers, changed their robots.txt file to include a line which reads: # August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you&#8217;re the kind of techie we need!# Send your CV to holly dot ward at [...]]]></description>
			<content:encoded><![CDATA[<p>Malcolm Coles <a href="http://www.malcolmcoles.co.uk/blog/seo-job-mail-robots/">spotted</a> that the <a href="http://www.dailymail.co.uk/">Daily Mail</a>, one of the UK&#8217;s largest papers, changed their <a href="http://www.dailymail.co.uk/robots.txt">robots.txt file</a> to include a line which reads:</p>
<blockquote># August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you&#8217;re the kind of techie we need!<br /># Send your CV to holly dot ward at mailonline dot co dot uk</blockquote>
<p>How clever! They suspected that some of the best SEOs out there would be sniffing around their robots.txt file and used it to recruit a new SEO manager. If nothing else, it is getting the word out via the press that they are looking for a new SEO.</p>
<p>This reminds me of when Google added the <a href="http://searchengineland.com/spooky-search-engines-on-halloween-15321">User-agent: zombies</a> Disallow: /brains to their robots.txt file on Halloween. Also, in 2006, Brett Tabke used his <a href="http://www.seroundtable.com/archives/006720.html">robots.txt file as his blog</a> for a period of time.</p>
<p>I do not believe anyone has ever used it to hire an SEO before.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/robots-txt-recruiter-daily-mail-uses-robots-txt-file-to-find-seo-49191/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Facebook: No Plans To Give Search Engines Access To Facebook Questions</title>
		<link>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671</link>
		<comments>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671#comments</comments>
		<pubDate>Thu, 29 Jul 2010 21:21:55 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Search Engines: Answer Search Engines]]></category>
		<category><![CDATA[Search Engines: Help Engines]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[Top News]]></category>
		<category><![CDATA[Yahoo: Answers]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=47671</guid>
		<description><![CDATA[That&#8217;s one of the big questions people are asking after yesterday&#8217;s launch of Facebook Questions. While many have assumed the answer would be &#8220;yes,&#8221; a Facebook spokesperson tells us that assumption is wrong. Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/07/facebook-questions.png" alt="facebook-questions" width="545" height="92" class="alignnone size-full wp-image-47672" /></p>
<p>That&#8217;s one of the big questions people are asking after <a href="http://searchengineland.com/facebook-questions-opens-to-limited-public-release-47523">yesterday&#8217;s launch</a> of <a href="http://searchengineland.com/up-close-with-facebook-questions-47567">Facebook Questions</a>. While many have assumed the answer would be &#8220;yes,&#8221; a Facebook spokesperson tells us that assumption is wrong.</p>
<blockquote>Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but have no current plans to allow it.</blockquote>
<p>Facebook is blocking search engines by only showing Questions to logged-in users. Sure enough, a <a href="http://www.google.com/search?q=site%3Afacebook.com%2Fquestions%2F">site:facebook.com/questions/</a> search on Google shows only a handful of results, none of which are actually Q&#038;A from the first 24 hours that the beta has been open.</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/07/goog-fb.png" alt="goog-fb" width="550" height="494" class="alignnone size-full wp-image-47673" /></p>
<p>The same search produces zero results on both <a href="http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Ffacebook.com%2Fquestions%2F">Yahoo</a> and <a href="http://www.bing.com/search?q=site%3Afacebook.com%2Fquestions%2F">Bing</a>.</p>
<p>Facebook&#8217;s move is unusual. As far back as 2007, the company was starting to <a href="http://searchengineland.com/facebook-opens-profiles-to-tap-into-google-traffic-while-google-grabs-facebooks-news-feed-idea-12096">open up some content to search engines</a> &#8212; a trend that&#8217;s continued more recently with <a href="http://searchengineland.com/liveblogging-googles-web-search-evolution-event-31317">various agreements</a> to let search engines access certain user content. For years, Yahoo Answers has been a pretty formidable <a href="http://www.smallbusinesssem.com/part-two-why-use-yahoo-answers/1063/">rankings powerhouse</a>, and no doubt gets a substantial amount of traffic via search engines. </p>
<p>It&#8217;s odd that Facebook would want to ignore that traffic source entirely. Don&#8217;t be surprised if this policy changes.</p>
<p><strong>Postscript, July 30:</strong> Experian Hitwise has responded to my claim near the end of this article about how much traffic Yahoo Answers gets via search engines. In a <a href="http://twitter.com/Hitwise_US/status/19919086878">tweet this morning</a>, Hitwise reports that &#8220;62% of upstream visits to Yahoo Answers came via Google last week.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/facebook-questions-no-search-engine-indexing-47671/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

