<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>searchengineland.com &#187; SEO: Duplicate Content</title>
	<atom:link href="http://searchengineland.com/library/seo/seo-duplicate-content/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: Must Read News About Search Marketing &#38; Search Engines</description>
	<lastBuildDate>Sat, 21 Nov 2009 03:30:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google Lets You Tell Them Which URL Parameters To Ignore</title>
		<link>http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925</link>
		<comments>http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925#comments</comments>
		<pubDate>Wed, 16 Sep 2009 20:13:28 +0000</pubDate>
		<dc:creator>Vanessa Fox</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=25925</guid>
		<description><![CDATA[A new feature has appeared in the Site Configuration Settings Sections of Google Webmaster Tools. The setting, called Parameter Handling, enables site owners to specify up to 15 parameters that Google should ignore when crawling and indexing the site.
Google lists the parameters they’ve found in the URLs on your site, and indicates whether or not [...]]]></description>
			<content:encoded><![CDATA[<p>A new feature has appeared in the Site Configuration Settings Sections of Google Webmaster Tools. The setting, called Parameter Handling, enables site owners to specify up to 15 parameters that Google should ignore when crawling and indexing the site.</p>
<p>Google lists the parameters they’ve found in the URLs on your site, and indicates whether or not they think those parameters are extraneous (with a suggested “Ignore” or “Don’t ignore”). You can confirm or reject those suggestions and can add parameters that aren’t listed.</p>
<p><a title="Google Webmaster Tools Parameter Handling by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/3927048854/"><img src="http://farm3.static.flickr.com/2479/3927048854_bebff649f6.jpg" alt="Google Webmaster Tools Parameter Handling" width="500" height="191" /></a></p>
<p>So what does this mean for site owners?</p>
<p>The primary value of the feature is to improve the canonicalization of a site in Google’s index due to <a href="http://searchengineland.com/understanding-search-engines-duplicate-content-issues-11738">duplicate content</a>. Canonicalization issues occur when multiple URLs load the same content. This scenario can be problematic for a number of reasons (for instance, it can skew analytics data) but from a search perspective, canonicalization issues can cause:</p>
<ul>
<li><strong>Crawl efficiency problems:</strong> if search engine bots crawl the same page via multiple URLs, they may not have resources to crawl as many unique pages on the site</li>
<li><strong>PageRank dilution that can lead to lowered search rankings: </strong>if external sites link to multiple versions of a page, each version has less PageRank value than if all links pointed to one version</li>
<li><strong>Display and branding problems: </strong>search engines display only one version of the URL; you ideally want the canonical version of a URL to display (mysite.com/goldfish) rather than a version with extraneous parameters (mysite.com/goldfish?adid=1205123&amp;sid=452006&amp;sort=high-rating&amp;loc=sea)</li>
</ul>
<p>A number of canonicalization solutions exist, including several that are Google-specific, so why did they launch this new feature? Yahoo! has included a <a href="http://searchengineland.com/yahoo-site-explorer-adds-dynamic-url-rewriting-tool-11991">similar feature as part of its Site Explorer</a> webmaster product for some time and site owners have been asking for a similar feature from Google for a while (certainly at least since I was working on Webmaster Central).</p>
<p>Below is a rundown of the various canonicalization options and how this one differs.</p>
<p><strong>Google Webmaster Tools Parameter Handling: When URLs Can Contain Optional Parameters</strong></p>
<p>This new option only helps with canonicalization issues that are caused by optional parameters that are in a standard key-value pair format and that you specify. In other words, it can only be an exclusionary list (don’t crawl parameters x, y, and z) rather than inclusionary (only crawl parameters a and b).</p>
<p>Wouldn’t you always know the complete list of potential parameters? Hopefully. But some canonicalization issues happen because a URL can take any parameters at all. Ideally, you want to ensure your server isn’t set up this way, but if you need this configuration (for instance another team or outside agency needs the ability to use any custom tracking code without waiting for that parameter code to be added to the server setup), then you’re better off using the meta canonical tag.</p>
<p>The two most common reasons for optional parameters, both of which this feature handles well, are:</p>
<ul>
<li>Tracking codes used for analytics data (in this case, you may not want to implement a 301 redirect from the long version of the URL to the canonical one since you could lose the data)</li>
<li>Page layout changes, such as sort orders (in this case, the code on the page uses the parameter to change the layout of the page, but from a search engine perspective the content on each version is the same, just in a different order)</li>
</ul>
<p><strong>Why use this canonicalization option over the others?</strong> The biggest benefit is likely in the increase in crawl efficiency. When Google discovers a new URL, they can check the included parameters against the parameter handling list and remove any optional ones before crawling it (but still credit any found links to the page). This could substantially reduce the crawling overhead on a site and could free up considerable bandwidth for getting other pages of the site crawled.</p>
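<p>To make the mechanism concrete, here is a minimal Python sketch of stripping ignorable parameters from a URL before it is crawled. It is an illustration only, not Google’s implementation: the function name <code>strip_ignored_params</code> and the <code>IGNORED_PARAMS</code> list are assumptions, with parameter names borrowed from the goldfish example above.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical ignore list; the names mirror the article's example URL.
IGNORED_PARAMS = {"adid", "sid", "sort", "loc"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with every ignored key-value parameter removed."""
    parts = urlsplit(url)
    # Keep only the parameters that are not on the ignore list,
    # preserving their original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

<p>With this ignore list, <code>http://mysite.com/goldfish?sort=high-rating</code> would reduce to <code>http://mysite.com/goldfish</code>, while a parameter that is not on the list would be kept. Note that this only works for standard key-value parameters, which matches the limitation described above.</p>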
<p>It’s also fairly simple to use. Just scan the list of suggested parameters and click the ones that are optional. In some organizations, it can be difficult to get source code added to web pages, making the implementation of the canonical tag difficult and time consuming. With this option, if you have verified webmaster tools access, you don’t need to involve IT at all.</p>
<p><strong>What are the drawbacks to this option? </strong>The most obvious issue with this option is that it only works for Google. In times past, you could use this setting and the corresponding one in Yahoo! Site Explorer and not worry about other engines. But with Microsoft Bing’s impending (likely) replacement of Yahoo’s search index, it’s quite possible that Yahoo’s feature will go the way of its index, and if Microsoft doesn’t offer something similar, then a search index with 25%+ market share could be getting your URLs wrong.</p>
<p>You could also shoot yourself in the foot, metaphorically speaking. You could accidentally tell Google to ignore important parameters that, if dropped from the index, could wipe out large portions of your site. As Google adds more of these types of features to webmaster tools, it becomes more important to ensure that anyone who has access to them knows what they’re doing.</p>
<p>In reality, Google likely has safeguards in place that at least partially protect against such accidental destruction. That’s undoubtedly why they say that “While Google takes suggestions into account, we don&#8217;t guarantee that we&#8217;ll follow them in every case.” They don’t want large portions of their index disappearing either.</p>
<p>Unlike accidental blocking with robots.txt, which search engines follow as a directive, this feature (and many of the others) is a signal only. If the other signals already in place strongly contradict it (for instance, the content seems to be vastly different), it likely won’t be used.</p>
<p>But even though Google has safeguards like this one in place, you may not want to chance it if you’re not confident of which parameters are really optional (all the time, since this is a site-wide setting).</p>
<p>This option also won&#8217;t work if your canonicalization issues aren&#8217;t related to parameters or if the parameters aren&#8217;t in standard key-value pair format.</p>
<p><strong>Meta canonical attribute</strong></p>
<p>The canonical attribute is a page-level meta tag that specifies the canonical version for the page. This can be useful because no matter what optional parameters are added to the version of the URL that renders the page, search engines can always know the canonical version. You can find detailed information about this tag in my article <a href="http://searchengineland.com/canonical-tag-16537">about its launch</a>.</p>
<p><strong>Why use this canonicalization option over the others?</strong> You just specify the canonical version of a page once, and no matter what parameters are added to the URL, search engines are always provided with the canonical version.</p>
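<p>As a rough illustration of how a crawler might read this tag, the Python sketch below pulls the canonical URL out of a page’s HTML. It assumes the tag takes the form of a <code>&lt;link rel="canonical" href="..."&gt;</code> element in the page head; the class and function names are hypothetical, and a real crawler would do considerably more validation.</p>

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first link element marked rel=canonical."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split it.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

<p>Because the declared URL is the same no matter which parameters were on the requested URL, every variant of the page reports the same canonical version.</p>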
<p>Since this meta data is on the page itself, any search engine can read it, and in fact, Google, Yahoo!, and Microsoft have all announced support for it. As of yet though, only Google seems to be actively using it.</p>
<p><strong>What are the drawbacks to this option? </strong>Unlike with the parameter handling feature, search engines have to crawl the page before they can read the tag, so some crawl efficiency is lost. This tag should promote <em>long-term</em> efficiency, however, since theoretically,  once the bot has crawled the non-canonical version of the URL and read the tag, it shouldn’t have to crawl that version of the URL again.</p>
<p>As already noted, implementation requires modification of the page source code, which isn’t always easy within some organizations.</p>
<p>As with parameter handling, it’s possible to implement this tag incorrectly. For instance, it’s been discovered that some sites have accidentally set the canonical version of every page to the home page. As with the parameter handling feature, search engines consider the tag a “strong hint” as a precaution against these types of mistakes and won’t use the data when it strongly contradicts their other signals. In the case of Google, the only search engine that is actively using the tag so far, this has proven to be the case.</p>
<p><strong>301 redirect</strong></p>
<p>It’s universally agreed that (other than not having multiple versions of a URL at all) the best way to canonicalize URLs is to redirect all versions to the canonical one using a 301 redirect. This implementation sends all users and search engines to the canonical version, effectively consolidates all links to the page, and ensures only the canonical one is indexed and ranked.</p>
<p><strong>Why use this canonicalization option over the others?</strong> It’s understood and followed by all major search engines and it provides the best user experience (visitors have one URL to access, bookmark, and share). In most cases, search engines consolidate all links to the redirect target and rank the canonical one.</p>
<p>This option is the best choice when you are moving content (for instance, changing your URL structure or changing domains) and when you need to indicate whether you want content indexed under the www or non-www version of the domain.</p>
<p>Also keep in mind that if you redirect to the canonical version you’re more likely to get links to the right version, since most visitors will simply copy and paste what they see in the address bar.</p>
<p><strong>What are the drawbacks to this option? </strong>When you are using parameters for sort orders or tracking, a redirect may negate those parameters. You can generally <a href="http://janeandrobot.com/library/url-referrer-tracking">configure your analytics program</a> to handle this properly, but it probably won’t work out of the box.</p>
<p>Redirects aren’t always properly implemented. For instance, they might inadvertently be implemented as a 302 (or worse yet, a JavaScript redirect or meta refresh). Or they may generate redirect loops or infinite redirect chains. In these cases, search engine bots eventually abandon the crawl attempt (and with Google and Microsoft, you can get a list of these URLs in their webmaster tools products).</p>
<p>Redirects can also slow down crawl efficiency, particularly due to redirect chains. Ideally, search engines crawl the redirect then eventually stop crawling the origination URL, but if the bot encounters links to the original URL, it will continue crawling both versions (or more, if the page has moved multiple times).</p>
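<p>The redirect problems described above (302s, loops, and long chains) can be checked mechanically. The following Python sketch walks a chain of redirects and reports the first problem it finds. It works on a hypothetical in-memory table of responses rather than live HTTP requests, and the function name, the table format, and the five-hop limit are all assumptions made for illustration.</p>

```python
def check_redirects(start_url, responses, max_hops=5):
    """Follow 301s from start_url and report the first problem found.

    responses is a hypothetical dict mapping a URL to a
    (status_code, location) tuple; URLs missing from it are treated
    as normal pages that return 200.
    """
    visited = [start_url]
    url = start_url
    while True:
        status, location = responses.get(url, (200, None))
        hops = len(visited) - 1  # redirects followed so far
        if status == 302:
            return {"final": url, "hops": hops, "problem": "302 (temporary) redirect"}
        if status != 301:
            return {"final": url, "hops": hops, "problem": None}
        if location in visited:
            return {"final": url, "hops": hops, "problem": "redirect loop"}
        if hops >= max_hops:
            return {"final": url, "hops": hops, "problem": "redirect chain too long"}
        visited.append(location)
        url = location
```

<p>A clean result is a small number of hops ending at the canonical URL with no problem reported. Anything else is worth fixing before relying on the redirect for canonicalization.</p>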
<p><strong>Google webmaster tools change address feature</strong></p>
<p>This feature enables you to tell Google when you’re changing domains. You have to verify ownership of both the old domain and the new domain and then you can specify a move from one to the other. You can find more information about this  feature xx.</p>
<p><strong>Why use this canonicalization option over the others?</strong> The best use of this feature is when you are changing domains and you aren’t able to implement a 301 redirect from the old domain to the new. (This is the case, for instance, with blogspot.com sites.) Even if you are able to implement the redirect, it can’t hurt to let Google know as well!</p>
<p><strong>What are the drawbacks to this option? </strong>You can only use this option to move from one domain to the other. And as with the other Google webmaster tools features, it only works for Google.</p>
<p><strong>Google webmaster tools preferred domain feature</strong></p>
<p><a href="http://googlewebmastercentral.blogspot.com/2006/09/setting-preferred-domain.html">The preferred domain feature</a> enables you to tell Google whether you want your domain indexed with the www subdomain or without it. Since most sites resolve either way, a complete duplicate set of your site’s content will exist if you don’t set www/non-www canonicalization. Why is this a problem? Ideally it’s not and search engines consolidate the content correctly. But often, search engines find links to both versions and end up crawling both, indexing both, and crediting the links to the versions separately.</p>
<p><strong>Why use this canonicalization option over the others?</strong> You may as well always use this option, although you should implement a 301 redirect as well, if you can. Google initially implemented this feature for those sites that weren’t able to do so.</p>
<p><strong>What are the drawbacks to this option? </strong>Again, this option works only for Google. And it doesn’t provide as much of a guarantee as a 301 redirect.</p>
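<p>For sites that can enforce the preference themselves, the www/non-www choice comes down to a simple host rewrite that a 301 redirect would then point to. The Python sketch below shows that rewrite under simplifying assumptions: the function name is hypothetical, and it does not handle ports or credentials embedded in the host.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def to_preferred_host(url, prefer_www=True):
    """Rewrite the URL so its host matches the preferred www/non-www form."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip a leading "www." so both forms start from the bare domain.
    bare = host[4:] if host.lower().startswith("www.") else host
    preferred = "www." + bare if prefer_www else bare
    return urlunsplit(
        (parts.scheme, preferred, parts.path, parts.query, parts.fragment)
    )
```

<p>If the rewritten URL differs from the requested one, that is the case where the server would respond with a 301 to the preferred version.</p>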
<p><strong>Blocking duplicate content with a robots directive</strong></p>
<p>The traditional advice for avoiding duplicate content has been to block the duplicates with robots.txt (or a robots meta tag) to ensure the correct version is indexed. It can be important that the right version be indexed vs. the version intended for print, for instance.</p>
<p><strong>Why use this canonicalization option over the others?</strong> Generally speaking, you shouldn’t, now that the canonical meta tag is available. The scenarios for which you wouldn’t want to redirect (such as the print version example) can be more easily solved with the canonical tag, and the scenarios for which you’re worried about crawl efficiency issues that would leave large portions of your site uncrawled (such as large-scale optional parameters) can now more easily be solved with Google’s parameter handling feature.</p>
<p><strong>What are the drawbacks to this option? </strong>The primary drawback to this option is the loss of link credit. Any links to blocked pages fall into a black hole and can’t be credited to the canonical version of the page, as happens with the other options.</p>
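<p>To show what blocking a duplicate looks like in practice, the Python sketch below uses the standard library’s <code>urllib.robotparser</code> to test URLs against a robots.txt rule. The rule itself, which blocks a hypothetical <code>/print/</code> directory, and the function name are assumptions chosen to match the print-version example.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the print versions of pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given robots.txt rules allow the agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

<p>A blocked URL is never fetched, which is exactly why any links pointing at it cannot be credited to the canonical version.</p>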
<p><strong>The parameter handling feature can also provide insight on how Google sees your site</strong></p>
<p>For some time, Google has been attempting to canonicalize URLs and show the canonical version in the results, even when a site owner hasn’t implemented any of these canonicalization options. For instance, they may determine that several pages contain the same content and algorithmically consolidate them and associate them with the one Google determines is canonical. They haven’t described exactly how they determine the canonical version, but they might, for instance, choose the URL with the fewest number of parameters or the shortest version of the URL.</p>
<p>Last year, they started letting webmasters know <a href="http://googlewebmastercentral.blogspot.com/2008/08/to-infinity-and-beyond-no.html">when they encountered URLs that they thought were extraneous and were causing crawling problems</a>. It’s likely that Google is using a similar source to generate the list of parameters it suggests should be ignored.</p>
<p>In this way, the parameter handling feature provides insight into how Google perceives the site. If you see many parameters listed that aren’t optional, take a look at the content on the URLs that use those parameters.</p>
<p>This could signify a larger problem. It could be that Google doesn’t see enough unique content on them (this can happen, for instance, with pages that list part numbers, contain mostly images and item codes, or list little information outside of a login). You may want to look for ways to differentiate the pages a bit more.</p>
<p>Interestingly, the Google Webmaster Central blog has a new <a href="http://googlewebmastercentral.blogspot.com/2009/09/duplicate-content-and-multiple-site.html">post about duplicate content</a>, but no mention of this new feature. Thanks to Brian Ussery for <a href="http://www.beussery.com/blog/index.php/2009/09/parameter-handling/">pointing it out</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Loses &#8220;Backwards Compatibility&#8221; On Paid Link Blocking &amp; PageRank Sculpting</title>
		<link>http://searchengineland.com/google-loses-backwards-compatibility-on-paid-link-blocking-pagerank-sculpting-20408</link>
		<comments>http://searchengineland.com/google-loses-backwards-compatibility-on-paid-link-blocking-pagerank-sculpting-20408#comments</comments>
		<pubDate>Wed, 03 Jun 2009 04:39:46 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Features: Analysis]]></category>
		<category><![CDATA[Link Building: General]]></category>
		<category><![CDATA[Link Building: Paid Links]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[SEO: Spamming]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=20408</guid>
		<description><![CDATA[Imagine that you fired up your computer and found that a bunch of your  programs no longer worked, because behind the scenes, the operating system had  been upgraded without any backwards compatibility. That&#8217;s what happened this  week with Google. Some things that were working just fine now are broken,  because Google [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine that you fired up your computer and found that a bunch of your  programs no longer worked, because behind the scenes, the operating system had  been upgraded without any backwards compatibility. That&#8217;s what happened this  week with Google. Some things that were working just fine now are broken,  because Google isn&#8217;t being backwards compatible. And that&#8217;s fairly  unprecedented.</p>
<p>Don&#8217;t panic. One of the changes really shouldn&#8217;t hurt many sites, impacting  only a &#8220;power SEO&#8221; technique commonly called PageRank sculpting that I&#8217;d say  fairly few use. The other has a bigger impact and potentially means thousands of  sites may now be violating Google&#8217;s rules on paid links without knowing it. But  that&#8217;s not likely to have an immediate impact. I&#8217;ll explain both changes in more  depth below.</p>
<p>The most important thing is that in both cases, the changes may require site  owners to alter their web sites not because they were &#8220;chasing the algorithm&#8221;  but instead because they were following Google&#8217;s own rules and instructions.  They were doing what was advised, and now they may have to undo that work.</p>
<p>That&#8217;s the unprecedented part. Google has constantly upgraded how it deals  with site content, from early advances like indexing PDF documents to later  changes like showing &#8220;sitelinks&#8221; for web sites. These upgrades have been  generally good and involved little to no work on the part of the site owner, until now.</p>
<p><strong>PageRank Sculpting: Spending A Page&#8217;s Authority Money</strong></p>
<p>Let&#8217;s take <a href="../../sculpting-your-pagerank-for-maximum-seo-impact-12982">PageRank  sculpting</a>. In general, every individual web page that Google finds has some  degree of importance that the page can pass on to other pages &#8212; <a href="../../what-is-google-pagerank-a-guide-for-searchers-webmasters-11068">PageRank</a>.  Links from that page to other pages are how it passes that importance along. And  in its most basic, earliest form, each link on the page equally shared some of  the importance.</p>
<p>Consider it like this. Imagine authority is money, and a particular page has  $10 in &#8220;authority&#8221; to spend. It links out to 10 pages, so each of those pages  gets $1 ($10 divided by 10). If it links to 20 pages, each gets 50 cents ($10  divided by 20). If it links to 5 pages, each page gets $2 (you get the math by  now).</p>
<p>With PageRank sculpting, the idea is to effectively block some of the links  on your web page (using the <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=96569">nofollow attribute</a> or some other means) from getting  any authority. Perhaps you have a lot of navigational links to other pages  inside your web site. Rather than spend authority money on these pages, you  might prefer to spend it on a smaller number of important pages that could use a  boost.</p>
<p><strong>PageRank Sculpting Gets Popular</strong></p>
<p>This technique has been around for ages and had various names until the  middle of 2007. That&#8217;s when it went more mainstream in the advanced SEO space.  And in particular, it went that way I feel because Google spam fighting czar  Matt Cutts talked about how Google&#8217;s YouTube was using PageRank sculpting during  an open discussion at Google with a variety of advanced SEO people about techniques and issues.</p>
<p>I recall it being described as a means to ensure your best pages got the most  PageRank. I also recall being kind of annoyed about it (and think I said so  during the meeting). For years, we&#8217;d been told that site owners shouldn&#8217;t have  to do extraordinary things to help search engines. Good page titles, good  ability to be crawled, sure. But having to think about things on a link-by-link  basis? That&#8217;s something I assumed Google was already up to snuff about. My  assumption had been that Google had long since decided to discount how much credit it  assigned to things like navigational links, when it could see the same links  appearing on multiple pages within the same web site.</p>
<p>Now to be clear, it&#8217;s not like Matt told everyone in the room to immediately  do PageRank sculpting. Many topics were discussed, and this was just one of many  things covered. But it was advice that came from Google &#8212; and it turned into a  genie that wouldn&#8217;t go into the bottle.</p>
<p>Soon after, Rand Fishkin at SEOmoz did an <a href="http://www.seomoz.org/blog/questions-answers-with-googles-spam-guru">article</a> about the topic, and more soon followed on the web. It was a topic at  conferences. It was a hot new fashion in SEO. And while plenty in the SEO space  will chase after the latest (and often useless) algorithm fad &#8212; this was a  chase sparked by Google itself. Why wouldn&#8217;t advanced people do it?</p>
<p><strong>PageRank Sculpting Gets Debated</strong></p>
<p>Not everyone agreed it was helpful. There&#8217;s been quite a bit of <a href="http://www.seomoz.org/blog/sculpting-with-nofollow-works-pretty-darn-well">debate</a> on whether it gives a boost or not. <a href="../../youd-be-wise-to-nofollow-this-dubious-seo-advice-13524">Some</a> <a href="../../seo-vs-web-site-architecture-16628">have</a> argued against using it at all. And the search engines, when asked about it  since it gained popularity, have generally said that there are other things that  are better worth the effort. But neither had they ruled it out. As I summarized  <a href="../../no-advanced-seo-does-not-mean-spamming-14165">last  year</a>:</p>
<blockquote><p>I agree with the view that sculpting is a marginal activity compared to other  things that can be done. But if you’re an advanced SEO — even someone advanced  in terms of working with design issues — maybe it’s not so marginal. The search  engines themselves are saying it has some value. They’ve not said it’s a flat  out waste of time. And if you’ve mastered all the other things that are much  more important, then yes, something like this may very well be worth giving more  attention to.</p></blockquote>
<p>Or as Michael Gray <a href="http://www.wolf-howl.com/seo/nofollow-pagerank-sculpting-worth-effort/">explained</a>,  if you&#8217;re driving a beat-up old car of a web site, putting a PageRank sculpting  &#8220;engine&#8221; in it probably isn&#8217;t worthwhile. But if you&#8217;ve got a hot new sports  car, well&#8230;.</p>
<p><strong>PageRank Sculpting Gets Deprecated</strong></p>
<p>So today at <a href="http://searchmarketingexpo.com/advanced">SMX Advanced</a>, sculpting was being discussed, and then Matt Cutts  dropped a bombshell: it no longer works to help flow more PageRank to the  unblocked pages. Again, and being really simplistic here, if you have $10 in  authority to spend on those ten links, and you block 5 of them, the other 5  aren&#8217;t going to get $2 each. They&#8217;re still getting $1. It&#8217;s just that the other  $5 you thought you were saving is now going to waste.</p>
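<p>The arithmetic of the change can be sketched in a few lines of Python. This follows the simplified &#8220;authority money&#8221; analogy only; it is not how Google actually computes PageRank, and the function name and the <code>sculpting_works</code> flag are hypothetical.</p>

```python
def share_per_followed_link(authority, total_links, nofollowed=0, sculpting_works=True):
    """Authority each followed link receives under the simplified analogy.

    sculpting_works=True models the old behavior, where blocked links
    drop out of the split. sculpting_works=False models the behavior
    described at SMX Advanced, where blocked links still count in the
    split and their share is simply lost.
    """
    followed = total_links - nofollowed
    if followed <= 0:
        return 0.0
    divisor = followed if sculpting_works else total_links
    return authority / divisor
```

<p>With $10 of authority, ten links, and five of them blocked, the old model gives each remaining link $2, while the new model gives each $1 and the other $5 goes unspent.</p>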
<p>Further, it was explained that YouTube wasn&#8217;t doing sculpting way back in  2007 as a way to boost certain video content. Instead, it was that YouTube  randomly shows some video content and didn&#8217;t want these random selections to  perhaps gain more authority than they should. And even with the change announced  today, that still works. In the past, the unblocked videos got more authority  money and the blocked ones got none. Now, the unblocked videos still get  authority money &#8212; just not as much &#8212; and the blocked ones still get none.</p>
<p>But while that may be how it works on YouTube, I still recall PageRank  sculpting being positioned by Google as a way to also give some pages more link  juice. To <a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/cf504ffd28b6bb68/21b12da30e8b0de2?q=nofollow">quote</a> Matt when asked about this in an official Google thread:</p>
<blockquote><p>What are some appropriate ways to use the nofollow tag? One good example is  the home page of expedia.com. If you visit that page, you&#8217;ll see that the &#8220;Sign  in&#8221; link is nofollow&#8217;ed. That&#8217;s a great use of the tag: Googlebot isn&#8217;t going to  know how to sign into expedia.com, <strong>so why waste that PageRank </strong>on a page  that wouldn&#8217;t benefit users or convert any new visitors? Likewise, the &#8220;My  itineraries&#8221; link on expedia.com is nofollow&#8217;ed as well. That&#8217;s another page  that wouldn&#8217;t really convert well or have any use except for signed in users, so  the nofollow on Expedia&#8217;s home page means that Google won&#8217;t crawl those specific  links.</p>
<p>Most webmasters don&#8217;t need to worry about sculpting the flow of PageRank on  their site, but if you want to try advanced things with nofollow to send less  PageRank to copyright pages, terms of service, privacy pages, etc., that&#8217;s your  call.</p></blockquote>
<p>I&#8217;ve bolded the key part. Matt stresses &#8212; as he&#8217;s consistently done since  talking about this at the SEO meeting &#8212; that this is something most people  didn&#8217;t need to worry about or do. But saying &#8220;why waste that PageRank&#8221; means that at  the time of giving this advice, PageRank was something that could be &#8220;saved&#8221; and  &#8220;spent&#8221; on other pages.</p>
<p>You can expect Matt will do a blog post to cover this topic more. You can  expect lots of people to be analyzing the change, and what it might or might not  mean. And you should really understand that it was never the case that links  shared equally in the amount of authority money a page had. In talking  with Matt during the &#8220;You &amp; A Session&#8221; at SMX Advanced, he confirmed that  Google itself makes many determinations of how exactly a page can spend that  authority money. IE &#8212; while a page might have $10 to spend, Google itself  largely acts as the page&#8217;s investment banker, not the page&#8217;s author.</p>
<p>I wouldn&#8217;t panic and immediately start removing nofollow attributes that have  been done for PageRank sculpting purposes. In general, I&#8217;d never recommend  changing anything to a site that seems to be performing well. Take the time to  let more discussion and information come from Google and other sources.</p>
<p><strong>JavaScript onClick &amp; Paid Link Worries</strong></p>
<p>Those who PageRank sculpted following Google&#8217;s advice may have spent time  doing something that no longer will work, or work as effectively, but they&#8217;ve  not necessarily wasted time. Maybe it was helping them some in the past (plenty  believe this). And they might not have to spend time removing it, any more than  there are plenty of sites that still have <a href="../../meta-robots-tag-101-blocking-spiders-cached-pages-more-10665">meta  keywords tags</a> in place even though widespread search engine support of this  was dropped long ago. That&#8217;s good deprecation, or effectively backwards  compatibility. No one needs to change anything because the sites still keep  &#8220;working&#8221; despite the past support being gone.</p>
<p>It&#8217;s a different case with Google&#8217;s new handling of JavaScript&#8217;s &#8220;onClick&#8221;  function. To fully understand it, read Vanessa Fox&#8217;s in-depth report from last  week, <a href="../../google-io-new-advances-in-the-searchability-of-javascript-and-flash-but-is-it-enough-19881">Google  I/O: New Advances In The Searchability of JavaScript and Flash, But Is It  Enough?</a>, which broke the news here.</p>
<p>Links in JavaScript that were invisible to Google before are now being read. And some people have used JavaScript as a way to deliver paid links without violating Google&#8217;s guidelines. Those links may now technically be on the wrong side of the Google law. It&#8217;s been a long accepted practice that this was a &#8220;safe&#8221; way to deal with paid links, one that Google&#8217;s suggested itself.</p>
<p>It&#8217;s as if Google has suddenly passed a new safety helmet law for web sites,  mandating that the old helmets they&#8217;d been using are no longer good enough. Now  they need to do something different.</p>
<p>What about nofollow? After all, <a href="../../time-for-google-to-give-up-the-fight-against-paid-links-11021">Google&#8217;s  been pushing nofollow</a> as something sites should do as a safety device for  paid links long after paid links themselves had been in existence.</p>
<p>True, and there are plenty of sites out there that have never caught up with this Google guideline (which still sucks for those who innocently don&#8217;t know better). But that&#8217;s different from sites that thought they were doing the right thing and now have to change again.</p>
<p>For the record, Matt said today that there are no immediate penalties likely to be given out. Honestly, I think the spam team is still having to digest how to handle this change that&#8217;s been brought about by Google&#8217;s crawling team. And he also said that the nofollow attribute can be applied to JavaScript links that are not otherwise being redirected through a robots.txt block.</p>
<p>As I said in the case of PageRank sculpting, I wouldn&#8217;t immediately panic.  But unlike with PageRank sculpting, if you&#8217;re selling paid links and thought  JavaScript was protecting you, I would fairly quickly ensure that redirects are  blocked by using nofollow within the JavaScript itself or by going through a  robots.txt block.</p>
<p>(For an example of this, our paid links get delivered through JavaScript generated by Google Ad Manager. The links all get redirected through this domain, http://googleads.g.doubleclick.net, and you can see from the robots.txt file <a href="http://googleads.g.doubleclick.net/robots.txt">there</a> that search engines aren&#8217;t allowed to crawl it. So, the links pass no authority on to other pages.)</p>
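<p>As a rough sketch of what &#8220;nofollow within the JavaScript itself&#8221; could look like, here is a paid link written out by a script with the attribute kept in the generated markup. The advertiser.example domain and the link text are placeholders made up for illustration, not anything taken from Google Ad Manager:</p>
<pre>&lt;script type="text/javascript"&gt;
  // Hypothetical paid link; advertiser.example is a placeholder domain.
  // rel="nofollow" travels with the link that the script writes into the page.
  document.write('&lt;a href="http://advertiser.example/" rel="nofollow"&gt;Sponsor&lt;\/a&gt;');
&lt;/script&gt;</pre>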
<p><strong>Backwards Compatibility Is Important</strong></p>
<p>Overall, I want Google to keep advancing. But it needs to ensure that the  changes don&#8217;t dramatically cause more work for site owners, as a result. We need  a period of backwards compatibility in terms of Google indexing, just as much as  it&#8217;s helpful with computer operating systems.</p>
<p>For more about the discussions today out of SMX Advanced, also see these  selected stories from the live blogging <a href="../../smx-advanced-day-1-live-blogging-coverage-20386">round-up</a>:</p>
<ul>
<li><a href="http://seogadget.co.uk/duplicate-content-solutions-the-canonical-tag-smx-advanced-coverage-2009/">Duplicate  Content Solutions &amp; The Canonical Tag &#8211; SMX Advanced Coverage 2009</a>, SEO  Gadget</li>
<li><a href="http://outspokenmedia.com/seo/canonical-tag/">Duplicate Content  Solutions &amp; The Canonical Tag</a>, outspokenmedia.com</li>
<li><a href="http://outspokenmedia.com/internet-marketing-conferences/beyond-the-usual-link-building/">Beyond  the Usual Link Building</a>, outspokenmedia.com</li>
<li><a href="http://seogadget.co.uk/beyond-the-usual-linkbuilding-smx-advanced-2009/">Beyond  the usual linkbuilding &#8211; SMX Advanced 2009</a>, SEO Gadget</li>
<li><a href="http://blog.search-mojo.com/2009/06/02/live-from-smx-advanced-beyond-the-usual-link-building/">Live  from SMX Advanced: Beyond the Usual Link Building</a>, Search Marketing Sage</li>
<li><a href="http://www.bruceclay.com/blog/archives/2009/06/nofollow_makes.html">Nofollow  Makes News at SMX Advanced</a>, BruceClay.com</li>
<li><a href="http://outspokenmedia.com/internet-marketing-conferences/chat-with-matt-cutts/">You&amp;A  With Matt Cutts</a>, outspokenmedia.com</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-loses-backwards-compatibility-on-paid-link-blocking-pagerank-sculpting-20408/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>SEOs Say AOL&#8217;s Love.com Feels More Like A One-Night Stand</title>
		<link>http://searchengineland.com/lovecom-feels-like-one-night-stand-18156</link>
		<comments>http://searchengineland.com/lovecom-feels-like-one-night-stand-18156#comments</comments>
		<pubDate>Mon, 27 Apr 2009 18:22:31 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[AOL: General]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[SEO: Spamming]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=18156</guid>
		<description><![CDATA[AOL recently soft-launched Love.com, which it advertises as a collection of &#8220;topic blogs&#8221; that &#8220;provide a central view into what the world is loving now.&#8221; 
But it&#8217;s really an SEO play. Give the site any subdomain under the sun, and Love.com will make a new site on-the-fly with content scraped from news sites, blogs, YouTube [...]]]></description>
			<content:encoded><![CDATA[<p>AOL recently soft-launched <a href="http://www.love.com/">Love.com</a>, which it <a href="http://www.somewhatfrank.com/2009/04/love-dot-com.html">advertises</a> as a collection of &#8220;topic blogs&#8221; that &#8220;provide a central view into what the world is loving now.&#8221; </p>
<p>But it&#8217;s really an SEO play. Give the site any subdomain under the sun, and Love.com will make a new site on-the-fly with content scraped from news sites, blogs, YouTube videos, Twitter messages and more. Like this:</p>
<p><a href="http://www.flickr.com/photos/23148333@N06/3480042965/" title="Love.com: Cold Sores by Search Engine Land, on Flickr"><img src="http://farm4.static.flickr.com/3655/3480042965_a1afe9d35c.jpg" width="500" height="286" alt="Love.com: Cold Sores" /></a></p>
<p>Everybody loves cold sores, right? </p>
<p>Well, not everybody loves Love.com. Says <a href="http://www.davidnaylor.co.uk/aol-loves-subdomain-spam.html">Dave Naylor</a>:</p>
<blockquote><p>&#8220;Approximately 1 bazillion keyword specific subdomains filled with scraped content and ads? Are AOL so desperate that they&#8217;re resorting to five year old spamming tricks?&#8221;</p></blockquote>
<p>Aaron Wall <a href="http://www.seobook.com/does-google-love-com-spam">shifts some of the blame to Google</a> and Eric Schmidt&#8217;s recent statements about brands being the way to clean up the Internet &#8220;cesspool&#8221;:</p>
<blockquote><p>&#8220;Google&#8217;s original strategy with the authority-centric algorithm was a false belief that the emphasis on authority would make the web a deeper and richer experience. New content would need to be better than older established content to outrank it. But as media companies face sharp losses Google is quickly finding out that their authority emphasis is creating a shallower web, where most of the big networks have 2 primary roles: create garbage and recycle garbage.&#8221;</p></blockquote>
<p>Aaron also points out on Twitter that Love.com has more than 100,000 pages in Google&#8217;s index, and he is <a href="http://twitter.com/aaronwall/status/1628276193">offering</a> a free month of SEO training to whoever guesses how long it&#8217;ll stay above 100k. </p>
<p>For now, Love.com has 350,000 of these &#8220;topic blogs&#8221; and gets 100,000 unique visitors a week, <a href="http://www.techcrunch.com/2009/04/24/aols-secret-lovecom/">according to TechCrunch</a>. TC also says a full launch of Love.com is coming later this year.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/lovecom-feels-like-one-night-stand-18156/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Advice On Using The New Canonical Tag</title>
		<link>http://searchengineland.com/googles-advice-on-using-the-new-canonical-tag-16931</link>
		<comments>http://searchengineland.com/googles-advice-on-using-the-new-canonical-tag-16931#comments</comments>
		<pubDate>Fri, 13 Mar 2009 13:19:19 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[SEO: Redirects & Moving Sites]]></category>
		<category><![CDATA[SEO: Submitting & Sitemaps]]></category>
		<category><![CDATA[SEO: Tagging]]></category>
		<category><![CDATA[SEO: Titles & Descriptions]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16931</guid>
		<description><![CDATA[A month ago, Google, Yahoo and Microsoft announced they will be supporting a new canonical tag that allows you to tell search engines that page X is a duplicate page to page Z.  In a way, it is a 301 redirect, without the physical redirect.
The tag is incredibly powerful, as are 301 redirects and [...]]]></description>
			<content:encoded><![CDATA[<p>A month ago, Google, Yahoo and Microsoft announced they would support a new <a href="http://searchengineland.com/canonical-tag-16537">canonical tag</a> that allows you to tell search engines that page X is a duplicate of page Z.  In a way, it is a 301 redirect, without the physical redirect.</p>
<p>The tag is incredibly powerful, as are 301 redirects, so it should be rolled out slowly and with caution. Because the tag is so new, Matt Cutts posted a video explaining how one should go about using it. Here is the video:</p>
<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/LnXponbEHjw&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/LnXponbEHjw&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/googles-advice-on-using-the-new-canonical-tag-16931/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google, Yahoo &amp; Microsoft Unite On &#8220;Canonical Tag&#8221; To Reduce Duplicate Content Clutter</title>
		<link>http://searchengineland.com/canonical-tag-16537</link>
		<comments>http://searchengineland.com/canonical-tag-16537#comments</comments>
		<pubDate>Thu, 12 Feb 2009 20:55:05 +0000</pubDate>
		<dc:creator>Vanessa Fox</dc:creator>
				<category><![CDATA[Features: General]]></category>
		<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[Microsoft: Bing]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[Top News]]></category>
		<category><![CDATA[Yahoo: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16537</guid>
		<description><![CDATA[The web is full of duplicate content. Search engines try to index and display the original or &#8220;canonical&#8221; version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they&#8217;ll lose ranking.
Today, Google, Yahoo and [...]]]></description>
			<content:encoded><![CDATA[<p>The web is full of duplicate content. Search engines try to index and display the original or &#8220;canonical&#8221; version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they&#8217;ll lose ranking.</p>
<p>Today, <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">Google</a>, <a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/">Yahoo</a> and <a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">Microsoft</a> (links are to their separate announcements) have united to offer a way to reduce duplicate content clutter and make things easier for everyone. Webmasters rejoice! Worried about duplicate content on your site? Want to know what &#8220;canonical&#8221; means? Read on for more details.</p>
<p><strong>Multiple URLs, one page</strong></p>
<p>Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated&#8230;), or the marketing department might want <a href="http://janeandrobot.com/post/URL-Referrer-Tracking.aspx">tracking codes added</a> to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort through 1,000 URLs.</p>
<p>This can be a problem for a couple of reasons.</p>
<ul>
<li><strong>Less of the site may get crawled.</strong> Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler only is able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.</li>
<li><strong>Each page may not get full link credit. </strong>If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value  the page could have if all 10 links pointed to a single URL.</li>
</ul>
<p><strong>Using the new canonical tag</strong></p>
<p>Specify the canonical version using a tag in the head section of the page as follows:</p>
<pre>&lt;link rel="canonical" href="<a href="http://www.example.com/product.php?item=swedish-fish" target="_blank">http://www.example.com/product.php?item=swedish-fish</a>"/&gt;</pre>
<p>That&#8217;s it!</p>
<ul>
<li>You can only use the tag on pages within a single site (subdomains and subfolders are fine).</li>
<li>You can use relative or absolute links, but the search engines recommend absolute links.</li>
</ul>
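<p>To illustrate the second point, both forms below would name the same canonical URL, assuming the page is served from www.example.com and has no base tag changing how relative links resolve. The relative form is an illustration derived from the example URL above, not something taken from the search engines&#8217; announcements:</p>
<pre>&lt;!-- Absolute link (the form the search engines recommend) --&gt;
&lt;link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/&gt;

&lt;!-- Relative link (allowed, resolved against the page's own address) --&gt;
&lt;link rel="canonical" href="/product.php?item=swedish-fish"/&gt;</pre>
<p>A page would carry only one of these, not both.</p>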
<p>This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.</p>
<ul>
<li>Links to all URLs will be consolidated to the one specified as canonical.</li>
<li>Search engines will consider this URL a &#8220;strong hint&#8221; as to the one to crawl and index.</li>
</ul>
<p><strong>Canonical URL best practices</strong></p>
<p>The search engines use this as a hint, not as a directive, (Google calls it a &#8220;suggestion that we honor strongly&#8221;) but are more likely to use  it if the URLs use best practices, such as:</p>
<ul>
<li>The content rendered for each URL is very similar or identical</li>
<li>The canonical URL is the shortest version</li>
<li>The URL uses easy to understand parameter patterns (such as using ? and &amp;)</li>
</ul>
<p>Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google  reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines.</p>
<p>For instance, this tag will only work with very similar or identical content, so you can&#8217;t use it to send all of the link value from the less important pages of your site to the more important ones.</p>
<p>If tags conflict (such as pages pointing to each other as canonical, the URL specified as canonical redirecting to a non-canonical version, or the page specified as canonical not existing), search engines will sort things out just as they do now, and will determine which URL they think is the best canonical version.</p>
<p><strong>The tag in action</strong></p>
<p>This tag will most often be useful in the case of multiple URLs pointing at the same page, but might also be used when multiple versions of a page exist. For instance, wikia.com is using the tag for previous revisions of a page. Both http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&amp;diff=4901&amp;oldid=4819 and http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&amp;diff=5401&amp;oldid=4901 reference the latest version of the article (http://watchmen.wikia.com/wiki/Comedian%27s_badge) as the canonical.</p>
<p>The search engines stress that it&#8217;s still important to <a href="http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html">build good URL structure</a> and also note that if you aren&#8217;t able to implement this tag, they&#8217;ll still keep the processes they have now to determine the canonical. For instance, at <a href="http://smxwest.com/">SMX West</a> on Tuesday, Maile Ohye of Google explained how Google can detect patterns in URLs if they use standard parameters. For instance, with these URLs:</p>
<ul>
<li>http://www.example.com/buffy?cat=spike</li>
<li>http://www.example.com/buffy?cat=spike&amp;sort=evil</li>
<li>http://www.example.com/buffy?cat=spike&amp;sort=good</li>
</ul>
<p>Maile explained that Google can detect (particularly when looking at patterns across the site) that the <em>sort</em> parameter may order the page differently, but that the URLs with the sort parameter display the same  content as the shorter URL (http://www.example.com/buffy?cat=spike).</p>
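<p>A site that would rather not rely on pattern detection could also state the preference outright. As a sketch using the example URLs above, and assuming the shorter URL is the one the site owner wants indexed, both sorted versions would carry the same tag:</p>
<pre>&lt;!-- In the head of http://www.example.com/buffy?cat=spike&amp;amp;sort=evil
     and of http://www.example.com/buffy?cat=spike&amp;amp;sort=good --&gt;
&lt;link rel="canonical" href="http://www.example.com/buffy?cat=spike"/&gt;</pre>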
<p>While it&#8217;s rare for the search engines to join forces, this isn&#8217;t the first time they&#8217;ve come together on a standard. In November 2006, they came together to support <a href="http://sitemaps.org/">sitemaps.org</a>. And in June 2008 they announced a <a href="http://searchengineland.com/yahoo-google-microsoft-clarify-robotstxt-support-14125">standard set of robots.txt directives</a>. Matt Cutts of Google and Nathan Buggia of Microsoft told me that they want to help reduce the clutter on the web, and make things easier for searchers as well as site owners.</p>
<p>This new tag won&#8217;t completely solve duplicate issues on the web, but it should help make things quite a bit easier, particularly for ecommerce sites, which likely need all the help they can get in the current economic conditions. Site owners have been asking for help with these issues for a really long time, so this should be a greatly welcomed addition.</p>
<p><strong>Postscript by Barry Schwartz</strong>:</p>
<p>The search engines will be talking about this news at the Ask the Search Engines panel at SMX West.  We will be blogging this panel live at the <a href="http://www.seroundtable.com/archives/019359.html">Search Engine Roundtable</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/canonical-tag-16537/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Google Offers Duplicate Content Tips Roundup</title>
		<link>http://searchengineland.com/google-offers-duplicate-content-tips-roundup-14168</link>
		<comments>http://searchengineland.com/google-offers-duplicate-content-tips-roundup-14168#comments</comments>
		<pubDate>Mon, 09 Jun 2008 18:59:56 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[SEO: Duplicate Content]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-offers-duplicate-content-tips-roundup-14168.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Struggling with duplicate content woes? Google&#8217;s done a
<a href="http://googlewebmastercentral.blogspot.com/2008/06/duplicate-content-due-to-scrapers.html">
blog post today</a> to try and provide further advice and guidance. Among the
highlights:</p>
<p><span id="more-14168"></span></p>
<ul>
<li>Including a preferred URL in sitemaps submitted to Google can increase the
chances that the URL you want is used, in cases where content can be found
within your own site in different ways.<br />
&nbsp;</li>
<li>Consider using 301 redirects, blocking, consistent linking, and other tips
covered
<a href="http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html">
in this</a> and
<a href="http://googlewebmastercentral.blogspot.com/2007/06/duplicate-content-summit-at-smx.html">
this</a>, which are official Google blog posts.<br />
&nbsp;</li>
<li>To deal with offsite content that ranks above you, ensure that your site
itself is being crawled and is not penalized by Google.<br />
&nbsp;</li>
<li>Consider tips like providing slightly alternate versions of your content
for syndication; use absolute links back to your own content or ask partners
to block your content from being crawled. These come from a post that
post-Google Vanessa Fox did recently on her personal blog:
<a href="http://www.vanessafoxnude.com/2008/05/14/ranking-as-the-original-source-for-content-you-syndicate/">
Ranking As The Original Source For Content You Syndicate. </a></li>
</ul>
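<p>On the absolute links tip, a small sketch may help. The path and link text here are hypothetical, with www.example.com standing in for your own site:</p>
<pre>&lt;!-- Relative link: on a partner's copy this resolves to the partner's domain --&gt;
&lt;a href="/articles/original-post"&gt;Read the original&lt;/a&gt;

&lt;!-- Absolute link: points back at your site wherever the copy appears --&gt;
&lt;a href="http://www.example.com/articles/original-post"&gt;Read the original&lt;/a&gt;</pre>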
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-offers-duplicate-content-tips-roundup-14168/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Illustrated: How A Search Engine Determines Duplicate Content</title>
		<link>http://searchengineland.com/search-illustrated-how-a-search-engine-determines-duplicate-content-13980</link>
		<comments>http://searchengineland.com/search-illustrated-how-a-search-engine-determines-duplicate-content-13980#comments</comments>
		<pubDate>Tue, 13 May 2008 12:00:33 +0000</pubDate>
		<dc:creator>Elliance</dc:creator>
				<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[Search Illustrated]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/search-illustrated-how-a-search-engine-determines-duplicate-content-13980.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><a href="http://searchengineland.com/guides/search_illustrated.php">
</a> With so much good content on the web, it&#8217;s inevitable that the same information will be displayed many times over.  Whether it&#8217;s a blog post pointing to good statistics, an RSS feed pulled into a complementary site, or blatantly copied material, duplicate content can be an issue.</p>
<p>This week&#8217;s infographic shows how search engines make distinctions between original and duplicate content:</p>
<p><span id="more-13980"></span>
<img alt="How A Search Engine Determines Duplicate Content" src="http://searchengineland.com/images/se-duplicate-content.gif" width="500" height="650" /></p>
<p><i>Graphic by <a href="http://seo.elliance.com/">Elliance</a>, an eMarketing firm specializing in results-driven search engine marketing, web site design, and outbound eMarketing campaigns. The firm is the creator of the <a href="http://ennect.com">ennect</a> online marketing toolkit. The <a href="http://searchengineland.com/lands/search-illustrated.php">Search Illustrated</a> column appears Tuesdays at <a href="http://searchengineland.com">Search Engine Land</a>.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/search-illustrated-how-a-search-engine-determines-duplicate-content-13980/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Got Duplicate Content?  Don’t Let It Dilute Your SEO Efforts</title>
		<link>http://searchengineland.com/got-duplicate-content-don%e2%80%99t-let-it-dilute-your-seo-efforts-13555</link>
		<comments>http://searchengineland.com/got-duplicate-content-don%e2%80%99t-let-it-dilute-your-seo-efforts-13555#comments</comments>
		<pubDate>Wed, 12 Mar 2008 11:59:29 +0000</pubDate>
		<dc:creator>Dave Feldman</dc:creator>
				<category><![CDATA[Brand Aid]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/got-duplicate-content-don%e2%80%99t-let-it-dilute-your-seo-efforts-13555.php</guid>
		<description><![CDATA[  &#8220;Water, water everywhere nor any drop to drink,&#8221; laments a sailor adrift in Samuel Taylor Coleridge’s The Rime of the Ancient Mariner.   The poor guy needs water to stay alive, and he’s surrounded by it, yet if he drinks it, the salt content will kill him.
As a search marketer, perhaps you can [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://searchengineland.com/lands/brand-aid.php"> </a> &#8220;Water, water everywhere nor any drop to drink,&#8221; laments a sailor adrift in Samuel Taylor Coleridge’s The Rime of the Ancient Mariner.   The poor guy needs water to stay alive, and he’s surrounded by it, yet if he drinks it, the salt content will kill him.</p>
<p>As a search marketer, perhaps you can relate.</p>
<p>Much like our sailor friend, you’re surrounded by what you need&mdash;namely, content&mdash;which, of course, is great for your search presence.  Yet too much of the same content can be&#8230; well, hazardous.</p>
<p><span id="more-13555"></span>
<b>A little salt can’t hurt, right? </b></p>
<p>Thumbing through a retail catalog, you can often find the same merchandise represented in multiple categories.  For instance, a particular pair of hiking boots might be found in the Men’s Clothing section, and in the Footwear portion, and maybe even in the Outdoors offering as well.</p>
<p>Not surprisingly, the same holds true for a website.  After all, users don’t mind where they find the product on a site, as long as they don’t have to look too hard to find it.  However, while a multiple-category strategy may work for the glossy paper ilk, the same approach could prove problematic for a search marketer.  Specifically, wouldn’t it be considered duplicate content and carry a penalty?</p>
<p>Yes and no.</p>
<p>There’s really nothing wrong with having multiple versions of a page co-exist on your website.  Sure, there’s some redundancy, but it won’t break the Internet.  However, the problem arises when the same page lives in different sections on your site AND it has different URLs.  Then it’s considered duplicate content by the search engines.</p>
<p>The result?  It impairs a search engine’s ability to properly determine the relevance of your site to the product.  And if an engine can’t serve the best results, its audience will&#8230; um, jump ship.  In addition, attempts to game the system by duplicating your pages can get your site flagged, and your visibility will suffer.   Instead, avoid duplicate content issues and any potential penalties by using a robots.txt file or a noindex tag to keep multiple versions of your pages from being indexed.</p>
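<p>For reference, the tag in question is a single line in the head of each duplicate version. This is a minimal sketch; which versions get it depends on which page you pick as the one to keep:</p>
<pre>&lt;!-- In the head of each duplicate version you want kept out of the index --&gt;
&lt;meta name="robots" content="noindex"/&gt;</pre>
<p>Keep in mind that a crawler has to fetch a page to see this tag, so a page that is also blocked in robots.txt may never have its tag read.</p>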
<p><b>The problem isn&#8217;t penalties; it&#8217;s link dilution</b></p>
<p>With that said, however, keep in mind that the duplicate content issue is far less about a ranking penalty than it is about link dilution.</p>
<p>Let’s say you have 30 external sites lined up to provide a link to your site; you just need to tell them what page to link to.  If all 30 of those links point to the hiking boots page in the Men’s Clothing section, a good deal of link value will be passed to that page.</p>
<p>However, what happens if those 30 links get divvied-up across three different versions of that page, and are split between Men’s Clothing, Footwear, and Outdoors?  Potentially, each page would only get 33 percent of that total link value.  Clearly, 100 percent would be better.</p>
<p>Determining which page is the authority, and then directing all of your inbound links there, provides more benefit than trying to divide and conquer.  Doing so could mean the difference between being found at the top of the search results and receiving a sales windfall, or being sunk to the depths of the results and getting nothing at all.</p>
<p><b>Who’s the captain of this ship?</b></p>
<p>But how do you decide which page is the authority?  Staying with our example, should it be the one in the Men’s Clothing section?  Or the one in Footwear, or the one in Outdoors?</p>
<p>To decide, first assess the competitiveness of the term you’re targeting.  If it’s not very competitive and you believe you can get two similar&mdash;but not identical&mdash;versions of a page to rank for that term, you can try to optimize both pages equally.  I’m not advocating that you attempt to get as many versions of a page into the search results as possible, but there are occasions where it might make sense to go after two versions.</p>
<p>For instance, let’s say you have two pages about hiking boots.  One is the general product page, and the other is a promotional page for a limited time offer on the boots.  Obviously, you want to optimize the promo page so anyone looking for special offers on hiking boots would find the page with the limited time offer.  However, the general product page about hiking boots should also be optimized for people looking for hiking boots outside of any promotional periods.</p>
<p>In most cases though, it’s tough enough to get one page ranking for a term.  The best course of action is to decide which version provides the greatest opportunity for revenue over the long term.  Then optimize that page, direct all inbound links to it, and block any other versions from being indexed. Remember, there’s no reason to head into a storm if you can sail around it.</p>
<p>While at times you may feel stranded amidst a sea of content that you just can’t seem to make work, don’t resort to tactics that could blow your efforts off course.  Avoid creating multiple versions of the same page just to score more listings in the search results.  Instead, differentiate your content by tailoring messages to speak to a particular deal or offer, and concentrate your linking efforts on sending value to the page with the power to produce revenue for the long run.</p>
<p><i>Dave Feldman is director of client services for search engine marketing firm <a href="http://www.iprospect.com">iProspect</a> and can be reached at <a href="mailto:dave.feldman@iprospect.com">dave.feldman@iprospect.com</a>. The <a href="http://searchengineland.com/lands/brand-aid.php">Brand Aid</a> column appears Wednesdays at <a href="http://searchengineland.com">Search Engine Land</a>.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/got-duplicate-content-don%e2%80%99t-let-it-dilute-your-seo-efforts-13555/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yahoo Site Explorer Adds Dynamic URL Rewriting Tool</title>
		<link>http://searchengineland.com/yahoo-site-explorer-adds-dynamic-url-rewriting-tool-11991</link>
		<comments>http://searchengineland.com/yahoo-site-explorer-adds-dynamic-url-rewriting-tool-11991#comments</comments>
		<pubDate>Tue, 21 Aug 2007 15:23:56 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[SEO: Domain Names & URLs]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>
		<category><![CDATA[SEO: Redirects & Moving Sites]]></category>
		<category><![CDATA[Yahoo: Site Explorer]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/yahoo-site-explorer-adds-dynamic-url-rewriting-tool-11991.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>The Yahoo Search Blog <a href="http://www.ysearchblog.com/archives/000479.html">announced</a> a new Site Explorer feature that enables webmasters to inform Yahoo about dynamic URLs that might be creating duplicate content issues or crawl barriers.</p>
<p>The new feature is named &#8220;Dynamic URL Rewriting,&#8221; and it allows webmasters to view potential duplicate content issues and have Yahoo rewrite those URLs at the index level.  Yahoo allows you to define which URL is the primary, and the remaining URLs will be dynamically rewritten to that primary URL.</p>
<p><span id="more-11991"></span>
One of the most discussed advanced topics in SEO is duplicate content.  Duplicate content is often generated by accident through automated dynamic URLs.  How so?  A site can append session ids, tracking parameters, format modifiers and other parameters that do not change the content of the page yet create a new, unique-looking URL.  Savvy SEOs detect these duplicate URLs and use rewriting technology to inform the search engine of the primary URL.  But sometimes webmasters forget.</p>
<p>This new tool will enable webmasters to easily identify these URLs and classify the primary versions of the URLs.</p>
<p>For example, say you set up a tracking URL for a banner ad such as searchengineland.com/?ad_id=1234, which is basically a duplicate URL for searchengineland.com.  You can tell Yahoo that any URL carrying the &#8220;ad_id&#8221; parameter after the ? is a duplicate of the primary URL, and Yahoo will rewrite it to the primary.</p>
<p>Here is a screen capture of the tool:</p>
<p><a href="http://www.flickr.com/photos/rustybrick/1188618922/" title="Photo Sharing"><img src="http://farm2.static.flickr.com/1344/1188618922_4c1df32c01.jpg" width="500" height="239" alt="Yahoo Site Explorer Dynamic URL Feature" /></a></p>
<p>As you can see, Yahoo gives you two options with the tool:</p>
<p>(1) Remove a parameter from the URLs; in the case of session ids, for example, you could ask Yahoo to remove &#8216;sid&#8217; from URLs.<br />
(2) Use a default value for a parameter; for example, you could set the &#8216;src&#8217; parameter to be &#8216;yhoo_srch&#8217;.</p>
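<p>To make the two options concrete, here is a minimal Python sketch of the same idea applied on your own side. It is not Yahoo's implementation; the function name <code>rewrite_url</code> and its arguments are hypothetical, and it only assumes that removing a parameter or forcing it to a default value is enough to collapse duplicate URLs onto one primary URL.</p>

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def rewrite_url(url, remove=(), defaults=None):
    """Return url with `remove` parameters dropped and `defaults` values forced.

    Hypothetical helper for illustration only; it mimics the two options
    described above, not Yahoo's actual rewriting logic.
    """
    defaults = defaults or {}
    parts = urlsplit(url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in remove:
            continue  # option 1: drop the parameter entirely
        if name in defaults:
            value = defaults[name]  # option 2: force one default value
        kept.append((name, value))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Option 1: the tracking parameter is removed, leaving the primary URL.
print(rewrite_url("http://searchengineland.com/?ad_id=1234", remove={"ad_id"}))
# prints http://searchengineland.com/

# Option 2: every value of 'src' is rewritten to the same default.
print(rewrite_url("http://example.com/page?src=banner7", defaults={"src": "yhoo_srch"}))
# prints http://example.com/page?src=yhoo_srch
```

<p>Either way, every variant of the URL ends up as the same string, which is what lets links to the variants count toward a single page.</p>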
<p>It currently seems that you can specify up to three parameters for rewriting or removal in this tool.</p>
<p>Yahoo has done a great job explaining how it works at the <a href="http://www.ysearchblog.com/archives/000479.html">Yahoo Search Blog</a> and over at <a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/dynamic/dynamic-01.html">Yahoo Site Explorer help</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/yahoo-site-explorer-adds-dynamic-url-rewriting-tool-11991/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Understanding Search Engines Duplicate Content Issues</title>
		<link>http://searchengineland.com/understanding-search-engines-duplicate-content-issues-11738</link>
		<comments>http://searchengineland.com/understanding-search-engines-duplicate-content-issues-11738#comments</comments>
		<pubDate>Thu, 19 Jul 2007 16:06:05 +0000</pubDate>
		<dc:creator>Shari Thurow</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[SEO: Duplicate Content]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/understanding-search-engines-duplicate-content-issues-11738.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><a href="http://searchengineland.com/lands/100-organic.php">
<img border="0" src="http://searchengineland.com/images/organic100.jpg" alt="100% Organic - A Column From Search Engine Land" align="left" hspace="5" vspace="3" width="100" height="100"></a> I admit it. I am a search engine geek. Because I have a passion for understanding search usability, one of my particular interests is duplicate content filtering. If you want to really irritate searchers, present the same content to them in all or most of the top 10 positions in search results.</p>
<p>In the past, before search engines became effective at name results clustering, many search engine optimization (SEO) professionals, including myself, considered it quite the accomplishment to help client sites appear in the majority of the top 30 search results. I remember when one of my client sites held 24 of the top 30 positions. That client thought I was the greatest invention since the light bulb. However, having analyzed the search data and Web analytics data, I realized that having all of those top positions did not necessarily mean top conversions. So I was happy to see how search engines are becoming increasingly effective at filtering out duplicate content.</p>
<p><span id="more-11738"></span>
At the <a href="http://searchmarketingexpo.com/smx_advanced07/">SMX Advanced conference</a> in June 2007, there were two takeaways that I thought were very important for SEO professionals to keep in mind: search engines apply multiple duplicate content filters, and it pays to know when to apply 301 redirects.</p>
<p><strong>Multiple Filters </strong></p>
<p>One common misconception about duplicate content filtering is that there is only one main duplicate filter. In fact, there are multiple duplicate filters, and they are applied throughout the three main parts of the search engine process:</p>
<ul>
<li>Spidering or crawling</li>
<li>Indexing</li>
<li>Query processing</li>
</ul>
<p>Some duplicate content filters weed out content before Web pages are added to the index, meaning that some duplicate content will not be displayed in search results. Because a Web page cannot rank until it is in a search engine index, URLs excluded by these crawl-time filters never get the chance to rank.</p>
<p>Some duplicate content filters are applied after pages are added to the search engine index. Web pages are available to rank, but they might not display in search engine results pages (SERPs) as Web site owners might like them to appear. For example, no one wants their content to appear in the dreaded Supplemental Index.</p>
<p>Another common misconception is that if a listing appears in Google&#8217;s Supplemental Index, the site has been penalized. Duplicate content does not cause a site to be placed in the Supplemental Index. From Vanessa Fox&#8217;s blog:</p>
<blockquote><p>If you have  pages that are duplicates or very similar, then your backlinks are likely distributed among those pages, so your PageRank may be more diluted  than if you had one consolidated page that all the backlinks pointed to. And  lower PageRank may cause pages to be supplemental.
</p></blockquote>
<p>And from Matt Cutts&#8217; blog:</p>
<blockquote><p>Having urls in the supplemental results doesn&rsquo;t mean that you have some sort of  penalty at all; the main determinant of whether a url is in our main web index  or in the supplemental index is PageRank. If you used to have pages in our main  web index and now they&rsquo;re in the supplemental results, a good hypothesis is that  we might not be counting links to your pages with the same weight as we have in  the past. The approach I&rsquo;d recommend in that case is to use solid white-hat SEO  to get high-quality links (e.g. editorially given by other sites on the basis of  merit).
</p></blockquote>
<p><strong>301 redirects vs. robots exclusion</strong></p>
<p>Remember when meta-tag content used to be the &quot;secret weapon&quot; to getting top rankings in Infoseek? Lately, it seems that many search engine optimization professionals regard 301 redirects as the secret weapon for gaining and preserving link development, especially when redundant/duplicate content is involved.</p>
<p>For those of you who do not know what a 301 redirect is, I like to use this analogy. Have any of you ever moved and had to fill out those change of address cards at the post office? Basically, when you fill out these change of address cards, you are telling the U.S. postal service that you have moved permanently to a new address.&nbsp; I like to think of a 301 as a change of address card for computers. The status code is telling search engines that the content at a specific URL (Web address) has permanently moved to another URL.</p>
<p>There are times when using 301 redirects is appropriate and times when it is not. For example, let&#8217;s use a home page. The following home page URLs typically lead to the same content:</p>
<ul>
<li>www.companyname.com</li>
<li>companyname.com/index.htm</li>
<li>www.companyname.com/default.cfm</li>
</ul>
<p>In this situation, it is best to implement a 301 redirect so that only the most appropriate URL leads to the home page content. Search engines utilize canonicalization, which is the process of selecting the most appropriate URL when there are several choices. Be proactive. Don&#8217;t let the search engines determine the most appropriate URL to crawl and to display in search results. As the Web site owner, you should select the URL that is best for your business and target audience.</p>
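<p>As a rough illustration of that choice, the Python sketch below treats the home page variants listed above as duplicates and answers each non-preferred one with a 301 to a single canonical URL. The preferred URL, the constant names and the function <code>canonical_redirect</code> are assumptions made for the example; a real site would normally configure this in its Web server or CMS.</p>

```python
from urllib.parse import urlsplit

# Assumptions for illustration: the owner prefers the "www" host and the bare "/" path.
CANONICAL_URL = "http://www.companyname.com/"
CANONICAL_HOST = "www.companyname.com"
KNOWN_HOSTS = {"www.companyname.com", "companyname.com"}
HOME_PATHS = {"/", "/index.htm", "/default.cfm"}


def canonical_redirect(url):
    """Return (301, CANONICAL_URL) for a non-preferred home page URL, else None."""
    # Accept URLs written without a scheme, as in the list above.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    if host not in KNOWN_HOSTS or path not in HOME_PATHS:
        return None  # not a home page variant; nothing to do here
    if host == CANONICAL_HOST and path == "/":
        return None  # already the preferred URL
    return (301, CANONICAL_URL)


for candidate in ("www.companyname.com",
                  "companyname.com/index.htm",
                  "www.companyname.com/default.cfm"):
    print(candidate, canonical_redirect(candidate))
```

<p>The first URL is already the preferred one, so no redirect is issued; the other two are permanently redirected to it.</p>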
<p>Implementing 301 redirects is not the solution for every instance of duplicate content, in spite of what many SEO professionals might claim. The robots exclusion protocol is often far more appropriate.</p>
<p>Here is an example. Suppose a Web site owner has purchased and implemented a new content management system (CMS), and, as a result, the URL structure changed. During the site redesign, the Web site owner has eliminated content that has not converted well or is outdated.&nbsp; Should the Web site owner implement 301 redirects for the eliminated content?</p>
<p>Many SEO professionals often state that 301 redirects should be implemented to preserve the &quot;link juice&quot; to the expired content. In this situation, if a searcher clicks on a link to the expired content, he/she will typically be redirected to the home page. How does this benefit the search experience? The searcher expects to be delivered to specific content. Instead, he/she is redirected to a home page to begin searching for the desired content. It is a futile process, as the content has been removed. The result is a negative search experience and a negative user experience.</p>
<p>If content is removed, then delivering a custom 404 page is more appropriate, in spite of the &quot;link juice&quot; theory.</p>
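<p>The distinction can be sketched in a few lines of Python. The paths, the page name <code>custom-404.html</code> and the function <code>respond</code> are hypothetical; the point is only that content that has moved gets a 301 to its new address, while content that has been removed gets a 404 with a helpful custom page rather than a redirect to the home page.</p>

```python
# Hypothetical URL maps a site owner might keep after a redesign.
MOVED = {
    "/old-catalog/hiking-boots.html": "/products/hiking-boots",
}
REMOVED = {
    "/old-catalog/discontinued-sandals.html",
}


def respond(path):
    """Return (status, target) for a requested path, or None if unaffected."""
    if path in MOVED:
        # The content still exists at a new address: a permanent redirect.
        return (301, MOVED[path])
    if path in REMOVED:
        # The content is gone: say so, and serve a custom error page.
        return (404, "custom-404.html")
    return None  # assumption: every other path is served as usual


print(respond("/old-catalog/hiking-boots.html"))
print(respond("/old-catalog/discontinued-sandals.html"))
```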
<p><strong>Conclusion</strong></p>
<p>Search usability is not a term that is only applicable to Web search engines. Search usability does not only address querying behavior. It also addresses other search behaviors (browsing, scanning, etc.). Duplicate content delivery often has a negative impact on a site&#8217;s overall search usability, both before site visitors arrive at your site and after they arrive. By understanding how the commercial Web search engines filter out and display duplicate content, Web site owners can obtain greater search engine visibility and a better user experience.</p>
<p><i>Shari Thurow is SEO Director for Omni Marketing Interactive. The <a href="http://searchengineland.com/lands/100-organic.php">100% Organic</a> column appears Thursdays at <a href="http://searchengineland.com">Search Engine Land</a>.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/understanding-search-engines-duplicate-content-issues-11738/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
