• http://searchengineland.com Danny Sullivan

    Over at SEOmoz, Rand recently wrote The Illustrated Guide to Duplicate Content in the Search Engines, which touches on some of the things here, as well. He covers things he’s heard from various search engines and tries to illustrate how in some cases, the original document should do fine.

  • http://www.VenturesWithoutCapital.com Bruce Judson

    Jill:

    Thanks for this very, very useful article. As you note, this has been a source of great confusion, particularly since so many services (whether correctly or incorrectly) advocate distributing your content among multiple Websites.

I know your column focuses on organic search, but I just wanted to note that it is also getting increasingly hard to send “banished content” to Google for cost-competitive PPC activity. On the PPC side, Google’s “Quality Score” is now designed to ensure that users clicking on advertising results also have a quality experience.

Your readers may be interested in an article I wrote on the best references available for guiding advertisers to ensure their content yields high “Quality Scores” and low PPC costs. It’s available at Ventures Without Capital, and is titled “The Secret to Top Google Quality Scores: Comparing Google Slap Reports”.

Thanks again for your really valuable insights.

  • http://www.paulzhao.com Paul Zhao

    How do you feel about sites that have unique content on all pages, but have some duplicated content?

    Example: A site with a small paragraph of their “legal terms” on all its pages.

    Thanks,
    Paul Zhao

  • hounddog

    Jill,

(1) I agree that the terminology is problematic. Just as “penalty” has some wrong connotations, “filter” also seems inadequate to describe some of what we see. For example, “original” pages that are not eliminated, but are ranked below lower-quality duplicates. Or original pages that show up only in the “omitted” results.

It does seem that there’s something besides “eliminating duplicate content from the result set” going on. Maybe not a penalty, and maybe just the SEs doing a less-than-perfect job at filtering, but in any case, “filter” doesn’t seem to fully describe the observed results.

(2) Saying that “If the site your article is hosted on shows up instead of yours, so be it. There’s nothing wrong with that, as your site can be easily clicked to from your bio; the pros far outweigh the cons.” may be accurate for some scenarios. But if the page in question is being monetized by AdSense (or other advertising/affiliate links), you surely must account for the risk that the other instances of your article will outrank yours or cause yours to be filtered, and may siphon off some of that advertising revenue. I think it’s too broad a generalization to conclude that the pros outweigh the cons in all cases.

  • http://www.highrankings.com/ Jill

    Example: A site with a small paragraph of their “legal terms” on all its pages.

    It shouldn’t be a problem. If you’re concerned or worried, simply make the legal disclaimer into an image instead of real HTML text.
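
    For example, here is a minimal Python sketch of that approach. It assumes the Pillow imaging library and made-up file names, so treat it as an illustration rather than a recipe from the article:

    # Render boilerplate legal text into an image so it is no longer repeated
    # HTML text on every page. Assumes Pillow is installed; names are examples.
    from PIL import Image, ImageDraw, ImageFont

    DISCLAIMER = ("All content is provided for informational purposes only "
                  "and does not constitute legal advice.")

    img = Image.new("RGB", (900, 60), "white")
    draw = ImageDraw.Draw(img)
    draw.text((10, 20), DISCLAIMER, fill="black", font=ImageFont.load_default())
    img.save("legal-disclaimer.png")

    # The pages would then embed the image instead of the repeated text, e.g.:
    # <img src="/images/legal-disclaimer.png" alt="Legal disclaimer">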

  • http://www.seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html Halfdeck

    Completely disagree.

    http://www.seo4fun.com/blog/2007/01/05/seo-myth-there-is-no-duplicate-content-penalty.html

    “Duplicate content penalty is a myth” is the real myth.

Adam Lasnik says:

    “As I noted in the original post, penalties in the context of duplicate content are rare.”

    The keyword there is rare.

  • Steve Amundsen

Good comments, Jill. You are very correct that duplicate content is not really a penalty. What is a de facto “penalty” is poor or little original content. Besides, does anyone really believe that Google cannot discern which site has the original content versus duplicate or copied content? Original content is king. Long live the king.

  • http://www.demib.dk Mikkel deMib Svendsen

To a site owner it doesn’t really matter if you call it a “penalty” or a “filter” – getting pages removed or not ranked well at all feels the same. It hurts.

And the fact is, if you have a website with real duplicate or identical content it won’t perform as well as a “clean” website with a “one-dimensional” architecture.

The main problem with duplicate content filtering is that you leave it up to the engines to decide what to keep and what to filter out. NEVER leave it to the engines to figure out your site! In my experience they NEVER make the choices you would have.

Also, by having duplicate content you risk wasting your links. If each unique page you have can be found on several URLs on your site, some links may go to versions of the page that are being filtered out. Not good! (See the sketch at the end of this comment.)

    The bottom line is that there is absolutely NO reason not to create a good site architecture and avoid duplicate content. Not just for the sake of engines but for the users as well.
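
    To make that concrete, here is a rough Python sketch of the kind of URL normalization a site can apply on its own side so every page lives at exactly one address. It is my own simplification (the parameter names and rules are examples), not anything the engines have published:

    # Collapse common duplicate-URL variants so each page has one address.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

    def canonical_url(url: str) -> str:
        scheme, host, path, query, _fragment = urlsplit(url)
        host = host.lower()
        if host.startswith("www."):          # pick one hostname and stick to it
            host = host[4:]
        if path.endswith("/index.html"):     # default documents duplicate the folder URL
            path = path[: -len("index.html")]
        if len(path) > 1 and path.endswith("/"):
            path = path.rstrip("/")
        # drop session IDs and tracking parameters that spawn extra URLs
        params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
        return urlunsplit((scheme, host, path, urlencode(params), ""))

    print(canonical_url("http://WWW.example.com/widgets/index.html?sessionid=abc123"))
    # -> http://example.com/widgets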

  • http://www.globalwarming-awareness2007online.com NunoH

    A myth?
    Remember this?

    Quality guidelines – specific guidelines

    * Avoid hidden text or hidden links.
    * Don’t employ cloaking or sneaky redirects.
    * Don’t send automated queries to Google.
    * Don’t load pages with irrelevant words.
    * Don’t create multiple pages, subdomains, or domains with substantially duplicate content.

    Don’t pretend to be a bigger expert than the ones who actually make the technology.

    http://www.google.com/support/webmasters/bin/answer.py?answer=35769

  • http://www.ppcdiscussions.com jeremy mayes

    “It’s time to bite the bullet and use them as PPC landing pages instead.”

    Was that supposed to be a joke?

  • Jill

    [quote]The bottom line is that there is absolutely NO reason not to create a good site architecture and avoid duplicate content. Not just for the sake of engines but for the users as well.[/quote]

Mikkel, absolutely, positively agree!

  • Fridaynite

Think about the 50 or 60 Google patents on duplicate content. Maybe there are no penalties, but I am sure that there are filters which send your duplicate sites to place 950.
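
    None of those patents comes with runnable code, of course, but the general idea behind near-duplicate detection can be sketched in a few lines of Python. This is a textbook-style toy (word shingles plus Jaccard similarity), not a claim about what Google actually does:

    # Toy near-duplicate detection: shared word shingles as a similarity score.
    def shingles(text: str, size: int = 5) -> set:
        words = text.lower().split()
        return {" ".join(words[i:i + size]) for i in range(max(1, len(words) - size + 1))}

    def similarity(a: str, b: str) -> float:
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

    original = "there is no duplicate content penalty only a filter that picks one version to show"
    copy = "there is no duplicate content penalty only a filter that picks one version to display"
    print(round(similarity(original, copy), 2))  # about 0.83; identical pages score 1.0

    A filter could then keep whichever near-duplicate it judges most authoritative and push the rest far down, which would look a lot like “place 950” from the site owner’s side.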

  • http://www.neoranking.com neoranking

Has anyone considered the impact of having good-quality content articles that are duplicated from another source on a single domain which has strong links associated with it? Would such a site be penalized? I think if one focuses their effort on adding value to his/her visitors instead of just pleasing the algorithm alone, any site will do just fine in the long run.

In other words, if your website contains a small collection of content duplicated from another source but still adds value to the visitors to your website, there is no need to worry about any penalty.

From personal experience: my own content was copied on another website. Not all of it, but almost 80% of the content was taken, and the other site was doing much better in terms of traffic while the original site was penalized. I know because I created the scenario, and the second site, filled with duplicated content, was able to rank better purely because it managed to attract some good links.

    There is really no end to this, so just focus on serving your site visitors better.

  • http://sample-as-that.blogspot.com ciaran

    Sorry Jill – I just don’t agree (well, not entirely)

    I know from your email newsletters (which I usually enjoy) that you enjoy a good myth debunk, but this is a myth which turns out to be true.

Search engines DO penalise (not just filter) for duplicate content – such as when domains aren’t handled correctly (not necessarily as a spamming technique, but simply due to human error), and you end up with multiple versions of the same content. I’m saying this because of first-hand experience (a quick sketch of the kind of fix I mean follows at the end of this comment).

And saying that it’s not a big thing if someone who reposts your article ranks higher than you seems a very lax attitude to take.

    Sorry – but I think you got this one wrong.
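
    For what it’s worth, the domain-handling problem above is usually fixed with a permanent redirect to a single hostname. Here is a minimal Python sketch using the Flask framework and a made-up domain, purely as an illustration of the idea:

    # Redirect every non-canonical hostname (e.g. www.example.com) to one
    # canonical host with a 301, so only one version of each URL gets indexed.
    from flask import Flask, redirect, request

    app = Flask(__name__)
    CANONICAL_HOST = "example.com"   # hypothetical domain, for illustration only

    @app.before_request
    def force_canonical_host():
        if request.host != CANONICAL_HOST:
            url = request.url.replace(request.host, CANONICAL_HOST, 1)
            return redirect(url, code=301)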

  • http://www.wordtravels.com tomp

I run a website which has just been penalised (or filtered) by Google, and Google traffic has gone down by 90% – we’re losing around 30,000 page views per day as a result. Our pages are still in Google but way down the listings, and the top results page includes a few really bad computer-generated sites. We are a small publisher and write all our own good-quality content. The content is well-regarded and is not deliberately optimised for search engines or packed with unnecessary keywords. Lots of good sites (including Google) link to our site and most pages have a high PageRank.

I think the problem may be that we syndicate the content to other sites, which use it to add value to their sites; some take XML and integrate it themselves, and around 20 others get a white-label version of our site. We also have a certain amount of duplicate content on our own site – snippets of reviews link to a full version on another page – which seems reasonable to me, but this was a recent change, as some pages were getting too long. And standard disclaimers etc. on every page.

There are still a few searches for which we rank No. 1, or on the first page, but only a fraction of what was the case before.

    Do you have any advice for me and specifically for sites which syndicate content?

  • http://www.semreportcard.com/yahoos-trademark-policy-recklessly-abuses-brands-online/ semreportcard

Let me throw in my two cents on a related issue: duplicate, or “mirror,” sites.

The search engines do penalize duplicate sites (content) through the use of filters – Yahoo much more so than Google.

    The majority of my clients are network marketing brands, aka MLM, aka direct selling companies.

What many of these companies have done is use software to generate replicated (duplicate) websites for new reps on either sub-domains or sub-folders. Unfortunately, too many of them do this on their corporate domain, and the results are as follows:

    Google: Google dramatically stifles the rankings of these pages, even the originally published site, but still indexes them and displays them in the results. The effects of the filters are overcome through the acquisition of many quality links. As a result, the index page of the corporate domain tends to prevail as the “official home page.” The threshold point where the filters kick in on Google tends to be a couple hundred duplicate pages.

    Yahoo: Yahoo can be very unforgiving to duplicate content pages. I have seen some clients’ web pages (including the home page) rank outside of the top 1,000 results for publishing a hundred replicated (duplicate) sites. The only exception here seems to be when you type the name of the company with their domain extension, for example, “Company.com.” This search in Yahoo tends to rank the corporate site #1 across the board.