Understanding Search Engines’ Duplicate Content Issues

I admit it. I am a search engine geek. Because I have a passion for understanding search usability, one of my particular interests is duplicate content filtering. If you want to really irritate searchers, present the same content to them in all or most of the top 10 positions in search results.

In the past, before search engines became effective at clustering results from the same host name, many search engine optimization (SEO) professionals, including myself, considered it quite an accomplishment to help client sites appear in the majority of the top 30 search results. I remember when one of my clients’ sites held 24 of the top 30 positions. The client thought I was the greatest invention since the light bulb. However, after analyzing the search data and Web analytics data, I realized that holding all of those top positions did not necessarily mean top conversions. So I was happy to see search engines become increasingly effective at filtering out duplicate content.

At the SMX Advanced conference in June 2007, there were a few takeaways that I thought were very important for SEO professionals to keep in mind: that search engines apply multiple duplicate content filters, and that there is a right time and a wrong time to apply 301 redirects.

Multiple Filters

One common misconception about duplicate content filtering is that there is only one main duplicate filter. In fact, there are multiple duplicate filters, and they are applied throughout the three main parts of the search engine process:

  • Spidering or crawling
  • Indexing
  • Query processing

Some duplicate content filters weed out content before Web pages are added to the index, meaning that some duplicate content will not be displayed in search results. A Web page cannot rank until it is in a search engine index; therefore, crawl-time filters can actually exclude URLs from being added to the search engine index.

Other duplicate content filters are applied after pages are added to the search engine index. These pages are available to rank, but they might not appear in search engine results pages (SERPs) the way Web site owners would like. For example, no one wants their content to appear in the dreaded Supplemental Index.
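
To make the crawl-time idea concrete, here is a minimal, purely illustrative Python sketch. The real engines’ filters are proprietary and use far more sophisticated near-duplicate detection, but the principle is the same: fingerprint each page’s content and skip URLs whose content has already been seen, so the duplicate never reaches the index.

```python
import hashlib

seen_fingerprints = set()

def fingerprint(page_text: str) -> str:
    """Normalize the page text and return a content fingerprint."""
    normalized = " ".join(page_text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def should_index(url: str, page_text: str) -> bool:
    """Crawl-time filter: drop a URL whose content duplicates a page
    that is already queued for the index."""
    fp = fingerprint(page_text)
    if fp in seen_fingerprints:
        print(f"Filtered before indexing (duplicate content): {url}")
        return False
    seen_fingerprints.add(fp)
    return True

# Two URLs serving identical content: only the first one reaches the index.
should_index("http://www.companyname.com/", "Welcome to Company Name.")
should_index("http://companyname.com/index.htm", "Welcome to Company Name.")
```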

Another common misconception is that if a listing appears in Google’s Supplemental Index, the site has been penalized. Duplicate content does not cause a site to be placed in the Supplemental Index. From Vanessa Fox’s blog:

If you have pages that are duplicates or very similar, then your backlinks are likely distributed among those pages, so your PageRank may be more diluted than if you had one consolidated page that all the backlinks pointed to. And lower PageRank may cause pages to be supplemental.

And from Matt Cutts’ blog:

Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).

301 Redirects vs. Robots Exclusion

Remember when meta-tag content was the “secret weapon” for getting top rankings in Infoseek? Lately, it seems that many search engine optimization professionals view 301 redirects as the secret weapon for getting and preserving link development, especially when redundant or duplicate content is involved.

For those of you who do not know what a 301 redirect is, I like to use this analogy. Have you ever moved and had to fill out a change-of-address card at the post office? When you fill out that card, you are telling the U.S. Postal Service that you have permanently moved to a new address. I like to think of a 301 as a change-of-address card for computers. The status code tells search engines that the content at a specific URL (Web address) has permanently moved to another URL.
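
A quick way to see a 301 in action is to request a moved URL and inspect the raw response. The following Python sketch uses a hypothetical host and path; the server on the other end would have to be configured to answer this way:

```python
import http.client

# Hypothetical old URL; the server must actually be configured
# to return a 301 for this request.
conn = http.client.HTTPConnection("www.companyname.com")
conn.request("GET", "/old-page.htm")
response = conn.getresponse()

# A permanent move shows up as status 301 plus a Location header
# pointing at the new address: the "change of address card."
print(response.status, response.reason)           # e.g., 301 Moved Permanently
print("New address:", response.getheader("Location"))

conn.close()
```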

There are times when using a 301 redirect is appropriate and times when it is not. For example, consider a home page. The following home page URLs typically lead to the same content:

  • www.companyname.com
  • companyname.com/index.htm
  • www.companyname.com/default.cfm

In this situation, it is best to implement a 301 redirect so that the most appropriate URL leads to the home page content. Search engines perform canonicalization, which is the process of selecting the most appropriate URL when there are several choices. Be proactive. Don’t let the search engines determine the most appropriate URL to crawl and to display in search results. As the Web site owner, you should select the URL that is best for your business and target audience.
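
How you implement the redirect depends on your server (Apache rewrite rules, IIS settings, or your CMS), but the logic is always the same. Here is a minimal, hypothetical WSGI sketch in Python that treats www.companyname.com as the chosen canonical home page URL and returns a 301 for the duplicate addresses:

```python
CANONICAL_HOME = "http://www.companyname.com/"          # the URL you choose
DUPLICATE_HOME_PATHS = {"/index.htm", "/default.cfm"}   # duplicate entry points

def application(environ, start_response):
    """Minimal WSGI sketch: send duplicate home page URLs to one canonical URL."""
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")

    # Redirect the non-www host name and the duplicate home page paths.
    if host == "companyname.com" or path in DUPLICATE_HOME_PATHS:
        start_response("301 Moved Permanently", [("Location", CANONICAL_HOME)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Home page content</body></html>"]
```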

Implementing 301 redirects is not the solution for every instance of duplicate content, in spite of what many SEO professionals might claim. The robots exclusion protocol is often far more appropriate.
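
For instance, if the duplicate is a printer-friendly copy of every article, nothing has moved, so there is nothing to redirect; you simply tell compliant crawlers not to fetch the copies. The sketch below uses Python’s standard-library robots.txt parser to show the effect of a hypothetical Disallow rule:

```python
from urllib import robotparser

# Hypothetical robots.txt: keep printer-friendly duplicates out of the index
# while the canonical article URLs remain crawlable.
robots_txt = """\
User-agent: *
Disallow: /print/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "http://www.companyname.com/articles/seo-tips.htm"))  # True
print(parser.can_fetch("*", "http://www.companyname.com/print/seo-tips.htm"))     # False
```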

Here is an example. Suppose a Web site owner has purchased and implemented a new content management system (CMS) and, as a result, the URL structure has changed. During the site redesign, the Web site owner also eliminated content that did not convert well or was outdated. Should the Web site owner implement 301 redirects for the eliminated content?

Many SEO professionals often state that 301 redirects should be implemented to preserve the “link juice” to the expired content. In this situation, if a searcher clicks on a link to the expired content, he/she will typically be redirected to the home page. How does this benefit the search experience? The searcher expects to be delivered to specific content. Instead, he/she is redirected to a home page to begin searching for the desired content. It is a futile process, as the content has been removed. The result is a negative search experience and a negative user experience.

If content is removed, then delivering a custom 404 page is more appropriate, in spite of the “link juice” theory.
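
With that approach the server honestly reports that the content is gone, yet still gives the visitor somewhere useful to go. Here is a minimal, hypothetical Python sketch; on a real site you would normally configure the custom error page in the Web server or CMS rather than hand-roll a handler:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

REMOVED_PATHS = {"/discontinued-product.htm"}   # hypothetical retired content

CUSTOM_404_PAGE = b"""<html><body>
<h1>Sorry, that page has been removed.</h1>
<p>It was not moved, so we will not pretend it was. These links may help:</p>
<ul>
  <li><a href="/">Home page</a></li>
  <li><a href="/site-map.htm">Site map</a></li>
  <li><a href="/search.htm">Site search</a></li>
</ul>
</body></html>"""

class CustomNotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in REMOVED_PATHS:
            # Tell browsers and search engines the content is gone,
            # and serve a helpful page instead of redirecting to the home page.
            self.send_response(404)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(CUSTOM_404_PAGE)
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Regular page content</body></html>")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CustomNotFoundHandler).serve_forever()
```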

Conclusion

Search usability is not a term that applies only to Web search engines, and it does not address only querying behavior. It also addresses other search behaviors (browsing, scanning, and so on). Duplicate content delivery often has a negative impact on a site’s overall search usability, both before site visitors arrive at your site and after they arrive. By understanding how the commercial Web search engines filter out and display duplicate content, Web site owners can obtain greater search engine visibility and a better user experience.

Shari Thurow is SEO Director for Omni Marketing Interactive. The 100% Organic column appears Thursdays at Search Engine Land.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.


About The Author: Shari Thurow is the Founder and SEO Director at Omni Marketing Interactive and the author of the books Search Engine Visibility and When Search Meets Web Usability. Shari currently serves on the Board of Directors of the Information Architecture Institute (IAI) and the ASLIB Journal of Information Management. She also served on the board of the User Experience Professionals Association (UXPA).

Comments

  • http://www.theonlinemarketingguy.com sportsguy

    Excellent article Shari! I think you nailed the points exactly. I’ve had far too many arguments lately against using 301 redirects. My product guys have caught on to them over the last year and now seem to think they are a Golden Arrow capable of magically solving issues they create.

    Imagine their shock when just yesterday I said to break a URL we no longer updated and set up a custom 404 page. They were horrified that I didn’t want to 301 the old page. I said exactly what you did – what’s the point? The user isn’t getting what they expected when they land, so why waste their time?

    Now the arguments are flying around our product team about whether we should create another 404 page or how best to edit the current one to get the most lift from it…LOL

    Man, just get it done. Throw your most popular pages up on the custom 404 page and let users find what interests them.

    It is to laugh… ;)

    I hope folks read this article and take away the core messages. It’s well worth their time.

    Duane

  • MattC

    Good Article Shari :)

  • http://www.vanessafoxnude.com Vanessa

    FYI, I wrote a recap of the Duplicate Content Summit at SMX here:
    http://googlewebmastercentral.blogspot.com/2007/06/duplicate-content-summit-at-smx.html

    I’m torn on the idea of 404ing deleted pages. I suppose it depends on the situation and why the content was removed. If similar content exists elsewhere on the site, it may make sense to redirect the user there, since that would provide some of what they were looking for. If there’s no logical place to redirect, a custom 404 page is probably a good bet, and it would be pretty slick to be able to figure out what they were looking for and provide some context.

    If a visitor is, for instance, trying to access a product line that the site no longer sells, it may make sense to provide a custom 404 page that can detect what URL they were trying to get to, then explains that the product line is no longer available, with links to similar product lines.

  • http://www.highrankings.com/ Jill

    Many SEO professionals often state that 301 redirects should be implemented to preserve the “link juice” to the expired content. In this situation, if a searcher clicks on a link to the expired content, he/she will typically be redirected to the home page. How does this benefit the search experience? The searcher expects to be delivered to specific content. Instead, he/she is redirected to a home page to begin searching for the desired content. It is a futile process, as the content has been removed. The result is a negative search experience and a negative user experience.

    I don’t think most SEO professionals would suggest 301 redirecting old outdated pages to the homepage, but to the new appropriate equivalent.

  • http://www.nuexp.com Nuexp

    I totally agree with Vanessa and Jill regarding this issue. It seems like a total waste of all the effort to build all the links for a page, only to lose all that traffic just because that content becomes expired, outdated, or redundant for some reason. If there is a related or updated alternative available, 301-ing to that related page sounds like a wise thing to do.

    Speaking of which, I am curious to know how many webmasters have started using the “unavailable_after” meta tag. It would be good to read about it here sometime soon.

  • http://www.searchgrit.com/ Marios Alexandrou

    “The searcher expects to be delivered to specific content. Instead, he/she is redirected to a home page to begin searching for the desired content.”

    While Vanessa’s suggestion to use a smart 404 page is a great one, the above blurb doesn’t explain how a generic 404 page is any better an experience than the home page. I would think the home page is more likely to have other content of interest to engage the user.

    “It is a futile process, as the content has been removed. The result is a negative search experience and a negative user experience.”

    On the search side, the page redirected to will stop ranking for terms if it isn’t relevant. Google and other SEs will, quite quickly sometimes, rectify any issues with the search experience. If they do happen to think the new page is relevant, you’ll have the added bonus of maintaining rankings.

  • WebmasterT

    Shari, nice article! I agree with the comments by Jill and Vanessa; however, let’s not forget that the 301 has a function beyond how SEs use it. 301 redirects are important for maintaining links to your site. If you change a filename or folder and don’t use a 301, then people you may not know of who had linked to you, who mentioned the URL offline, or whose link was never discovered by a SE hit a dead end, and you have totally failed the user and, more importantly, the person nice enough to link to you or recommend you offline.

    I would suggest that anyone writing a custom application like the one Vanessa mentioned also capture the referrer and record it somewhere, for easy monitoring of referrers who may need to be contacted with a request to update the IBL. Then, once you see these dwindle to nothing, you know you can remove the 301 with confidence that it has performed its function and a 404 is now the more appropriate response.

 
