Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter


The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

Today, Google, Yahoo and Microsoft, each in a separate announcement, have united to offer a way to reduce duplicate content clutter and make things easier for everyone. Webmasters rejoice! Worried about duplicate content on your site? Want to know what “canonical” means? Read on for more details.

Multiple URLs, one page

Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated…), and the marketing department might want tracking codes added to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort through 1,000 URLs.
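To see how quickly the numbers multiply, here is a minimal sketch that builds every URL variant for a hypothetical catalog. The five sort orders, the two tracking states and the `sort` and `utm_source` parameter names are assumptions chosen for illustration (utm_source is simply a common analytics tracking parameter); only the product.php?item= pattern comes from the example later in this article.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical catalog: 100 product pages, each reachable with 5 sort orders
# and 2 tracking states (none / a marketing code), i.e. 10 URLs per page.
PAGES = [f"item-{n}" for n in range(100)]
SORT_ORDERS = ["", "price-asc", "price-desc", "rating", "newest"]
TRACKING = ["", "newsletter"]

def url_variants(item: str) -> list[str]:
    """Return every URL under which the same product page can be requested."""
    urls = []
    for sort, tracking in product(SORT_ORDERS, TRACKING):
        params = {"item": item}
        if sort:
            params["sort"] = sort  # hypothetical sort parameter
        if tracking:
            params["utm_source"] = tracking  # hypothetical tracking code
        urls.append("http://www.example.com/product.php?" + urlencode(params))
    return urls

all_urls = [url for page in PAGES for url in url_variants(page)]
print(len(PAGES), "pages ->", len(all_urls), "URLs")  # 100 pages -> 1000 URLs
```

Every one of those 1,000 URLs is distinct as far as a crawler is concerned, even though only 100 pages of content exist.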

This can be a problem for a couple of reasons.

  • Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler is only able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.
  • Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value the page would have if all 10 links pointed to a single URL.
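Both problems can be shown with a deliberately simplified model. The sketch below assumes a crawler that fetches the first 100 URLs from a list in which all 10 variants of each page sit next to each other (the worst case); real crawlers schedule URLs in far more sophisticated ways, so treat this as arithmetic, not a description of any search engine.

```python
# Simplified crawl-budget model (an assumption, not how any real crawler works):
# fetch the first BUDGET URLs from a list where each page has 10 URL variants.
BUDGET = 100
URLS_PER_PAGE = 10

def unique_pages_crawled(url_list, page_of, budget):
    """Count distinct pages reached when fetching the first `budget` URLs."""
    return len({page_of[url] for url in url_list[:budget]})

# Build 100 pages x 10 URLs each; page_of maps every URL back to its page.
page_of = {}
duplicated = []
for page in range(100):
    for variant in range(URLS_PER_PAGE):
        url = f"http://www.example.com/product.php?item={page}&v={variant}"
        page_of[url] = page
        duplicated.append(url)

# One URL per page, as if every variant pointed to a single canonical URL.
deduplicated = duplicated[::URLS_PER_PAGE]

print(unique_pages_crawled(duplicated, page_of, BUDGET))    # 10
print(unique_pages_crawled(deduplicated, page_of, BUDGET))  # 100

# Link dilution: 10 inbound links spread one per URL vs. all on one URL.
links_per_url = {f"variant-{i}": 1 for i in range(URLS_PER_PAGE)}
print(max(links_per_url.values()), "vs", sum(links_per_url.values()))  # 1 vs 10
```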

Using the new canonical tag

Specify the canonical version using a tag in the head section of the page as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

That’s it!

  • You can only use the tag on pages within a single site (subdomains and subfolders are fine).
  • You can use relative or absolute links, but the search engines recommend absolute links.
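If you want to check which canonical URL a page actually declares, a short script can read it from the HTML. This is a minimal sketch using Python's standard-library HTML parser; the function names are hypothetical, and it resolves a relative href against the page's own URL so that relative and absolute tags end up as the same absolute URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    # Self-closing <link .../> tags also arrive here: the default
    # handle_startendtag() calls handle_starttag() for them.
    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.hrefs.append(attrs["href"])

def canonical_url(page_url: str, html: str):
    """Return the absolute canonical URL declared by the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.hrefs:
        return None
    # A relative href is resolved against the URL the page was fetched from.
    return urljoin(page_url, finder.hrefs[0])

page = "http://www.example.com/product.php?item=swedish-fish&sort=price"
absolute_tag = '<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/></head>'
relative_tag = '<head><link rel="canonical" href="product.php?item=swedish-fish"></head>'
print(canonical_url(page, absolute_tag))
print(canonical_url(page, relative_tag))
```

Both calls print the same absolute URL, which is one reason absolute links are the safer choice: what you see in the tag is exactly what the search engines get.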

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

  • Links to all URLs will be consolidated to the one specified as canonical.
  • Search engines will consider this URL a “strong hint” as to the one to crawl and index.
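The consolidation idea can be sketched as crediting every inbound link to the canonical URL of the page it lands on. This is only a counting model under the assumption that "consolidated" means links are summed; how the search engines actually weigh links is not public. The function name and sample links are hypothetical.

```python
from collections import Counter

def consolidate_links(inbound_links, canonical_of):
    """Credit each inbound link to the canonical URL of the page it targets.

    inbound_links: iterable of target URLs, one entry per external link.
    canonical_of:  maps a URL to the canonical URL its page declares;
                   URLs missing from the mapping keep their own credit.
    """
    totals = Counter()
    for target in inbound_links:
        totals[canonical_of.get(target, target)] += 1
    return totals

CANONICAL = "http://www.example.com/product.php?item=swedish-fish"
variants = [CANONICAL + f"&sort={s}" for s in ("price", "rating", "newest")] + [CANONICAL]
links = variants * 2  # hypothetical: two external links to each variant
canonical_of = {url: CANONICAL for url in variants}

# Best single URL before the tag vs. the canonical URL after consolidation.
print(max(Counter(links).values()), "->", consolidate_links(links, canonical_of)[CANONICAL])  # 2 -> 8
```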

Canonical URL best practices

The search engines use this as a hint, not a directive (Google calls it a “suggestion that we honor strongly”), but they are more likely to use it if the URLs follow best practices, such as:

  • The content rendered for each URL is very similar or identical
  • The canonical URL is the shortest version
  • The URL uses easy-to-understand parameter patterns (such as using ? and &)
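Before adding the tag, a site owner could sanity-check the first two practices: that the variants really render near-identical content, and that the shortest URL is the one chosen. This is a rough sketch; the 0.9 similarity threshold is an assumption (the search engines have not published one), and Python's difflib is used only as a convenient text-similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; the search engines have not published one.
SIMILARITY_THRESHOLD = 0.9

def is_near_duplicate(content_a: str, content_b: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Rough check that two rendered pages are very similar or identical."""
    return SequenceMatcher(None, content_a, content_b).ratio() >= threshold

def pick_canonical(urls_to_content: dict) -> str:
    """Pick the shortest URL, after checking every variant matches its content."""
    urls = sorted(urls_to_content, key=lambda u: (len(u), u))
    shortest = urls[0]
    for other in urls[1:]:
        if not is_near_duplicate(urls_to_content[shortest], urls_to_content[other]):
            raise ValueError(f"{other} differs too much from {shortest}")
    return shortest

pages = {
    "http://www.example.com/product.php?item=swedish-fish&utm_source=newsletter":
        "Swedish Fish, 5 lb bag. In stock.",
    "http://www.example.com/product.php?item=swedish-fish":
        "Swedish Fish, 5 lb bag. In stock.",
}
print(pick_canonical(pages))
```

If a variant's content differs too much, the sketch raises an error instead of picking a canonical, mirroring the advice that the tag is meant for very similar or identical pages.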

Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines.

For instance, this tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.

If tags conflict (for example, pages point to each other as canonical, the URL specified as canonical redirects to a non-canonical version, or the page specified as canonical doesn't exist), search engines will sort things out just as they do now and determine which URL they think is the best canonical version.
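The search engines resolve these conflicts themselves, but you may prefer to catch them on your own site first. Here is a minimal audit sketch that flags the three cases described above; the input structures (a map of declared canonicals, a map of redirects, and a set of URLs that exist) and the sample paths are hypothetical.

```python
def find_canonical_problems(canonical_of, redirects, existing):
    """Flag conflicting canonical tags.

    canonical_of: URL -> canonical URL declared on that page
    redirects:    URL -> redirect target, for URLs that redirect
    existing:     set of URLs that serve a page
    """
    problems = []
    for url, target in canonical_of.items():
        if target in redirects:
            problems.append((url, f"canonical target redirects to {redirects[target]}"))
        elif target not in existing:
            problems.append((url, "canonical target does not exist"))
        # Two different pages that name each other as canonical.
        if target != url and canonical_of.get(target) == url:
            problems.append((url, "pages point to each other as canonical"))
    return problems

canonical_of = {"/a": "/b", "/b": "/a", "/c": "/d", "/e": "/gone"}
redirects = {"/d": "/c"}
existing = {"/a", "/b", "/c", "/e"}
for url, problem in find_canonical_problems(canonical_of, redirects, existing):
    print(url, problem)
```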

The tag in action

This tag will most often be useful in the case of multiple URLs pointing at the same page, but might also be used when multiple versions of a page exist. For instance, wikia.com is using the tag for previous revisions of a page. Both http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&diff=4901&oldid=4819 and http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&diff=5401&oldid=4901 reference the latest version of the article (http://watchmen.wikia.com/wiki/Comedian%27s_badge) as the canonical.

The search engines stress that it’s still important to build good URL structure and also note that if you aren’t able to implement this tag, they’ll still keep the processes they have now to determine the canonical. For instance, at SMX West on Tuesday, Maile Ohye of Google explained how Google can detect patterns in URLs if they use standard parameters. Take these URLs:

  • http://www.example.com/buffy?cat=spike
  • http://www.example.com/buffy?cat=spike&sort=evil
  • http://www.example.com/buffy?cat=spike&sort=good

Maile explained that Google can detect (particularly when looking at patterns across the site) that the sort parameter may order the page differently, but that the URLs with the sort parameter display the same content as the shorter URL (http://www.example.com/buffy?cat=spike).
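A toy version of that idea: for each query parameter, drop it and see whether the shorter URL was also crawled with the same content. This is not Google's implementation, which was not described in detail; the function names are hypothetical, and the content "fingerprint" is assumed to be a set of items so that a different sort order still compares equal.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def drop_param(url: str, name: str) -> str:
    """Return `url` with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(query)))

def ignorable_params(content_of: dict) -> set:
    """Parameters whose removal always leads to a crawled URL with the same content.

    content_of maps each crawled URL to a fingerprint of its rendered content
    (here a frozenset of items, so a different sort order compares equal).
    """
    verdicts = {}
    for url in content_of:
        for name, _ in parse_qsl(urlsplit(url).query):
            shorter = drop_param(url, name)
            if shorter in content_of:
                same = content_of[shorter] == content_of[url]
                verdicts[name] = verdicts.get(name, True) and same
    return {name for name, ok in verdicts.items() if ok}

spike = frozenset({"episode-a", "episode-b", "episode-c"})
content_of = {
    "http://www.example.com/buffy?cat=spike": spike,
    "http://www.example.com/buffy?cat=spike&sort=evil": spike,
    "http://www.example.com/buffy?cat=spike&sort=good": spike,
}
print(ignorable_params(content_of))  # {'sort'}
```

The cat parameter is never flagged because no crawled URL without it exists to compare against, which is why looking at patterns across the whole site matters.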

While it’s rare for the search engines to join forces, this isn’t the first time they’ve come together on a standard. In November 2006, they came together to support sitemaps.org. And in June 2008 they announced a standard set of robots.txt directives. Matt Cutts of Google and Nathan Buggia of Microsoft told me that they want to help reduce the clutter on the web, and make things easier for searchers as well as site owners.

This new tag won’t completely solve duplicate content issues on the web, but it should make things quite a bit easier, particularly for ecommerce sites, which likely need all the help they can get in the current economic conditions. Site owners have been asking for help with these issues for a really long time, so this should be a welcome addition.

Postscript by Barry Schwartz:

The search engines will be talking about this news at the Ask the Search Engines panel at SMX West. We will be blogging this panel live at the Search Engine Roundtable.



Vanessa Fox is a Contributing Editor at Search Engine Land. Called a “cyberspace visionary” by Seattle Business Monthly, she is an expert in understanding customer acquisition from organic search. She shares her perspective on how this impacts marketing and user experience at ninebyblue.com and provides authoritative search-friendly design patterns for developers at janeandrobot.com.


13 COMMENTS ON Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter

jdevalk,

I’ve got WordPress, Drupal and Magento plugins / modules ready for this, you can find them here: http://yoast.com/canonical-url-links/



Marketing Lane,

That was some quick work jdevalk.



divinewrite,

Excellent news! Good work G, Y & M! And nice post, Vanessa. Also, good onya Yoast. Very quick response with the plugin! Looking forward to working with it.



Mikkel deMib Svendsen,

The sad thing about this new tag is that I know how developers will react: Great, now we can continue to make crappy site architecture and then just add that tag to solve everything for us! I have already seen that on some developer forums.

The fact just is that the only way for you to keep full control is by making a good, solid and one dimensional site architecture- just as it always has been.

You don’t get a new car by adding a cheap layer of paint!



macgizmoguy,

Well, for those of us ‘Little Fish’ who aren’t SEO Sharks or Pro-Blogger Whales… this should help my handful of sites A LOT in the months ahead.

When I didn’t have truly good site Stats or Analytics in place, I often used ?ID= tracking URL’s to ascertain where traffic was coming from — and was rather inconsistent using http://www.-prefix or not. The ‘dilution’ this has caused shows up pretty clearly in MSN Webmaster Tools in particular, followed by Yahoo SiteExplorer, and as always: The Google Monster’s GENTLY EVIL EYE has always seemed the smartest about the discrepancies… :)

So bring on the Canonical Smart Bots! It promises a ‘cleaner view’ of the internet for all of us.



Barrie Adams,

This is very useful for the larger sites which I manage, we are always suffering from dupe content issues due to search and archive parameters and am looking forward to seeing this tried and tested.

Thank you Vanessa for the informative post.



William Alvarez,

I applaud this initiative from the big G, Y! and MSN (aka Live), as Vanessa mentioned before, it’s a big help for e-commerce sites that live with this issue and becomes time consuming and expensive to support and solve.



maconrocks,

That’s all well and good but it does nothing for the amount of simply unusable and unreliable content that is out there. I am still curious to see how someone can find a middle ground between human powered and mathematical search to provide some sort of filtering based on knowledge and experience. The only reliable start I have seen so far has been sweetsearch.com



webconnoisseur,

Cheers to the search engines for making this happen. Still lots of reason to not rely on this method, but it certainly makes some clean up jobs much easier.



dk,

Always welcome news to see that everyone is playing nice together- Big corps like Google usually just want to swallow up all the competition. At least for this brief announcement we can all sit a bit better with ourselves.

Great post, and thanks for including the tags in there with example. Should be a great help!

Cheers!



diamara,

Over at Wahanda, we’re having to modify quite a few different pages to support this, so we’ve created a Firefox plugin to help make the canonical URI more visible. You can download the extension here:

http://www.wahanda.com/inspire/canonical-uri-extension-for-firefox



fletchgqc,

Great idea to create this tag. Chances are that if you have simple duplicate content that Google is already handling it OK (see http://www.saltwebsites.com/blog/canonical-url-tag-no-mad-rush-implement), but it’s certainly nice to be able to definitively tell them about it and stop worrying.



AnthonyDeegan,

It was interesting to hear Matt Cutts interviewed on this, one of the ways Google deals with duplicate content is by use of the sitemap, the xml version that you provide them with. It’s definitely worth doing properly simply because it also allows you to set the importance of the pages to you, using a sliding scale.



