Note: this article assumes you understand the basics of canonicalization. If you don’t, have a look at my article, 8 Canonicalization Best Practices In Plain English.
I still don’t recommend using rel=canonical. It should be your weapon of absolute last resort.
Last February, the Big 3 search engines announced support for this link tag. A lot of folks danced a happy jig. By inserting a single line of code you could force search engines to resolve duplicate page URLs to a single page.
So, if your site is generating all of these URLs for a single page:
No worries…just put this in the head element of that page and search engines will ignore the duplicates. They’ll even (in some cases) redirect link authority to the right canonical page:
<link rel="canonical" href="http://www.mysite.com/page.php" />
It’s like magic!
I was more skeptical. And I remain skeptical. Wait, there are real reasons! This isn’t just me being me! Keep reading!
It’s too easy
Adam Audette wrote about this last December. It’s so easy to place a rel=canonical tag at the top of a page. One line of code, duplicate content problems fixed. But sometimes, ‘easy’ is bad.
People keep throwing the canonical tag into their sites without learning how it works. The result: They use it wrong, and sometimes royally screw up their web site’s search presence. Here’s a real-world example with names, industries, etc., changed:
A client’s site generates duplicate content due to tracking URLs, inconsistent home page linking and other common issues. Their development team decided to use the canonical tag. That’s a lot easier than fixing each issue. So they added the tag to their next code release. Problem is, they had the tag load the URL of the current page, rather than the address of the correct canonical version. So, if I went to page.php?tracking=2, the canonical tag contained that address.
The result: They actually ended up with more duplicate content.
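For illustration, here is a minimal Python sketch of what that tag should have carried: the page address with the tracking parameters stripped, not whatever address the visitor happened to request. The names in TRACKING_PARAMS and both function names are assumptions made up for this example; a real site would need its own list.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that exist only for tracking.
TRACKING_PARAMS = {"tracking", "refer"}


def canonical_url(requested_url: str) -> str:
    """Return the requested URL without tracking parameters or fragment."""
    parts = urlsplit(requested_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_tag(requested_url: str) -> str:
    """Build the link element from the cleaned URL, never the raw request."""
    href = escape(canonical_url(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("http://www.mysite.com/page.php?tracking=2"))
```

The buggy release did the equivalent of putting requested_url straight into the href, so every tracking variant declared itself canonical.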
I’ve seen sites that insert the home page address into the canonical tag, too. Every page on the site has the equivalent of a redirect to www.mysite.com.
In all, half of the rel=canonical implementations I’ve checked for clients are incorrect.
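A rough way to spot both of those mistakes is to pull the tag out of a page and compare it with the address that was requested. The sketch below uses only Python’s standard library. The function name, the messages and the two checks are illustrative assumptions, not a complete audit.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def audit_canonical(page_url, html_text):
    """Return a list of problems found; an empty list means none of these."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    if not finder.hrefs:
        return ["no canonical tag found"]
    problems = []
    if len(finder.hrefs) > 1:
        problems.append("more than one canonical tag")
    href = finder.hrefs[0]
    page, target = urlsplit(page_url), urlsplit(href)
    # Mistake 1: the tag just repeats the request, query string and all.
    if href == page_url and page.query:
        problems.append("tag echoes the requested URL, query string included")
    # Mistake 2: an inner page claims the home page as its canonical.
    if target.path in ("", "/") and not target.query and page.path not in ("", "/"):
        problems.append("tag points an inner page at the home page")
    return problems
```

Run it against a page fetched at one of its tracking URLs; a tag that survives both checks may still be wrong, but these two are the ones described above.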
My point: putting rel=canonical in the hands of anyone who can edit a web page is like giving me a Taser. You’d better hope they never use it, because they’ll do more harm than good.
Uneven support by search engines
Yes, all three search engines support rel=canonical. But they all support it differently. Google supports cross-domain use and says the canonical tag is almost a 301 redirect. Bingahoo says it sees rel=canonical as a ‘hint, not a command’.
In my testing, Bing ignores the canonical tag altogether. I don’t think Facebook’s web search supports it, either. And in my testing, Google follows rel=canonical only when there are a few duplicates. If there are hundreds or thousands of duped pages, Googlebot has a nervous breakdown, and you’re stuck with duplicates anyway.
So rel=canonical is more a set of guidelines, I guess?…
My point: Cross-search engine support for rel=canonical is at best inconsistent, and may be nonexistent. Use this tag at your peril.
Messy URL structures and duplication are still bad
Rel=canonical doesn’t make duplication and messy URL structures OK! Search engines haven’t penalized for duplicate content in a long time. But there are still consequences:
- It’s a performance killer. If you have the same page at many different URLs, any caching tools you’re using still have to cache all of those copies. And search engines will still have to somehow hit/access/index those pages. I won’t say ‘crawl’ because I otherwise might get gotcha’d :) .
- It’s messy. Other sites may link to yours at any one of the duplicate URLs. Your own writers/content managers may link to the wrong canonical URLs, too. See the next section to understand why that’s a problem.
- It confuses your visitors. If your URL structure contains all kinds of dross like ?this=234a3245 or ?refer=blahblahblah it makes your page URLs harder to pass around via e-mail. Which, despite assertions to the contrary, is still how much of the world passes around information.
My point: The canonical tag may fix client-side duplication problems, but those problems still exist on your server. Which is bad.
If PageRank passed via 301 redirects decays, I’ll bet authority passed by rel=canonical does, too. So links pointing at the wrong canonical version of a page that has rel=canonical on it may pass less and less PageRank over time.
My tinfoil hat doesn’t like it
My last problem with rel=canonical: I don’t trust client-side software to resolve problems on a web site, ever. Didn’t you all learn your lesson with nofollow? Search engine support for tags changes. Or it’s buggy and inconsistent.
The canonical tag is a client-side fix. It commands search engines to behave differently. In my experience, SEO goes far better if you resolve issues in a way that lets search engines behave normally.
Just fix it
Rel=canonical seems too good to be true. That’s because it is. With inconsistent support, a host of technical problems that remain after you use it, and general confusion over how it works, the canonical link tag is at best an emergency measure.
The best solution for canonical issues is to fix them, instead.
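What ‘fixing them’ looks like depends on the site. One common server-side approach is to answer duplicate addresses with a 301 redirect, alongside fixing the links that produce them. The Python sketch below shows only the decision step, for the inconsistent host and home page case. CANONICAL_HOST reuses the example host from earlier; HOME_ALIASES and the function name are hypothetical, and this is an assumption about one possible fix, not a prescription.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.mysite.com"  # example host used earlier in the article
# Hypothetical addresses that all serve the home page on this imaginary site.
HOME_ALIASES = {"", "/index.php", "/index.html"}


def redirect_for(requested_url):
    """Return (301, location) for a duplicate address, or None if canonical."""
    parts = urlsplit(requested_url)
    path = "/" if parts.path in HOME_ALIASES else parts.path
    target = urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))
    if target == requested_url:
        return None
    return (301, target)


print(redirect_for("http://mysite.com/index.php"))
```

Because the server answers before any page is rendered, search engines and visitors only ever see one address, and nothing depends on how a given crawler treats a tag.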
I promise, after this article I will stop harping on canonicalization. For a little while.
Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.