Canonicalization sounds like a process for recognizing sainthood, or maybe a training course in aiming large projectile weapons. But it’s actually one of the most important aspects of organic SEO. Good canonicalization means search engines crawl more pages of your site; it means that link authority and PageRank get consolidated, so you have a stronger link profile; and it means fewer broken links from other sites. Bad canonicalization gets you all that stuff, but with the opposite effect.
The Ian-Lurie-mangles-the-meaning-so-computer-geeks-cringe-definition of canonicalization is: “every resource on your web site has a single web address.”
Every resource means every page, every image, every video, etc.
Single web address means there’s only one Uniform Resource Locator (URL) for each page of content, image, video, etc.
A URL looks like this:
Or, it could be: http://www.mysite.com/blah/foo.html.
Or, it could be: http://www.mysite.com/blah/foo.php?meh=123.
Or… Oh, you get the idea.
Note that I said ‘page of content’. That means that a single article, product description or list of articles should appear at a single URL. You should never have multiple URLs for, say, one product description, or one article.
Some of the absurdly bloated content management systems and e-commerce suites out there make canonicalization a challenge. But it’s worth it.
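To make the “single web address” rule concrete, here is a minimal Python sketch of an audit that flags any content reachable at more than one URL. It assumes you already have a crawl of your own site as a mapping of URL to page body; the `crawl` data and the `find_duplicate_urls` helper are hypothetical, and real pages would need smarter comparison than an exact hash.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages):
    """Return groups of URLs that serve byte-identical content.

    `pages` maps URL -> page body (a hypothetical crawl result).
    """
    by_content = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_content[digest].append(url)
    # Only content that lives at two or more URLs is a canonicalization problem.
    return [sorted(urls) for urls in by_content.values() if len(urls) > 1]


# Hypothetical crawl: two URLs serve the same page of content.
crawl = {
    "http://www.mysite.com/blah/foo.html": "<h1>Foo</h1>",
    "http://www.mysite.com/blah/foo.php?meh=123": "<h1>Foo</h1>",
    "http://www.mysite.com/blah/bar.html": "<h1>Bar</h1>",
}
duplicates = find_duplicate_urls(crawl)
print(duplicates)
```

Every group this returns is one resource with more than one address, which is exactly what the definition above rules out.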
Consequences of bad canonicalization
Here’s an example of ‘bad’ canonicalization: Let’s say I’ve opened a games store: Ian’s Nerdvana (I owe Dave Barry for the term ‘nerdvana’). My store’s home page lives at:
www.iansnerdvana.com
But it also lives at:
iansnerdvana.com
www.iansnerdvana.com/index.html
So what? People will find the home page at all three versions. They won’t know the difference, right? Well, yeah. But search engines will. Googlebot sees the three above URLs as three different pages on the web. That has two consequences that hurt SEO.
First, you lose link authority. If blogger 1 comes to ‘www.iansnerdvana.com’ and links to that page, blogger 2 lands on ‘iansnerdvana.com’ and links to that URL, and blogger 3 lands on ‘www.iansnerdvana.com/index.html’ and links to that page, Googlebot sees three links to three different pages, and applies 1 ‘vote’ to each one. These three links could have sent three authoritative signals to Googlebot for my site’s home page. Instead, they’re split into three weaker individual votes for three different pages. It’s as if Ross Perot or Ralph Nader were sitting in front of my site, siphoning off votes. It’s link love mayhem.
If I weren’t such a loser, I would’ve set up my site so that my home page ‘lived’ at one unique URL – ‘www.iansnerdvana.com’. Then all 3 bloggers would have linked to that page, and Googlebot would instead apply all three votes to a single page. If I care about link authority – and who doesn’t, I ask you? – then that’s a far better outcome.
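The vote-splitting arithmetic can be sketched in a few lines of Python. This is an illustration only: it treats each link as exactly one ‘vote’, which is a simplification of how search engines weigh links, and the `canonical_home` helper is hypothetical.

```python
from collections import Counter

# The three links from the three bloggers in the example.
links = [
    "www.iansnerdvana.com",
    "iansnerdvana.com",
    "www.iansnerdvana.com/index.html",
]

# Assumption: these three addresses all serve the same home page.
HOME_VARIANTS = set(links)


def canonical_home(url):
    """Map any home page variant to the single URL I should have used."""
    return "www.iansnerdvana.com" if url in HOME_VARIANTS else url


split_votes = Counter(links)
consolidated_votes = Counter(canonical_home(url) for url in links)

print(split_votes)         # one vote for each of three 'different' pages
print(consolidated_votes)  # three votes for one page
```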
Second, search engines won’t crawl your site as deeply as they might. Search engines allocate resources for each crawl. No one knows exactly how, but it’s safe to say Googlebot won’t just wander around your site until it’s found every page. At some point, it gives up and leaves. If multiple pages on my site have multiple URLs, then visiting search bots waste time tracking down all of those different versions. That’s time they could spend crawling other unique pages, instead. So fewer unique pages of my site end up in the search index, and I have fewer chances to rank.
Don’t feel bad, though. Even SEO agencies screw it up. Here’s one with their home page at both ‘www.site.com/’ and ‘www.site.com/index.php’. Oops.
You can avoid the heartbreak of bad canonicalization, or at least minimize it, by doing a few simple things:
- Use 301 redirection to ensure that your home page is only found at one URL. If you don’t know how, read Stephan Spencer’s column about rewrites and redirects.
- Link consistently to your home page from within your own site. Use a single URL for your home page. Don’t mix in instances of ‘www.iansnerdvana.com/index.html’ with ‘www.iansnerdvana.com’. If you aren’t doing this properly right now, a quick change may have a big impact on SEO.
- Don’t use tracking IDs in internal site navigation. A lot of sites add stuff like ‘?source=blog’ in their navigation. That lets them use their analytics reports to track user movement within, to and from their site. Instead, learn to use your web analytics referrer and navigation path reports. If you must use tracking IDs, change your software to use a hash mark (a ‘#’ sign) instead of a question mark. Search engines ignore everything after the hash, so you’ll avoid confusion.
- Don’t use tracking IDs in organic links from other sites. If you get a link on another site, and want it to help with your SEO, don’t put a tracking ID in that, either.
- Be careful with pagination. Many sites have pagination, where visitors can click a 1, 2, 3, etc. to jump to later pages in search results, product lists or articles. That’s fine, but make sure that each page has a single URL. For example, if page 1 of the article is ‘www.iansnerdvana.com/article.html’ when I click the article link from the home page, make sure that the number ‘1’ in the pagination takes me there, too, instead of to ‘www.iansnerdvana.com/article.html?page=1’.
- Set up preventative redirects. Make sure that ‘iansnerdvana.com’ 301 redirects to ‘www.iansnerdvana.com’.
- Exclude ‘e-mail a friend’ pages. Most content management systems that have ‘e-mail a friend’ options direct the user to a unique page that has the same form and content. But every instance of that page has a unique URL like ‘ID=123’, to tell the server which product or article to forward. It’s canonical higgledy-piggledy. Use robots.txt and the meta robots tag to exclude these from search engine crawls.
- Use common sense when building your site. Think, man/woman! If you need to change the header, footer or other page element based on where on your site the visitor came from, do it with cookies, or by sniffing out the referring URL. Design to do this ahead of time.
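Several of the rules above (one host name, no ‘index.html’, no tracking IDs, no ‘?page=1’) boil down to the same move: work out the one true URL, and 301 redirect anything else to it. Here’s a rough Python sketch of that logic. It is not tied to any particular server or CMS. The parameter name `source`, the index file names and the helper names `canonicalize` and `redirect_for` are assumptions for illustration, so adjust them to whatever your own site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CANONICAL_HOST = "www.iansnerdvana.com"     # the one host name we want indexed
BARE_HOST = "iansnerdvana.com"              # the variant that should redirect
TRACKING_PARAMS = {"source"}                # assumption: used only for tracking
INDEX_FILES = ("index.html", "index.php")   # assumption: default documents


def canonicalize(url):
    """Return the single canonical form of a URL on this site."""
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # '/index.html' and '/' are the same page, so keep only the directory.
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Drop tracking IDs, and drop 'page=1' because page 1 is the bare URL.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not (key == "page" and value == "1")
    ]

    # Fragments never reach the server, so the canonical form has none.
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def redirect_for(url):
    """Return (301, target) if the URL is not canonical, else None."""
    target = canonicalize(url)
    return None if target == url else (301, target)


print(redirect_for("http://iansnerdvana.com/"))
print(redirect_for("http://www.iansnerdvana.com/index.html"))
print(redirect_for("http://www.iansnerdvana.com/article.html?page=1"))
print(redirect_for("http://www.iansnerdvana.com/"))
```

The same `canonicalize` function can be used when generating internal links, so that your own navigation never points at a URL that would need redirecting in the first place.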
What about rel=canonical?
The canonical tag is a neat little gadget that’s supposed to let you tell search engines the correct URL for any page. So, by adding <link rel="canonical" href="http://www.iansnerdvana.com/"> to any page, I could tell visiting search bots to index just that version, and to direct all link authority to that one URL. It sounds ideal.
It’s not. First, Yahoo! and Bing don’t yet have confirmed support for it. Second, you can’t rely on tags of this nature, as search engines may change their minds later. Google’s done it. So don’t stake your SEO strategy on it. Third, why not do it right the first time? In addition to SEO benefits, a canonically clean site should run faster, present fewer maintenance headaches and place less load on server and bandwidth resources.
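If you do end up with canonical tags on your pages as a backstop, it’s worth checking what they actually declare. The sketch below uses Python’s standard `html.parser` module to pull the canonical URL out of a page’s HTML. The `CanonicalFinder` class and `find_canonical` helper are hypothetical names, and this shows only how the tag is read, not how any search engine treats it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.iansnerdvana.com/">'
    "</head><body>Welcome to Nerdvana</body></html>"
)
print(find_canonical(page))
```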
Let’s get canonical!
So, get out there and start cleaning up your site. Canonicalization fixes are generally simple, have a broad impact and let you fix multiple SEO problems at once. You’ll get more link authority, deeper site crawls and better rankings. What’s not to love?
- URL Rewrites & Redirects: The Gory Details (Part 1 of 2) by Stephan Spencer
- URL Rewrites and Redirects: The Gory Details (Part 2 of 2) by Stephan Spencer
- Google Lets You Tell Them Which URL Parameters To Ignore by Stephan Spencer
- Enterprise SEO Tools, Part 1: The Browser by Adam Audette
- Canonical Tag 2.0: Google To Add Cross Domain Support by Vanessa Fox
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.