Canonicalization sounds like a process for recognizing sainthood, or maybe a training course in aiming large projectile weapons. But it’s actually one of the most important aspects of organic SEO. Good canonicalization means search engines crawl more pages of your site; it means that link authority and PageRank get consolidated, so you have a stronger link profile; and it means fewer broken links from other sites. Bad canonicalization gets you all that stuff, but with the opposite effect.

Canonicalization defined

The Ian-Lurie-mangles-the-meaning-so-computer-geeks-cringe-definition of canonicalization is: “every resource on your web site has a single web address.”

Every resource means every page, every image, every video, etc.

Single web address means there’s only one Uniform Resource Locator (URL) for each page of content, image, video, etc.

A URL looks like this:

http://www.mysite.com/

Or, it could be: http://www.mysite.com/blah/foo.html.

Or, it could be: http://www.mysite.com/blah/foo.php?meh=123.

Or… Oh, you get the idea.

Note that I said ‘page of content’. That means that a single article, product description or list of articles should appear at a single URL. You should never have multiple URLs for, say, one product description, or one article.

Some of the absurdly bloated content management systems and e-commerce suites out there make canonicalization a challenge. But it’s worth it.

Consequences of bad canonicalization

Here’s an example of ‘bad’ canonicalization: Let’s say I’ve opened a games store: Ian’s Nerdvana (I owe Dave Barry for the term ‘nerdvana’). My store’s home page lives at:

http://www.iansnerdvana.com/

But it also lives at

http://iansnerdvana.com/

and

http://www.iansnerdvana.com/index.html

So what? People will find the home page at all three versions. They won’t know the difference, right? Well, yeah. But search engines will. Googlebot sees the three above URLs as three different pages on the web. That has two consequences that hurt SEO.

First, you lose link authority. If blogger 1 comes to ‘www.iansnerdvana.com’ and links to that page, blogger 2 lands on ‘iansnerdvana.com’ and links to that URL, and blogger 3 lands on ‘www.iansnerdvana.com/index.html’ and links to that page, Googlebot sees three links to three different pages, and applies 1 ‘vote’ to each one. These three links could have sent three authoritative signals to Googlebot for my site’s home page. Instead, they’re split into three weaker individual votes for three different pages. It’s as if Ross Perot or Ralph Nader were sitting in front of my site, siphoning off votes. It’s link love mayhem.

If I weren’t such a loser, I would’ve set up my site so that my home page ‘lived’ at one unique URL – ‘www.iansnerdvana.com’. Then all 3 bloggers would have linked to that page, and Googlebot would instead apply all three votes to a single page. If I care about link authority – and who doesn’t, I ask you? – then that’s a far better outcome.

Second, search engines won’t crawl your site as deeply as they might. Search engines allocate resources for each crawl. No one knows exactly how, but it’s safe to say Googlebot won’t just wander around your site until it’s found every page. At some point, it gives up and leaves. If multiple pages on my site have multiple URLs, then visiting search bots waste time tracking down all of those different versions. That’s time they could spend crawling other unique pages instead. So fewer unique pages of my site end up in the search index, and I have fewer chances to rank.

Don’t feel bad, though. Even SEO agencies screw it up. Here’s one with their home page at both ‘www.site.com/’ and ‘www.site.com/index.php’. Oops:

Busted: SEO firm with canonicalization problems on their home page.

Best practices

You can avoid the heartbreak of bad canonicalization, or at least minimize it, by doing a few simple things:

  1. Use 301 redirection to ensure that your home page is only found at one URL. If you don’t know how, read Stephan Spencer’s column about rewrites and redirects; there’s also a rough sketch of the rules right after this list.
  2. Link consistently to your home page from within your own site. Use a single URL for your home page. Don’t mix in instances of ‘www.iansnerdvana.com/index.html’ with ‘www.iansnerdvana.com’. If you aren’t doing this properly right now, a quick change may have a big impact on SEO.
  3. Don’t use tracking IDs in internal site navigation. A lot of sites add stuff like ‘?source=blog’ in their navigation. That lets them use their analytics reports to track user movement within, to, and from their site. Instead, learn to use your web analytics referrer and navigation path reports. If you must use tracking IDs, change your software to use a hash mark (a ‘#’ sign) instead of a question mark. Search engines ignore everything after the hash, so you’ll avoid confusion.
  4. Don’t use tracking IDs in organic links from other sites. If you get a link on another site, and want it to help with your SEO, don’t put a tracking ID in that, either.
  5. Be careful with pagination. Many sites have pagination, where visitors can click a 1, 2, 3 etc. to jump to later pages in search results, product lists or articles. That’s fine, but make sure that each page has a single URL. For example, if page 1 of the article is ‘www.iansnerdvana.com/article.html’ when I click the article link from the home page, make sure that the number ‘1’ in the pagination takes me there, too, instead of to ‘www.iansnerdvana.com/article.html?page=1’.
  6. Set up preventative redirects. Make sure that ‘iansnerdvana.com’ 301 redirects to ‘www.iansnerdvana.com’.
  7. Exclude ‘e-mail a friend’ pages. Most content management systems with ‘e-mail a friend’ options send the user to a separate page containing the e-mail form and the same product or article content. But every instance of that page has a unique URL like ‘ID=123’, to tell the server which product or article to forward. It’s canonical higgledy-piggledy. Use robots.txt and the meta robots tag to exclude these from search engine crawls (there’s a sketch of that after this list, too).
  8. Use common sense when building your site. Think, man/woman! If you need to change the header, footer or other page element based on where on your site the visitor came from, do it with cookies, or by sniffing out the referring URL. Design to do this ahead of time.
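
For the redirects in items 1 and 6, here’s a minimal sketch of what the rules might look like in an Apache .htaccess file. Apache and mod_rewrite are my assumptions here; your server or CMS may handle redirects differently, so treat this as illustration rather than a drop-in fix:

    # Minimal .htaccess sketch (Apache + mod_rewrite assumed)
    RewriteEngine On

    # Send the bare domain to the www version with a permanent (301) redirect
    RewriteCond %{HTTP_HOST} ^iansnerdvana\.com$ [NC]
    RewriteRule ^(.*)$ http://www.iansnerdvana.com/$1 [R=301,L]

    # Send /index.html requests to the root URL
    RewriteRule ^index\.html$ http://www.iansnerdvana.com/ [R=301,L]

The R=301 flag is what makes these permanent redirects, and that’s what tells search engines to consolidate link authority onto the surviving URL.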

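And for the ‘e-mail a friend’ pages in item 7, here’s one way the exclusion could look, assuming (purely for illustration) that those pages live under a /emailfriend/ path:

    # robots.txt: keep crawlers away from the e-mail-a-friend pages
    User-agent: *
    Disallow: /emailfriend/

Or, in the <head> of each of those pages:

    <meta name="robots" content="noindex">
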
What about rel=canonical?

The canonical tag is a neat little gadget that’s supposed to let you tell search engines the correct URL for any page. So, by adding <link rel="canonical" href="http://www.iansnerdvana.com/"> to any page, I could tell visiting search bots to index just that version, and to direct all link authority to that one URL. It sounds ideal.

It’s not. First, Yahoo! and Bing don’t yet have confirmed support for it. Second, you can’t rely on tags of this nature, as search engines may change their minds later. Google’s done it. So don’t stake your SEO strategy on it. Third, why not do it right the first time? In addition to SEO benefits, a canonically clean site should run faster, present fewer maintenance headaches and place less load on server and bandwidth resources.
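
If you do end up using it, here’s roughly how the tag sits in a page’s <head>, with my hypothetical store’s home page as the canonical target:

    <head>
      <title>Ian's Nerdvana</title>
      <!-- Tells visiting search bots which single URL should get the credit -->
      <link rel="canonical" href="http://www.iansnerdvana.com/">
    </head>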

Let’s get canonical!

So, get out there and start cleaning up your site. Canonicalization fixes are generally simple, have a broad impact and let you fix multiple SEO problems at once. You’ll get more link authority, deeper site crawls and better rankings. What’s not to love?

About The Author: Ian Lurie is Chief Marketing Curmudgeon and President at Portent, Inc., a firm he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting.

  • http://www.bazaarvoice.com naja2183

    Thanks for the post and suggestions – would you say that following these steps is a higher priority than steps to reduce your page speed? Just curious.

    “Canonicalization fixes are generally simple” – this hasn’t been the case on the sites I work with. These types of fixes generally take time and have to be wedged in with other projects touching the same piece of code to get done.

  • http://www.seobocaraton.com SEOBocaRaton

    Google also sees

    http://www.iansnerdvana.com/index.html
    and
    http://www.iansnerdvana.com/index.HTML
    as different pages too.

    .NET sites have this issue too, default.aspx and Default.aspx as the home page can both get indexed.

  • dyoungprod

    Actually, Yahoo and Bing do support the canonical tag. It’s the cross domain version that they don’t support. http://searchengineland.com/canonical-tag-16537

  • Ian Lurie

    I’d like something besides their statement that they actually support it, ’cause every test I’ve done shows they don’t. That’s the problem, really – all of the major search engines jump in to endorse a new bit of markup, but then they support it only inconsistently. It’s like playing HTML Roulette.

  • outtanames999

    SEOs beware. Canonicalization is just more seo Cuttsfud brought to you by immature search engine algorithms, and is another tail-chasing waste of time for SEOs, like nofollow. A year from now, you will not need to waste your time doing this because they will suddenly announce it’s not an issue – AND NEVER WAS.

    Search engines know the score: We were here before they were. They inherited the same web standards we did. And those standards say it is perfectly OK to have domainname.com, and http://www.domainname.com, domainname.com and domainname.com/, etc. Everybody knows they’re all the same. HUMANS can figure it out, so can the search engines.

    And 99.9999999% of web sites resolve all forms to the same content. In fact, I defy anyone to name more than a few obscure websites where this is not the case.

    Any search engine full of phds that tells you they can’t figure out the difference between domainname.com and http://www.domainname.com is lying. Just like they lied about not being able to parse javascript, dated content, media types, etc. etc. etc.

    They’re playing you for fools and suckers. Smart SEOs will not fall for it.

  • http://www.cicadamania.com Cicada Mania

    I see rel=”canonical” as an absolutely essential invention. Google does support it, and Bing is just waiting for the point when they’re supplying Yahoo’s results, and then they’ll release support with their next release.

    I NEED this tag because 1) I cannot control how the public links to my pages, and 2) because of all the different types of tracking parameters my marketing friends want to attach to my URLs. Commission Junction, WebTrends, Google Analytics — all kinds of crazy query string add-ons that my marketing buddies insist they need on my URLs to track their campaigns, which also trash SEO. That’s why we need search engines to support rel="canonical". We can do our best to enforce canonical linking of our pages, but we can’t stop the rest of the universe from screwing it up for us.

  • Ian Lurie

    I hear ya. I end up using it, too. But it’s always best to start out with the assumption that it’s a tool of last resort.

  • http://www.smallbusinessonline.net NeilS

    You don’t mention using the function in Google Webmaster Tools which lets you “set your preferred domain.” Doesn’t that go a long way toward solving the problem for most sites?

  • http://dineshthakursem.blogspot.com Dinesh Thakur

    A great post with useful “rel=canonical” information – I didn’t know about it earlier. Thanks Ian!

  • Ian Lurie

    @Neil good catch. I must’ve been writing too fast, or thinking too slow, or something. Clearly you want to use Google Webmaster Tools, as well. It doesn’t help with non-Google search engines, but it’s still helpful.

  • http://andybeard.eu AndyBeard

    “Use robots.txt and the meta robots tag to exclude these from search engine crawls.”

    Gotcha

    http://andybeard.eu/1121/seo-linking-gotchas-even-the-pros-make.html

    It has actually been 3 years now since Matt Cutts told Eric Enge about pages blocked by robots.txt accumulating PageRank.
    http://www.stonetemple.com/articles/interview-matt-cutts.shtml

    But even before then it was obviously happening.
    Google (well Matt) still hasn’t elaborated on reset vectors, but any juice being sprayed around randomly is a bad thing.

    Google have to be able to crawl a page to discount it as duplicate content, or for them to see noindex.

  • http://www.concept-i.dk/ Thomas Rosenstand

    Exactly Andy! It is a common misconception that Google does not index URLs blocked by robots.txt. They do that all the time. They don’t crawl them, though – but they do index them.

    But besides that: this article should be mandatory reading for every web developer in the world!

 
