Why Canonicalization Matters From A Linking Perspective

Search engine optimization (SEO) can be like any other technical field of study. It is filled with specialized jargon that, to a newbie, can be more than intimidating. I recall that feeling was especially strong when I first encountered the term canonicalization.

It is a 16-letter, seven-syllable monster of a term. I first heard it spoken, and had to ask the person who said it to repeat it. It didn’t help. (It had been a long day!)

The truth of the matter is that canonicalization is not all that complicated to understand if the explanation is lucid. So let’s try to explain what it means, why it’s important, and what it has to do with linking.

What Is Canonicalization?

In mathematics, when the same data can be represented in multiple ways, it is best to standardize that representation by establishing the data’s canonical form, the one primary form in which it will be used. In the computer science field, the act of defining the canonical form of data is called canonicalization.

Simply put, canonicalization defines the one primary way you’ll write a piece of data, such as a URL string. As a webmaster, you can choose which canonical form to use for a given URL on your site, but once selected, the chosen form should always be the way that URL is written.

Why Canonicalization Is Important

Fundamentally, you need to know that search engines do not index pages by their content. They index URLs. The content associated with the indexed URLs is brought into the search engine database, but URLs are what possess ranking.

What complicates matters in search (and why canonicalization is important) is that the same content page can have multiple URLs associated with it.

I’m not talking about when Web spammers scrape your content and publish it on their own website. I’m talking about variations of URLs on your website all pointing to the same page.

For example, the following hypothetical URLs would likely all point to the same page (in this case, the home page of a site):

  • example.com
  • www.example.com
  • www.example.com/
  • www.example.com/index.html
  • www.example.com/index.html?var1=105
  • www.example.com/index.html?var1=105&var2=abc

As you can see, a valid URL may either include or omit the subdomain prefix “www.”, a trailing slash after the top-level domain, the default webpage name for a folder, and/or one or more URL parameter suffixes (there are even more, but these are the most common). They can also be used in various combinations. The possible permutations of the above examples can quickly add up to a large number of URLs all pointing to the same content page.
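
To make that concrete, here is a minimal Python sketch that collapses such variants into one form. The policy it encodes (https, the “www.” prefix, no default document name, no URL parameters) is an assumption made for illustration, not a rule, and the names canonicalize and DEFAULT_DOCUMENTS are hypothetical. Dropping every parameter is only safe when no parameter changes the content of the page.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of default document names served for a folder.
DEFAULT_DOCUMENTS = {"index.html", "index.htm"}


def canonicalize(url):
    """Collapse common variants of a URL into one assumed canonical form."""
    # Bare forms such as "example.com" carry no scheme; add one so that
    # urlsplit treats the first segment as the host rather than the path.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)

    # Assumed policy: always use the "www." prefix.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # An empty path and "/" mean the same page; drop default document names.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    # Assumed policy: https, and no query string or fragment.
    return urlunsplit(("https", host, path, "", ""))


variants = [
    "example.com",
    "www.example.com",
    "www.example.com/",
    "www.example.com/index.html",
    "www.example.com/index.html?var1=105",
    "www.example.com/index.html?var1=105&var2=abc",
]
print({canonicalize(u) for u in variants})  # {'https://www.example.com/'}
```

All six home page variants from the list reduce to a single string, which is the whole point of picking a canonical form.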

And this is not only a problem for home pages. Deep link pages can have similar problems, such as the following hypothetical examples:

  • www.example.com/folder1/
  • www.example.com/folder1/index.html
  • www.example.com/folder1/index.html?product=49
  • www.example.com/folder1/?userID=tinytim

When search engine crawlers encounter multiple URLs successfully pointing to the same content page, the overall potential PageRank for that content page is split among the URLs crawled. After all, even though the content is exactly the same, each crawled URL will have its own number of backlinks, so the PageRank for a given piece of content will differ among the URLs crawled.

Metaphorically speaking, imagine a full pitcher of water (the total potential page rank) and several empty cups of various sizes (your non-canonicalized URLs).

When you split up the water from the pitcher among the cups, you are technically still working with the same amount of water, but each cup only has a percentage of the total. None of the cups contains as much water as the pitcher could.

When it comes to PageRank, if your site’s pages are not canonicalized, you’re not using your full potential for page ranking. Not only are your URLs competing against those of your rivals from other websites, you are also competing against URL variations within your own website!
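
The pitcher metaphor can be put into rough numbers. The backlink counts in this Python sketch are invented, and real PageRank is not a simple sum of link counts, so treat it only as an illustration of how value gets diluted across URL variants.

```python
# Hypothetical backlink counts for four URL variants of the same home page.
backlinks = {
    "example.com": 40,
    "www.example.com/": 120,
    "www.example.com/index.html": 30,
    "www.example.com/index.html?var1=105": 10,
}

# The "pitcher": every link that points at this content, whatever the URL form.
total = sum(backlinks.values())

# The fullest "cup": the single URL variant with the most links.
strongest_url, strongest_count = max(backlinks.items(), key=lambda kv: kv[1])
share = strongest_count / total

print(f"Links pointing at the content: {total}")
print(f"Strongest single URL: {strongest_url} holds {strongest_count} ({share:.0%})")
```

With these made-up numbers, the best-linked variant holds only 120 of the 200 links; the remaining 80 are credited to URLs that compete with it.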

Wouldn’t it be better if you could consolidate your page rank in one URL as you might pour all of those cups of water back into one pitcher? That’s why we need to canonicalize our sites.

Canonicalization’s Connection To Linking

“Yeah, yeah, this is all well and good. But where’s the connection to linking?” you ask. Well, as you are a webmaster, you do have a degree of control over how at least some pages link to you.

After all, your intrasite links, not to mention your site navigation scheme links (and for that matter, the links in your XML-based Sitemap file) are all controlled by you.

This means you need to comb through your site (or your content management system, aka CMS) and see how the link to each page is referenced. You need to ensure each link to a given page always uses the exact same URL form.
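
As a rough idea of what that combing could look like when automated, the Python sketch below scans a page’s HTML for links to your own domain and flags any that are not written in a known canonical form. The names LinkCollector and find_noncanonical_links, the example.com domain, and the set of canonical URLs are all hypothetical; a real audit would crawl every page and would also need to resolve relative links, which this sketch skips.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_noncanonical_links(html, canonical_urls, site_domain="example.com"):
    """Return links to site_domain that are not in canonical_urls."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        # Relative links have no hostname and are ignored by this sketch.
        host = urlsplit(href).hostname or ""
        internal = host == site_domain or host.endswith("." + site_domain)
        if internal and href not in canonical_urls:
            flagged.append(href)
    return flagged


page = (
    '<a href="https://www.example.com/folder1/">good</a>'
    '<a href="https://example.com/folder1/index.html">needs fixing</a>'
    '<a href="https://other.org/">external</a>'
)
print(find_noncanonical_links(page, {"https://www.example.com/folder1/"}))
```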

I personally advocate using absolute (aka full) URLs in links, if only because of the plague of content scrapers. As those people are too lazy to create their own content, they are also usually too lazy to examine and change stolen content source code.

If your content is scraped, readers of that content will be brought back to your site when they click the inline links you created (you do create inline links when relevant opportunities appear, right?).
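
If your templates currently emit relative links, converting them is mechanical. This small Python sketch uses the standard library’s urljoin to resolve a link as written on a page into its absolute form; the page address and the to_absolute helper are hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(page_url, href):
    """Resolve a link as written on page_url into a full (absolute) URL."""
    return urljoin(page_url, href)


page = "https://www.example.com/folder1/"  # hypothetical page address
for href in ["product.html", "../about/", "/folder2/"]:
    print(href, "->", to_absolute(page, href))
```

A scraped copy of the page would then carry links such as https://www.example.com/folder1/product.html, which still point at your site no matter where the copy is hosted.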

Admittedly, there are times when your site architecture requires that you use URL parameters. In that case, you can also create <link rel="canonical"> tags in the <head> section of your pages. The href attribute of this tag defines the canonical URL for the page, so even if the page is normally requested with URL parameters, its canonical URL is still declared.
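
Here is one way a page template might build that tag, sketched in Python. It assumes that only some parameters actually select content (the hypothetical CONTENT_PARAMS set) and that everything else, such as a user ID, can be dropped from the canonical URL; which parameters matter is something only you can decide for your site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change which content is shown.
CONTENT_PARAMS = {"product"}


def canonical_link_tag(requested_url):
    """Build a rel=canonical tag that keeps only content-selecting parameters."""
    parts = urlsplit(requested_url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key in CONTENT_PARAMS
    )
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    # Escape the URL so it is safe inside a double-quoted HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_link_tag("https://www.example.com/folder1/?userID=tinytim&product=49"))
```

For the request above, the tag names https://www.example.com/folder1/?product=49 as the canonical URL, and a request that carries only the userID parameter would get https://www.example.com/folder1/.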

Note that search engines have stated they will look at rel=canonical as a hint, not as a mandate. As such, this is not the magic canonicalization bullet for your site. You still need to be consistent with your canonical intrasite linking.

Also, for URL parameter users, be sure to check out both the Google and Bing Webmaster Tools. Both have added options enabling webmasters to define specific URL parameters to be ignored during crawls.

 

Google also allows you to select whether or not you want to use the subdomain prefix “www.” in your preferred URL. I’d guess that option will eventually come to Bing as well.

Lastly, for links you don’t control, such as inbound links from other sites, you can set up 301 permanent redirects for all non-canonical URL forms to the canonical URL for each page.

Just be sure you use a 301 permanent redirect. As the 301 is a permanent redirect, search engines interpret this to mean they can safely transfer all of the page rank value from the original (non-canonical) URL to the new (canonical) one.

Note that while 302 temporary redirects will redirect users to a canonical URL, search engines will not transfer any acquired page rank! (I have written in more detail about using 301 redirects here.)
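
Redirects are usually configured in the web server itself, but the logic is easy to see in a few lines of Python. The sketch below is a bare WSGI application (the standard Python web server interface) that answers any non-canonical host or default document name with a 301 and a Location header. The CANONICAL_HOST value, the https scheme, and the index.html rule are assumptions for illustration.

```python
CANONICAL_HOST = "www.example.com"  # hypothetical canonical host
DEFAULT_DOCUMENT = "index.html"     # hypothetical default document name


def app(environ, start_response):
    """Serve canonical URLs; answer every other form with a 301 redirect."""
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")

    # Strip the default document name so /folder1/index.html -> /folder1/.
    target_path = path
    if path.endswith("/" + DEFAULT_DOCUMENT):
        target_path = path[: -len(DEFAULT_DOCUMENT)]

    if host != CANONICAL_HOST or target_path != path:
        location = f"https://{CANONICAL_HOST}{target_path}"
        if query:
            location += "?" + query
        # 301 marks the move as permanent; a 302 would mark it as temporary.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page"]
```

A request for example.com/folder1/index.html would be sent to https://www.example.com/folder1/ with a 301 status, while a request that already uses the canonical form is served normally.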

If you’re really detail-oriented, you could even look at backlink tools, such as the aforementioned search engines’ webmaster tools or a third-party tool such as Open Site Explorer, to see who is linking to you and work with the errant webmasters who are not using your canonical URL in their outbound links.

After all, as good as a 301 redirect is for canonicalization, a redirect also introduces a potential page load speed delay (although that’s not likely as detrimental to your page rank as non-canonicalized URLs).

The bottom line is this: you have the ability to consolidate the PageRank for your content pages into canonical URLs.

Given how competitive (not to mention how valuable) a top ranking can be for a given query, and depending upon how badly your multiple URLs are dividing up your PageRank today, why wouldn’t you take the steps needed to consolidate the page rank of your content pages into one canonical URL?

Canonicalization may be a seven-syllable monster, but it’s not that complicated, and doing something about it could improve your position in the SERPs.


Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

Related Topics: Channel: SEO | Link Week Column



About The Author: is an in-house SEO at MSN.com, and was previously part of Microsoft’s Live Search and Bing Webmaster Center teams, serving as the primary contributor to the Bing Webmaster Center blog and then later as an in-house SEO for the Bing content properties. He also randomly adds to his own blog, The SEO Ace.

  • https://plus.google.com/116060438179787966184 salsurra

    Great post about canonical tags. They are so important these days, and in some cases I’m using just canonical tags to move page value instead of doing 301s. There is a strong case showing that it’s better to allow Google to crawl pages on your site, but if the page is a duplicate or intended for users, then use the canonical tag to tell Google which page to rank instead of consolidating the pages or trying to set up a 301. 301s are good, but don’t always work for every scenario. Canonical tags are great, but again, don’t work in every scenario.

    Scenario 1 – You have two pages that have similar content, but they are needed for users to help them find what they are looking for or have a specific function on the site. Car sites have a lot of this, as there are multiple pages for each trim level, but in fact most of the data is the same. On these pages, you still need to have each of the trim level pages so that users can find the exact details they are looking for, but if the data is the same on each trim level then this is a good case to use canonical so that Google will only rank one of them and will pass the value from the other trim level pages into the one overview page. You make the overview the main source of the data and put a canonical tag on each of the trim level pages. This way you give users what they want, and you give search engines what they want, and everyone is happy.

    Scenario 2 – You have two pages with similar content and there is no need for users to find both. Again, looking at a car site, you could find a section for pricing, but there also might be the same pricing data on another page for ‘costs of ownership’, and in these pages are the same details that are on the pricing page. This would be duplicate content that is not really needed for users or engines. In this case, it would be a good idea to pick one of the URLs, most likely the stronger one with higher PR, and then set up a 301 to move the other page into the good page. This would reduce the number of duplicate or low quality pages in Google, resulting in higher rankings for those pages and the rest of the site.

    To be clear though, it’s not good to do both. If you 301 a page, then Google will remove it and not crawl it again. This means if you have a special canonical tag on the page it will be ignored because the crawlers won’t even get to it. The other issue is that you can’t use noindex and canonical at the same time. If you noindex a page with a special canonical tag, then Google will ignore it because the bots are not allowed to crawl it. If you want to have a page with a special canonical tag, you must allow Google to crawl the page and read the canonical tag. They will do the heavy lifting after that and there is no need to use noindex. If you don’t want a page to be in Google, then use a noindex tag, robots.txt, or a 301 to move it out, but if you want to use canonical tags, then you can’t use any of the above because Googlebot has to crawl the page in order for the canonical to work.

 
