How to audit canonicalization and ensure it helps, rather than hinders, your rankings

Columnist Brian Harnish discusses in detail canonicalization issues that may not normally be covered in an SEO audit -- and how to effectively address them.

For those who are unaware, “canonicalization” refers to the practice of making sure that for every instance of duplicate content on a site, one version is specified as the “preferred” or “source” URL to the search engines. Basically, you are telling Google, “Of all the URLs that contain this content, this is the URL that you should consider the authority. No other.”

When a proper audit identifying canonicalization issues is not performed, you can run into snags later when Google identifies your site as being a source of duplicate content, which can lead to algorithmic ranking losses, or even manual penalties.

Canonicalization issues generally occur when attempted canonicalization is not executed properly. Following are some common canonical issues that, once resolved, can result in rankings boosts to the site because of consolidated link equity.

Issue: Home page does not canonicalize properly

Many websites wind up with multiple versions of the home page that resolve on different URLs, such as:

  • https://www.domainname.com/
  • https://domainname.com/
  • https://www.domainname.com/index.html

When you have many different versions of the home page, all of which have inbound links pointing to them, link equity gets split across those versions, and this can cause canonicalization issues that will impact rankings.

In order to fix this, choose your preferred home page URL and 301 redirect all the other versions to it. For the www vs. non-www versions, take it a step further by specifying your preferred domain in Google Search Console.

Implementing redirects to the preferred version of the home page will consolidate your link equity, which can potentially enhance your search engine rankings.
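As a rough illustration of the redirect rule described above, the following Python sketch decides where a home page request should be 301 redirected. The preferred host (the www version over HTTPS), the list of home page aliases, and the function name home_redirect_target are assumptions made for this example; in practice the same logic would usually live in your server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the www version over HTTPS is preferred.
PREFERRED_HOST = "www.domainname.com"
BARE_HOST = "domainname.com"
HOME_ALIASES = {"", "/", "/index.html"}  # paths that all mean "home page"


def home_redirect_target(url):
    """Return the preferred home page URL if `url` is a non-preferred
    variant of it, or None if no redirect is needed.

    Query strings are dropped in this simplified sketch.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in {PREFERRED_HOST, BARE_HOST}:
        return None  # not our site
    if parts.path not in HOME_ALIASES:
        return None  # not a home page variant
    preferred = urlunsplit(("https", PREFERRED_HOST, "/", "", ""))
    return None if url == preferred else preferred
```

Any URL for which the function returns a target is a candidate for a 301 redirect. A return value of None means the URL is already the preferred version, or is not a home page variant at all.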

Issue: URLs don’t resolve to a single case

This is a big one. URLs that don’t resolve to a single case can result in duplicate URLs, leading to duplicate content issues that put your site at risk.

If you’re a beginner SEO, it is important to consider that duplicate content doesn’t always mean that the same content is duplicated from page to page. It happens quite often that URLs cause duplicate content issues simply by existing in the first place.

Here are some examples of a URL that doesn’t resolve to a single case:

  • https://www.domainname.com/page.html
  • https://www.domainname.com/pAgE.html
  • https://www.domainname.com/PAGE.html
  • https://www.domainname.com/PaGe.html

On servers that treat URL paths as case-insensitive, entering any of these variations in the address bar of your favorite web browser will bring up the same page. This can become a problem because, without proper configuration, Google will spider and index each variation, resulting in non-canonical pages being indexed.

The best way to fix these issues is to 301 redirect all of these URLs to the main canonical URL that you choose. It may be beneficial to handle this with a single server-side rule (in Apache's .htaccess, for example, or the equivalent in whatever server technology you use) rather than adding individual 301 redirects all over the place. Over-redirecting with 301 redirects can also cause problems.
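To make the single-rule idea concrete, here is a minimal Python sketch of the decision such a rule makes. It assumes the all-lowercase URL is the canonical one; the function name lowercase_redirect_target is hypothetical. Only the host and path are lowercased, because query string values may legitimately be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect_target(url):
    """Return the lowercase form of `url` if it differs from `url`,
    or None when the URL is already in canonical (lowercase) form.

    Assumption: the lowercase URL is the one chosen as canonical.
    """
    parts = urlsplit(url)
    lowered = parts._replace(
        netloc=parts.netloc.lower(),
        path=parts.path.lower(),
    )
    target = urlunsplit(lowered)
    return None if target == url else target
```

One rule like this covers every mixed-case variation, so there is no need to list each one as its own redirect.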

Alternatively, you can use the rel="canonical" tag to specify the canonical version of the page. That means putting a tag in the <head> of the page that looks like this:

<link rel="canonical" href="https://www.domainname.com/page.html" />
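During an audit, it can help to confirm which canonical URL a page actually declares. The sketch below uses Python's standard html.parser module to collect every rel="canonical" link from a page's HTML. The class and function names (CanonicalFinder, find_canonicals) are made up for this example, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A page would normally declare exactly one canonical URL, so an empty list or a list with several entries is usually worth flagging in the audit.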

Issue: IP address doesn’t canonicalize

In a perfect world, your IP address should properly canonicalize back to the main domain name of your site. If it does not, you risk indexation issues because search engines cannot correctly determine which of your website's URLs to index. In addition, duplicate content issues can arise from a search engine wanting to index both your IP address and the URL of the website.

If you determine that you have IP canonicalization issues, speak with your server administrator and discuss potential solutions to the issues.
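To give a sense of what one such solution might look like, this Python sketch detects a request made against a raw IP address and works out the matching URL on the main domain. The canonical host, the example IP address (taken from a range reserved for documentation), and the function names are all assumptions; your server administrator would normally implement the equivalent at the server level.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.domainname.com"  # assumption for this sketch


def is_ip_host(url):
    """True when the host part of `url` is an IPv4 or IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def ip_redirect_target(url):
    """Return the same path and query on the canonical host when `url`
    uses a raw IP address, or None when it already uses a host name."""
    if not is_ip_host(url):
        return None
    parts = urlsplit(url)
    return urlunsplit(("https", CANONICAL_HOST, parts.path or "/", parts.query, ""))
```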

Issue: Duplicate URLs

Duplicate URLs can be just as dangerous as non-canonical URLs. When you have duplicate URLs, and they have no canonicalization in place, Google will have no idea which version to index. This can lead to the indexation of duplicate URLs serving the same content, diluting your link equity and page authority.

Duplicate URLs often look like the following, where the same page resolves under several file name extensions, or with no extension at all:

  • https://www.domainname.com/page.html
  • https://www.domainname.com/page.htm
  • https://www.domainname.com/page.aspx
  • https://www.domainname.com/page/

The best way to fix all of these is to implement a sitewide redirect that sends all of the duplicate URLs to the main canonical URL. This will help consolidate link equity and result in a performance boost overall.
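A sitewide rule of this kind might be sketched as follows in Python. The example assumes that the .html version is the canonical one and that .htm, .aspx and trailing-slash URLs are its duplicates; the suffix list and the function name extension_redirect_target are illustrative only. Other file types, such as stylesheets and images, are left alone.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: these suffixes all serve the same page,
# and the .html version is the one chosen as canonical.
DUPLICATE_SUFFIXES = (".html", ".htm", ".aspx")
CANONICAL_SUFFIX = ".html"


def extension_redirect_target(url):
    """Return the canonical .html URL for a duplicate variant of a page,
    or None when no redirect applies."""
    parts = urlsplit(url)
    stem = parts.path.rstrip("/")
    if not stem:
        return None  # home page, handled separately
    root, ext = posixpath.splitext(stem)
    if ext and ext.lower() not in DUPLICATE_SUFFIXES:
        return None  # some other file type, e.g. .css or .jpg
    target = urlunsplit(parts._replace(path=root + CANONICAL_SUFFIX))
    return None if target == url else target
```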

Issue: URLs can be accessed through both secure (https) and non-secure (http) versions

This appears to be a pretty simple problem, but you would be surprised how often it crops up in website audits. Usually this results from improper setup of the non-secure and secure versions on the server. Google Search Console does not play a role in terms of how URLs are accessed via the browser.

The simplest way to check for this issue is to try accessing both the http:// and https:// versions of your site in the browser. If they both load just fine, without one redirecting to the other, then you have some issues that should be cleaned up as quickly as possible.
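If you record the status code and Location header returned for each non-secure URL during the audit, the check can be expressed as a small Python sketch like the one below. It assumes default ports and that a permanent redirect (301 or 308) to the identical https:// URL is the desired behavior; both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url):
    """Return the https:// equivalent of an http:// URL, or None if the
    URL is not plain http. Assumes default ports are in use."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    return urlunsplit(parts._replace(scheme="https"))


def http_version_ok(url, status, location):
    """True when the response observed for an http:// URL is a permanent
    redirect pointing at its https:// equivalent.

    `status` and `location` are the status code and Location header you
    recorded for `url` (location may be None when no header was sent).
    """
    return status in (301, 308) and location == https_redirect_target(url)
```

A 200 response for the non-secure URL, or a redirect that lands somewhere other than the matching secure URL, would both fail this check.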

The best way to avoid this problem is to make sure you properly make the switch from HTTP to HTTPS to begin with. (Patrick Stox has written an excellent and comprehensive guide on how to do that here.)

It is my recommendation to get the highest-level secure certificate you can, and make sure it comes with wildcard options. This way, you do not cause any unforeseen canonicalization issues arising from not having the proper secure certificate with the right settings installed.

Issue: Trailing slash canonicalization

Similar to duplicate URLs, improper trailing slash canonicalization can also become an issue. For example:

  • https://www.domainname.com
  • https://www.domainname.com/

or

  • https://www.domainname.com/page-name/
  • https://www.domainname.com/page-name

If you have been promoting your website using versions of your URLs both with and without the trailing slash, you could cause indexation and duplicate content issues as a result. Choose one format (I recommend the non-trailing slash version) and stick with it in all of your link building and other promotional efforts. Then do the following:

  1. 301 redirect all variations of the URL using a wildcard redirect back to the canonical URL, and/or
  2. Set the canonical tag to always point to the non-trailing slash version of the page.

The redirect is the preferred solution, but using both is the best option because it removes any ambiguity on Google’s part.
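The wildcard redirect in step 1 can be sketched in Python as follows, assuming the non-trailing slash version is canonical, as recommended above. The function name trailing_slash_redirect_target is made up for the example. The site root is deliberately left alone in this sketch, since a browser requests the path "/" for the bare host either way.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect_target(url):
    """Return `url` without its trailing slash, or None when there is
    nothing to redirect.

    Assumption: the non-trailing slash version is the canonical one.
    The site root ("" or "/") is never redirected.
    """
    parts = urlsplit(url)
    stripped = parts.path.rstrip("/")
    if not stripped or stripped == parts.path:
        return None  # root path, or no trailing slash present
    return urlunsplit(parts._replace(path=stripped))
```

The returned URL is also the value the canonical tag in step 2 would point to, which keeps the redirect and the tag in agreement.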

Dive deeper into your audit to find major issues

Canonicalization issues can be a major source of headaches for many SEOs, but if you dive deep enough, you can find and fix many issues plaguing your client’s site. In addition, focusing on these areas can give you a great performance boost that you may not otherwise have been able to obtain with just the usual on-page SEO.

This is because canonicalization factors impact link equity, which, when managed properly, can translate into a major performance boost for your website.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.


About the author

Brian Harnish
Contributor
Brian started in web development in 1998 when search engines like Altavista and Yahoo ruled the industry, and Google was just getting started. In 2007, he made his leap into SEO professionally. He has performed SEO for law firms, real estate agents, technology, and healthcare sectors. His work also includes large brands like United Healthcare and Microsoft.
