Canonical tags gone wild
What happens when canonical tags get out of control and how can you rein them in? Columnist Patrick Stox shares his findings and insights.
Being a technical SEO, I love digging into any weird problems where things don’t seem to work as expected.
Canonical tags seem easy enough, but these tags cause all sorts of interesting issues — and some minor fixes can lead to big wins. Almost every major website will have some kinds of issues with their canonical tags, so I dug into a few different ones to see what examples I could find.
Canonical tags thrown into the <body>
In my recent post, “Canonical tags are easy, right?,” I gave an example of a canonical tag that looks fine when you view the source code. If you use “Inspect” in Chrome DevTools to view the DOM tree, however, you’ll see that the <head> section of Home Depot’s website breaks early and the canonical tag is thrown into the <body> section, where Google will ignore it.
What’s the worst that could happen if all of your canonical tags are ignored? You won’t have control over the preferred version or consolidation of signals. Many pages will be indexed with the wrong version, or you may have multiple versions of the same page indexed without consolidating the signals, and no version of that page will rank as high as it should.
Here are a few different searches in Google that show parameters on Home Depot’s website that are getting indexed even though they have a canonical set to the clean URL:
An interesting note is that the canonical tags seem fine on Home Depot’s mobile website. Likely, one of the scripts they are calling on the desktop version of the site is causing the issue, but the problem will resolve itself with the upcoming mobile-first index.
If Home Depot wanted to fix this sooner, they could probably get away with moving the canonical tag up in the <head> section so it sits above all the scripts, or they could figure out what is causing the <head> section to close early (which is likely a tag that wasn’t closed properly).
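One rough way to spot this kind of problem in raw HTML is to look for any element that is not allowed inside <head> and that appears before the canonical tag, since a browser closes the <head> as soon as it meets one. The sketch below is only an approximation using Python’s standard html.parser. The class and function names are my own, and it will not catch elements that scripts inject at runtime, so treat a clean result as a hint rather than proof.

```python
from html.parser import HTMLParser

# Elements the HTML spec allows inside <head>. Any other start tag makes a
# browser close <head> implicitly and open <body>.
HEAD_ALLOWED = {"html", "head", "title", "base", "link", "meta",
                "style", "script", "noscript", "template"}


class CanonicalPlacementChecker(HTMLParser):
    """Hypothetical helper: records canonicals and whether <head> was still open."""

    def __init__(self):
        super().__init__()
        self.head_closed = False      # saw an explicit </head>
        self.head_broken_by = None    # first tag that would end <head> early
        self.canonicals = []          # list of (href, in_head) tuples

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        head_open = not self.head_closed and self.head_broken_by is None
        if tag == "link" and "canonical" in rels:
            self.canonicals.append((attr.get("href"), head_open))
        elif tag not in HEAD_ALLOWED and head_open:
            self.head_broken_by = tag

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def check_canonical_placement(html):
    """Return (canonicals, head_broken_by) for one HTML document."""
    checker = CanonicalPlacementChecker()
    checker.feed(html)
    return checker.canonicals, checker.head_broken_by
```

If the second value returned is not None, that tag is a good place to start looking for whatever is pushing the canonical into the <body>.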
Canonical tags when every version references itself
What happens when you have multiple versions of the same page, and each version has a canonical tag that says it is the correct version? The answer is that Google will choose one, or both, but it probably won’t be consistent.
That’s exactly what happens on Meetup.com. Meetup pages have at least two versions used interchangeably: one with the name of the Meetup in mixed case as it was entered and one all lowercase. Mixed case of any kind works for the URLs on Meetup.com; try making any of the lowercase characters capitalized, or vice versa.
So, if we have two versions that both work, and both say they are correct, what happens?
<link rel="canonical" href="https://www.meetup.com/RaleighSEO/" />
<link rel="canonical" href="https://www.meetup.com/raleighseo/" />
In this case, both pages are being indexed, but only one will show. Both versions have links, and the equity is currently split. I have added &filter=0 to the Google search in the screenshot below to show that both are indeed indexed. You can also check by running an info: query for each URL to see the canonicalized version.
To recap: Both versions of the page are indexed, both have links, and only one can show. A quick fix from Meetup here could consolidate a lot of signals that are currently split, and they would likely see a large traffic increase.
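If you already have a list of crawled URLs and the canonical each one declares, a small script can flag this pattern. The following is a minimal sketch, assuming a dictionary that maps each crawled URL to its canonical. The function name and the input shape are hypothetical, and lowercasing the path only makes sense on sites where URLs are case-insensitive, as they are on Meetup.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_variant_canonicals(canonical_by_url):
    """Group URLs that differ only by letter case and keep the groups
    whose members declare different canonicals."""
    groups = defaultdict(dict)
    for url, canonical in canonical_by_url.items():
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path.lower())
        groups[key][url] = canonical
    return {
        key: pages
        for key, pages in groups.items()
        if len(set(pages.values())) > 1
    }


# The two Meetup URLs from the example above, each pointing at itself.
crawled = {
    "https://www.meetup.com/RaleighSEO/": "https://www.meetup.com/RaleighSEO/",
    "https://www.meetup.com/raleighseo/": "https://www.meetup.com/raleighseo/",
}
conflicts = find_case_variant_canonicals(crawled)
```

Any group this returns is a candidate for picking one version and pointing every variant’s canonical at it.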
Forgot to include the canonical tag or included the wrong page
A quick search for “sams club tires” will show you both desktop and mobile versions of the samsclub.com tires page. The problem here is that the m.samsclub.com/tires page does not have a canonical tag at all, allowing both pages to show.
Even if the m. page indexed had the canonical tag in place, I don’t think this one would work correctly — the desktop site references a different mobile page as the alternate (https://m.samsclub.com/cat/tire-search/1056), and that page 302 redirects to the m. page shown in SERPs above (https://m.samsclub.com/tires).
Having the mismatch on alternate versions is a common problem on their site, and m.samsclub.com shows in many desktop search results because they aren’t indicating the connections in the way they need to in order to consolidate the pages.
Without establishing the relationship between desktop and mobile versions of the page, they will be treated as separate entities. Both can be shown in search results, and neither will rank as high as it would if this were done correctly.
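The relationship has to hold in both directions: the desktop page’s alternate should point at the mobile URL that is actually served, and that mobile page’s canonical should point back at the desktop page. Here is a minimal sketch of that check, assuming you have already extracted the four values. The function name is made up for illustration, and the URLs in the usage lines are placeholders rather than the real Sam’s Club URLs.

```python
def check_mobile_annotations(desktop_url, desktop_alternate,
                             mobile_url, mobile_canonical):
    """Return a list of problems with a desktop/mobile page pair.

    desktop_alternate: href of the desktop page's rel="alternate" tag, or None
    mobile_canonical:  href of the mobile page's rel="canonical" tag, or None
    """
    problems = []
    if desktop_alternate is None:
        problems.append("desktop page has no alternate pointing at a mobile URL")
    elif desktop_alternate != mobile_url:
        problems.append(
            f"desktop alternate {desktop_alternate} does not match {mobile_url}"
        )
    if mobile_canonical is None:
        problems.append("mobile page has no canonical tag")
    elif mobile_canonical != desktop_url:
        problems.append(
            f"mobile canonical {mobile_canonical} does not match {desktop_url}"
        )
    return problems


# Placeholder URLs that mirror the situation described above: the alternate
# points at a different mobile URL, and the mobile page has no canonical.
issues = check_mobile_annotations(
    desktop_url="https://www.example.com/tires",
    desktop_alternate="https://m.example.com/cat/tire-search/1056",
    mobile_url="https://m.example.com/tires",
    mobile_canonical=None,
)
```

An empty list means the pair is annotated consistently; anything else is worth a manual look.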
When I started writing this, it looked like there were also some issues around canonicalization with what appeared to be a dev server. The canonical tags were using the subdomain of the dev server, which was prod-i.samsclub.com. These pages were getting indexed and sometimes being chosen as the version to show, even for the home page of the website.
It looks like they have recently fixed this as they redirected prod-i.samsclub.com to www.samsclub.com, but you can still see many of these pages in the index with a site: search, and their cached versions still show the incorrect canonical tag. If you’re going to expose an environment like that, I’d highly recommend using server-side authentication so search engines won’t be able to crawl it in the first place to avoid problems like this.
Another potential disaster is copying a page without changing its canonical tag, or accidentally pointing the canonical tags for a whole section, or even an entire website, at a single page. While some of these will be ignored, others may be respected, and you could see a decline in traffic for many pages.
Canonical tags with URL parameters
There are lots of ways canonical tags can go wrong when you have multiple versions of a page. First up, if you have multiple versions of a page and no canonical, then what happens? That’s right, you get multiple versions indexed.
A more interesting question might be, what happens when you have a separate mobile version that has parameters? When you are connecting, say, an m. mobile site and the desktop version, then you have to specify the alternate version of the page on the desktop site and canonical from the mobile site back to the desktop.
What happens when you only link to one version of the mobile site, but URL parameters make it so there is more than one version of the page? The others get indexed, as they have with this page — site:samsclub.com pretend play inurl:1938.
Did you know there’s also a tool in Google Search Console for handling parameters?
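On the site side, one common approach is to make sure every parameterized variant of a page emits the same clean canonical. The sketch below shows one way to derive that clean URL with Python’s urllib.parse. The function name, the keep_params whitelist and the example URL are assumptions for illustration, and which parameters actually change page content is something only you can decide for your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_canonical(url, keep_params=()):
    """Drop every query parameter except those listed in keep_params,
    and drop the fragment, to get one canonical URL per page."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


# Hypothetical URL: tracking and sorting parameters are stripped, while a
# parameter that changes the content (here "page") is kept.
canonical = clean_canonical(
    "https://m.example.com/toys/1938?sort=price&page=2&utm_source=x",
    keep_params=("page",),
)
```

For a separate mobile site, the clean URL would still need to be mapped to its desktop equivalent before being written into the canonical tag.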
Canonical tags ignored
Remember that canonical tags are a hint, not a directive. They’re made to be used for duplicate versions of pages, and you can get away with nearly duplicate versions in many cases. If the page you set as the canonical is too different from your target page, the canonical will likely be ignored.
This happens with the channels page under YouTube user accounts; just check out site:youtube.com inurl:channels. In some situations, other signals might overpower canonical tags as well. Things like how URLs are submitted in the sitemap and how the pages are internally linked are other signals, and Google also has preferences for things like HTTPS versions and shorter URLs.
Canonical tags with other tags
Canonical tags can have all kinds of issues when used with other tags. I would say don’t point the canonical on page 2 to page 1 in a paginated set, don’t use noindex on pages with a canonical tag, and be very careful with hreflang tags since each page needs to be the indexed version. There are tons of other problems that happen when canonical tags interact with other tags.
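Two of these combinations are easy to test for once you have each page’s tags extracted. This is a rough sketch that assumes you already know the page URL, its canonical, the content of its robots meta tag and its hreflang URLs. The function name and argument layout are hypothetical, and the pagination case is left out because it needs knowledge of the whole paginated set.

```python
def find_tag_conflicts(url, canonical=None, robots="", hreflang_urls=()):
    """Flag tag combinations that tend to send mixed signals."""
    conflicts = []
    directives = {d.strip().lower() for d in robots.split(",")}
    if canonical and "noindex" in directives:
        conflicts.append("noindex combined with a canonical tag")
    if hreflang_urls and canonical and canonical != url:
        conflicts.append(
            "hreflang annotations on a page that canonicalizes elsewhere"
        )
    return conflicts


# Hypothetical page that trips both checks.
found = find_tag_conflicts(
    "https://example.com/en/",
    canonical="https://example.com/",
    robots="noindex, follow",
    hreflang_urls=["https://example.com/de/"],
)
```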
Canonical tags and redirects
It’s generally a bad idea to canonical to a page that redirects. This usually breaks something or consolidates signals inconsistently. Take Amazon stores, for instance, where there are lots of redirects and weird canonicalization happening.
Look what happens, and notice that at each step there are pages indexed and that the URLs might use the clean name or the store IDs or both.
- 301 > https://www.amazon.com/gp/shops/shopname?some-parameters=stuff
- 302 > https://sellercentral.amazon.com/gp/sc-redirect/seller-page.html?some-parameters=stuff
- 302 > https://www.amazon.com/gp/browse.html?some-parameters=stuff
- 301 > https://www.amazon.com/gp/node/index.html?some-parameters=stuff
- 301 > https://www.amazon.com/s?some-parameters=stuff
- This last one is the version where many of the Amazon store listings are, but we’re not done yet.
- This page has a canonical set as https://www.amazon.com/s?some-different-parameters=stuff (not the same as the URL).
- That URL does a 302 redirect to https://www.amazon.com/ref=nb_sb_noss_null
- And finally, that page has a canonical set as https://www.amazon.com/.
Canonical tags always create the most interesting issues, and things don’t quite work out the way you would expect. I’d bet that some of these pages end up canonicalizing to that final Amazon home page version and give the home page a bit of a boost.
The point is that the canonical tag is powerful, and it can go wrong easily — so double-check your website to see what kinds of issues there might be.
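To see where a chain like the Amazon one ends up, you can walk it hop by hop, following a redirect where there is one and a canonical otherwise. The sketch below works on data you have already collected rather than making live requests. The function name and the two lookup dictionaries are assumptions, and the URLs are placeholders shaped like the last three steps above.

```python
def trace_chain(start, redirects, canonicals, max_hops=20):
    """Follow redirects and canonicals from start.

    redirects:  dict of url -> (status_code, target_url)
    canonicals: dict of url -> canonical_url
    Returns (final_url, hops); final_url is None on a loop or too many hops.
    """
    hops = []
    seen = {start}
    url = start
    for _ in range(max_hops):
        if url in redirects:
            status, target = redirects[url]
            hops.append((str(status), url, target))
        elif url in canonicals and canonicals[url] != url:
            target = canonicals[url]
            hops.append(("canonical", url, target))
        else:
            return url, hops          # nothing points anywhere else
        if target in seen:
            hops.append(("loop", url, target))
            return None, hops
        seen.add(target)
        url = target
    return None, hops


# Placeholder data: canonical to a URL that 302s to a page whose canonical
# is the home page.
redirects = {
    "https://www.example.com/s?b=2": (302, "https://www.example.com/ref=x"),
}
canonicals = {
    "https://www.example.com/s?a=1": "https://www.example.com/s?b=2",
    "https://www.example.com/ref=x": "https://www.example.com/",
}
final_url, hops = trace_chain("https://www.example.com/s?a=1", redirects, canonicals)
```

Any page whose chain has more than one hop, or mixes redirects with canonicals, is worth reviewing, since each extra hop is another chance for signals to be consolidated inconsistently.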
Check your canonical tags
I found most of the examples in this article with a simple site: search of the domain in Google. In some cases I removed filtering, as with the Meetup example, or searched for an individual product or a phrase I saw in one of the title tags to see if there were other versions.
None of these examples took long to find, and I didn’t even use a crawler, but you definitely should use a crawler when looking for issues on your own website. I would expect any major website out there to have more than a few examples of canonical tags gone wild.
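If you export page HTML from a crawl, a first pass over it can be as simple as classifying each page by what its canonical tag says. The following is a minimal sketch, assuming a dictionary of URL to raw HTML. The names are my own, it does no URL normalization, and it reads the raw source only, so it will not notice a canonical that ends up in the <body> after rendering.

```python
from html.parser import HTMLParser


class CanonicalExtractor(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels:
            self.hrefs.append(attr.get("href"))


def audit_pages(html_by_url):
    """Label each page: missing, multiple, self, or points elsewhere."""
    report = {}
    for url, html in html_by_url.items():
        parser = CanonicalExtractor()
        parser.feed(html)
        if not parser.hrefs:
            report[url] = "missing"
        elif len(parser.hrefs) > 1:
            report[url] = "multiple"
        elif parser.hrefs[0] == url:
            report[url] = "self"
        else:
            report[url] = "points elsewhere"
    return report
```

Pages labeled “missing” or “multiple” are usually the quickest wins, while “points elsewhere” pages need a check that the target is the version you actually want indexed.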
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.