• http://www.bazaarvoice.com naja2183

    Thanks for the post and suggestions – would you say that following these steps is a higher priority than working to reduce your page speed? Just curious.

    “Canonicalization fixes are generally simple” – this hasn’t been the case on the sites I work with. These fixes generally take time and have to be wedged in alongside other projects touching the same code before they get done.

  • http://www.seobocaraton.com SEOBocaRaton

    Google also sees

    http://www.iansnerdvana.com/index.html
    and
    http://www.iansnerdvana.com/index.HTML
    as different pages, since URL paths are case-sensitive.

    .NET sites have this issue too: default.aspx and Default.aspx can both get indexed as the home page.
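
    One common fix, assuming an Apache host and an .htaccess file (IIS sites would use the URL Rewrite module for the default.aspx case), is a case-insensitive 301 back to the root:

        RewriteEngine On
        # the NC flag matches index.html, index.HTML, Index.Html and so on,
        # and sends them all to the canonical home page with a 301
        RewriteRule ^index\.html$ http://www.iansnerdvana.com/ [NC,R=301,L]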

  • dyoungprod

    Actually, Yahoo and Bing do support the canonical tag. It’s the cross-domain version that they don’t support. http://searchengineland.com/canonical-tag-16537

  • Ian Lurie

    I’d like something besides their statement that they actually support it, ’cause every test I’ve done shows they don’t. That’s the problem, really – all of the major search engines jump in to endorse a new bit of markup, but then they support it only inconsistently. It’s like playing HTML Roulette.

  • outtanames999

    SEOs beware. Canonicalization is just more SEO Cuttsfud brought to you by immature search engine algorithms, and it’s another tail-chasing waste of time for SEOs, like nofollow. A year from now, you will not need to waste your time doing this because they will suddenly announce it’s not an issue – AND NEVER WAS.

    Search engines know the score: we were here before they were. They inherited the same web standards we did. And those standards say it is perfectly OK to have domainname.com, http://www.domainname.com, domainname.com/, etc. Everybody knows they’re all the same. HUMANS can figure it out; so can the search engines.

    And 99.9999999% of web sites resolve all forms to the same content. In fact, I defy anyone to name more than a few obscure websites where this is not the case.
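
    Collapsing those forms takes only a couple of lines of server config anyway; here is a rough sketch for Apache’s mod_rewrite, with domainname.com standing in for any real domain:

        RewriteEngine On
        # 301 any request for the bare domain over to the www host,
        # so every variant ends up at a single canonical hostname
        RewriteCond %{HTTP_HOST} ^domainname\.com$ [NC]
        RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]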

    Any search engine full of PhDs that tells you they can’t figure out the difference between domainname.com and http://www.domainname.com is lying. Just like they lied about not being able to parse JavaScript, dated content, media types, etc. etc. etc.

    They’re playing you for fools and suckers. Smart SEOs will not fall for it.

  • http://www.cicadamania.com Cicada Mania

    I see rel=”canonical” as an absolutely essential invention. Google does support it, and Bing is just waiting until they’re supplying Yahoo’s results, at which point they’ll add support too.

    I NEED this tag because 1) I cannot control how the public links to my pages, and 2) my marketing friends want to attach all sorts of tracking parameters to my URLs. Commission Junction, WebTrends, Google Analytics: all kinds of crazy query-string add-ons that my marketing buddies insist they need to track their campaigns, and that also trash SEO. That’s why we need search engines to support rel=”canonical”. We can do our best to enforce canonical linking of our pages, but we can’t stop the rest of the universe from screwing it up for us.
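
    To make that concrete, one link element in the head tells the engines which version of a tracked URL should get the credit (example.com and the parameters below are just placeholders):

        <!-- served on http://www.example.com/widgets.html?utm_source=newsletter&utm_campaign=spring -->
        <link rel="canonical" href="http://www.example.com/widgets.html" />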

  • Ian Lurie

    I hear ya. I end up using it, too. But it’s always best to start out with the assumption that it’s a tool of last resort.

  • http://www.smallbusinessonline.net NeilS

    You don’t mention using the function in Google Webmaster Tools which lets you “set your preferred domain.” Doesn’t that go a long way toward solving the problem for most sites?

  • http://dineshthakursem.blogspot.com Dinesh Thakur

    A great post with useful “rel=canonical” information; I didn’t know about it before. Thanks, Ian!

  • Ian Lurie

    @Neil good catch. I must’ve been writing too fast, or thinking too slow, or something. Clearly you want to use Google Webmaster Tools, as well. It doesn’t help with non-Google search engines, but it’s still helpful.

  • http://andybeard.eu AndyBeard

    “Use robots.txt and the meta robots tag to exclude these from search engine crawls.”

    Gotcha

    http://andybeard.eu/1121/seo-linking-gotchas-even-the-pros-make.html

    It has actually been 3 years now since Matt Cutts told Eric Enge about pages blocked by robots.txt accumulating PageRank.
    http://www.stonetemple.com/articles/interview-matt-cutts.shtml

    But even before then it was obviously happening.
    Google (well, Matt) still hasn’t elaborated on reset vectors, but any link juice being sprayed around at random is a bad thing.

    Google have to be able to crawl a page before they can discount it as duplicate content, or even see a noindex.
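
    Roughly, the two mechanisms look like this (the /print/ path is just an example):

        # robots.txt: blocks crawling, but the blocked URLs can still be
        # indexed (and collect PageRank) from links pointing at them
        User-agent: *
        Disallow: /print/

        <!-- meta robots on the page itself: Google must be allowed to crawl
             the page to see this and drop it from the index -->
        <meta name="robots" content="noindex, follow" />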

  • http://www.concept-i.dk/ Thomas Rosenstand

    Exactly, Andy! It’s a common misconception that Google does not index URLs blocked by robots.txt. They do it all the time. They don’t crawl those URLs, but they do index them.

    But besides that: this article should be mandatory reading for every web developer in the world!