Do You Have Duplicate Content Issues Across Domains? Google Will Now Alert You

Today, Google Webmaster Tools launched a new message alert to let site owners know when a particular URL doesn’t appear in search results because Google sees it as a duplicate of a URL on a different domain. In the blog post announcing the feature and in an in-depth help topic, Google provides details on how it identifies clusters of duplicate content and chooses a “canonical” version of each cluster to display in search results.

“When we discover a group of pages with duplicate content, Google uses algorithms to select one representative URL for that content. A group of pages may contain URLs from the same site or from different sites.”

They note that when they choose a representative URL from a different domain, they call this “cross-domain URL selection”.

In cases where multiple URLs contain the same content (for instance, due to infrastructure configuration, optional parameters, syndication, or internationalization), many options exist for site owners to indicate to Google which version is canonical.

However, in some cases the site owner doesn’t use these options to specify a preferred version, or Google selects a different version than the one the site owner specifies.

This new feature alerts site owners when Google’s “algorithms select an external URL instead of one from their website”. Google says common reasons for this include:

  • Site owner-specified – if you’ve moved your domain or have implemented the rel=canonical attribute to indicate that a page on another domain is canonical, then this alert is simply confirmation that Google is indexing as you’ve specified.
  • Regional sites – if you have the same content on multiple regional sites (for instance, the same English content on a .com (for US), a .co.uk, and a .com.au), Google may cluster pages with identical content across sites and use relevance signals to determine which to display per query.
  • Incorrect canonicalization – in this case, a page may inadvertently use the rel=canonical attribute to specify a page on a different domain as canonical.
  • Misconfigured server – a hosting misconfiguration (this happens in particular with shared hosting) may cause two different domains to display the same content.
  • Hacked site – sites are sometimes hacked to point to other domains.
  • Scraped content – the blog says that “in rare situations”, Google may select a URL from a site that has scraped your content.

This alert appears in the Message Center, so you’ll see it only if your site has this issue. Google is currently reporting only on URLs from the Top Pages report. The feature gives valuable insight to site owners who otherwise would have no idea why a particular page doesn’t appear in search results. I’ll be posting a follow-up shortly with more details on some of these scenarios and what you can do if you get an alert.
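
For the “incorrect canonicalization” case in the list above, a quick local check can show whether a page declares a canonical URL on a different host. The following is a minimal sketch and not part of Google’s tooling: the names `CanonicalFinder` and `cross_domain_canonicals` are hypothetical, the comparison is an exact hostname match (so `www.example.com` and `example.com` count as different hosts), and it inspects only `<link rel="canonical">` elements in HTML you supply yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; values may be None.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def cross_domain_canonicals(page_url, html):
    """Return canonical URLs in `html` whose host differs from `page_url`'s host.

    Assumption: hosts are compared exactly, with no normalization of
    "www." prefixes or regional variants.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    page_host = urlsplit(page_url).hostname
    found = []
    for href in finder.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        if urlsplit(absolute).hostname != page_host:
            found.append(absolute)
    return found


if __name__ == "__main__":
    sample = '<html><head><link rel="canonical" href="https://other.example/page"></head></html>'
    print(cross_domain_canonicals("https://www.example.com/page", sample))
```

An empty result means no cross-domain canonical was declared in the markup. A non-empty result is not necessarily an error: if you moved domains or syndicate content deliberately, it matches the “site owner-specified” case.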

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.


About the author

Vanessa Fox
Contributor
Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.
