Last month, I shared a case study about a client I’m currently helping with a duplicate content issue. Over the past year, this client’s site had lost significant rankings in Panda updates because other sites were “lifting” its content, often verbatim.
This is a common problem for content producers. You produce great content only to have other sites disregard copyright and repurpose the content on their own sites — without attribution. I see this often in the case of associations and non-profits that may be producing valuable research or information that other sites may want to share. In some cases, the infringement may be unintentional… but in many cases, it isn’t.
So, how can you recover when your content is stolen? First, understand that this is neither a quick fix nor a fast process. However, it’s the process I’ve found works best with Google.
Step 1: Find The Infringements
If you believe you’ve been hit by a Panda update and are seeing dramatic traffic losses on certain pages, I would prioritize looking for duplicate content that corresponds to those specific page losses.
To get started, copy a few lines of text from the affected page on your website and search for them in Google as an exact-match query by wrapping them in quotation marks. If pages other than your own appear with this exact-match content, it’s time to find out how much of the content on those pages matches yours. You can also run your page through Copyscape’s Web search to see whether it readily identifies other copies of your content on the Web.
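If you have many pages to check, it can help to generate these exact-match searches instead of typing them by hand. The minimal Python sketch below is my own illustration, not a Google tool. It collapses whitespace, strips any quotation marks inside the snippet and wraps the result in quotes. The function name `exact_match_search_url` is hypothetical. Open the resulting URL in a browser rather than fetching it from a script, because Google discourages automated queries to its results pages.

```python
from urllib.parse import urlencode


def exact_match_search_url(snippet: str) -> str:
    """Build a Google search URL that looks for `snippet` as an exact phrase."""
    # Collapse line breaks and repeated spaces picked up when copying from a page.
    phrase = " ".join(snippet.split())
    # Drop any double quotes inside the snippet so they don't end the phrase early.
    phrase = phrase.replace('"', "")
    # Wrapping the phrase in double quotes asks Google for an exact-match search.
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


# Usage: paste a few lines copied from your own page (placeholder text here).
print(exact_match_search_url("Paste a few lines\nfrom your own page here"))
# prints https://www.google.com/search?q=%22Paste+a+few+lines+from+your+own+page+here%22
```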
Copy the URL of your page and the URL of the suspect page and paste them into Copyscape’s Content Comparison tool. This tool compares the content of the two pages side by side and shows the percentage of overlap between them. My rule of thumb is that anything over 50% needs to be addressed immediately, and you’d be surprised how often we see 90-100% duplicate content.
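Copyscape doesn’t publish how it calculates its overlap percentage. The Python sketch below therefore swaps in a common, simpler technique of my own choosing. It breaks both texts into overlapping five-word sequences (“shingles”) and reports what share of your page’s shingles also appear on the suspect page. It works on plain text you’ve already extracted from each page, and the function names are hypothetical. Its numbers won’t match Copyscape’s exactly. The 50% threshold simply mirrors the rule of thumb above.

```python
import re
from typing import Set, Tuple


def shingles(text: str, n: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of n-word sequences in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(original: str, suspect: str, n: int = 5) -> float:
    """Share (0-100) of the original's shingles that also appear in the suspect text."""
    mine = shingles(original, n)
    if not mine:  # original shorter than n words: nothing to compare
        return 0.0
    return 100.0 * len(mine & shingles(suspect, n)) / len(mine)


def needs_immediate_action(percent: float, threshold: float = 50.0) -> bool:
    """Apply the 'anything over 50%' rule of thumb."""
    return percent > threshold
```

Because the score measures how much of *your* text was copied, a suspect page that contains every five-word run of your page still scores 100 even if it adds material of its own.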
Step 2: Log The Infringements
As you find duplicate copies, log the information in a spreadsheet, including the percentage of overlap. If a site seems to have copied any of your content nearly verbatim, focus on that site and see what else you can find on it. In my experience, sites that duplicate one page of your site rarely stop at just one. Log every page you can find on those sites that duplicates your content.
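A spreadsheet works fine, but you may prefer to keep the log in a script-friendly format. Here is a minimal Python sketch that produces it as CSV, which any spreadsheet program opens. The column names are my own assumptions. `sites_to_investigate` is a hypothetical helper that pulls out the domains with near-verbatim copies (90% overlap or more by default), so you know which sites to dig into further.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List
from urllib.parse import urlsplit


@dataclass
class Infringement:
    your_url: str
    infringing_url: str
    overlap_percent: float
    date_found: str               # ISO date, e.g. "2013-05-01"
    first_seen_wayback: str = ""  # earliest archived copy, filled in later
    contacted_on: str = ""        # date you emailed the owner
    status: str = "open"          # e.g. open / removed / DMCA filed


def log_as_csv(rows: Iterable[Infringement]) -> str:
    """Render the log as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Infringement)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


def sites_to_investigate(rows: Iterable[Infringement], threshold: float = 90.0) -> List[str]:
    """Domains with at least one near-verbatim copy of your content."""
    return sorted({urlsplit(r.infringing_url).netloc
                   for r in rows if r.overlap_percent >= threshold})
```

To save the log, open the file with `open("infringements.csv", "w", newline="")` and write the returned text to it. The `newline=""` argument is what the `csv` module documentation recommends for files used with it.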
Also check the Wayback Machine to see how long the infringing site has been using your copyrighted content. Go back as far as you can so that you have a complete record of the infringement (you may need this information later).
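You can also find the earliest archived copy programmatically. The Internet Archive offers a CDX API that lists Wayback Machine captures. The Python sketch below builds a query for a single URL and parses the JSON reply. It assumes the API’s default oldest-first ordering for a single URL, so `limit=1` should return the earliest capture. The helper names are hypothetical. Fetching is left to you; a browser or `urllib.request.urlopen` both work.

```python
import json
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def earliest_capture_query(page_url: str) -> str:
    """CDX query for the first capture of `page_url` (assumes oldest-first results)."""
    params = {"url": page_url, "output": "json", "fl": "timestamp,original", "limit": "1"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_earliest_capture(body: str) -> Optional[Tuple[datetime, str]]:
    """Return (capture time, snapshot URL) from a CDX JSON reply, or None."""
    rows = json.loads(body) if body.strip() else []
    if len(rows) < 2:  # row 0 holds the field names; no data rows means no captures
        return None
    record = dict(zip(rows[0], rows[1]))
    stamp = record["timestamp"]  # 14 digits: YYYYMMDDhhmmss
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    snapshot = f"https://web.archive.org/web/{stamp}/{record['original']}"
    return captured, snapshot
```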
Step 3: Reach Out To The Infringing Site Owners
Next, you’ll want to reach out to the infringing site owner(s). I generally start with a friendly email alerting the owner to the infringement and politely asking him/her to take the pages down. I also ask the owner to reply by a specific date to confirm that the content has been removed. List every page on the owner’s site that is in violation and that you would like removed.
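If you’re contacting several site owners, a simple template keeps the requests consistent. This Python sketch shows only one way to phrase it. The wording, the function name and the 14-day default deadline are my own assumptions, so adjust them to your situation.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def takedown_email(owner: str, your_site: str, pages: Iterable[str],
                   reply_within_days: int = 14, today: Optional[date] = None) -> str:
    """Draft a polite first-contact email listing every infringing page."""
    pages = list(pages)
    if not pages:
        raise ValueError("list at least one infringing page")
    deadline = (today or date.today()) + timedelta(days=reply_within_days)
    page_list = "\n".join(f"- {url}" for url in pages)
    return (
        f"Hello {owner},\n\n"
        f"The pages below on your site reproduce content originally published on "
        f"{your_site}, without permission:\n\n"
        f"{page_list}\n\n"
        f"Please take these pages down and reply to this email by "
        f"{deadline:%B %d, %Y} to let me know they have been removed.\n\n"
        "Thank you,\n"
    )
```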
How can you find out who owns a site? If the site itself does not provide contact information, look up the domain’s owner with a WHOIS lookup. The owner may have hidden his/her contact information. If not, you’ll see the name of the person to contact, along with an email and mailing address.
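A WHOIS lookup is a simple text protocol (RFC 3912). You open a TCP connection to port 43 on a WHOIS server, send the domain followed by CRLF, and read the reply. The Python sketch below starts at IANA’s server (`whois.iana.org`) and follows referral lines toward the registry and registrar. The referral field names it looks for are ones seen in common responses, not a formal standard, and the function names are hypothetical. As noted above, many records show privacy-service details instead of the owner’s.

```python
import socket
from typing import Optional

# Field names that point to the next WHOIS server (IANA uses "refer"; many
# registry responses use "Registrar WHOIS Server"). Assumed, not standardized.
REFERRAL_KEYS = ("refer", "whois", "registrar whois server")


def whois_query(query: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send one raw WHOIS query over TCP port 43 and return the full text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it's done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(response: str) -> Optional[str]:
    """Return the next WHOIS server named in a reply, if any."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in REFERRAL_KEYS and value.strip():
            return value.strip()
    return None


def lookup(domain: str, server: str = "whois.iana.org", max_hops: int = 3) -> str:
    """Follow referrals from IANA toward the most specific WHOIS record."""
    text = whois_query(domain, server)
    seen = {server.lower()}
    for _ in range(max_hops):
        next_server = referral_server(text)
        if not next_server or next_server.lower() in seen:
            break
        seen.add(next_server.lower())
        text = whois_query(domain, next_server)
    return text
```

Calling `print(lookup("example.com"))` needs network access. The `seen` set and `max_hops` stop the lookup from bouncing between servers that refer back to each other.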
Generally, the email is enough to get the owner to take the infringing content down. If it isn’t, you may want to send a more strongly worded letter by postal mail. In that case, it may also help to have an attorney send a legal letter on your behalf.
Step 4: If The Pages Are Not Removed
If, after all of your efforts, the pages are still not removed and you cannot resolve the conflict with the site owner, it’s time to play hardball.
The Digital Millennium Copyright Act (DMCA) gives copyright owners a notice-and-takedown process for infringing content on the Web. Google and Bing will remove infringing content from their search results in response to a DMCA request, but they do ask that you first try to resolve the issue with the site owner. Each search engine provides its own DMCA removal request form for this.
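Whichever form you use, it will ask for the information that 17 U.S.C. § 512(c)(3) requires a DMCA notice to contain. The Python sketch below gathers those six elements in one place, so you can check that nothing is missing before you start filling in forms. The class and field names are hypothetical, and the forms themselves may ask for additional details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TakedownNotice:
    # One field per element listed in 17 U.S.C. 512(c)(3)(A)(i)-(vi).
    signature: str = ""                 # (i) physical or electronic signature
    copyrighted_work: str = ""          # (ii) the work infringed, e.g. your original URL
    infringing_urls: List[str] = field(default_factory=list)  # (iii) where the copies are
    contact_info: str = ""              # (iv) address, phone number, email
    good_faith_statement: bool = False  # (v) you believe the use is unauthorized
    accuracy_statement: bool = False    # (vi) accurate and authorized, under penalty of perjury

    def missing_elements(self) -> List[str]:
        """Names of the elements that are still empty or unconfirmed."""
        checks = {
            "signature": bool(self.signature.strip()),
            "copyrighted_work": bool(self.copyrighted_work.strip()),
            "infringing_urls": bool(self.infringing_urls),
            "contact_info": bool(self.contact_info.strip()),
            "good_faith_statement": self.good_faith_statement,
            "accuracy_statement": self.accuracy_statement,
        }
        return [name for name, ok in checks.items() if not ok]
```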
The Wrinkle Of Content Syndication
If you’re syndicating content or giving another site permission to copy it, things get tricky. If the content is duplicated in full, both you and the site republishing it risk being affected by Panda updates. Google also sometimes misidentifies the original source or doesn’t rank your preferred version highest. Google’s advice on syndicated content is:
Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content.
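To spot-check whether a syndication partner has followed that advice, you could parse the HTML of their page. Look for a robots meta tag containing `noindex` and a link back to your original article. This Python sketch uses the standard library’s `html.parser`, and the class and function names are hypothetical. It only matches absolute link URLs that equal your article’s URL, ignoring a trailing slash. Relative links or tracking parameters would need extra handling.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class SyndicationChecker(HTMLParser):
    """Looks for a noindex robots meta tag and a link to the original article."""

    def __init__(self, original_url: str) -> None:
        super().__init__()
        self.original_url = original_url.rstrip("/")
        self.noindex = False
        self.links_back = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser already lowercases tag and attribute names; values may be None.
        values = {name: (value or "") for name, value in attrs}
        if tag == "meta" and values.get("name", "").lower() in ("robots", "googlebot"):
            directives = {d.strip().lower() for d in values.get("content", "").split(",")}
            if "noindex" in directives:
                self.noindex = True
        elif tag == "a" and values.get("href", "").strip().rstrip("/") == self.original_url:
            self.links_back = True


def check_syndicated_copy(html: str, original_url: str) -> Dict[str, bool]:
    """Report whether a syndicated page is noindexed and links back to the original."""
    checker = SyndicationChecker(original_url)
    checker.feed(html)
    checker.close()
    return {"noindex": checker.noindex, "links_back": checker.links_back}
```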
Protecting Your Content Long Term
While all of this may seem daunting, ultimately it is the copyright owner’s responsibility to protect his/her copyrighted material. One of the first ways to do this is to make your copyright very clear: add the copyright symbol (©) and year to each page of your website. I also prefer to show a range from the first year of publication to the present year, so that it is very clear when the copyright was established.
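If your pages are generated from a template, the notice can be built so that the end of the range updates itself each year. Here is a minimal Python sketch, where the function name and owner string are placeholders.

```python
from datetime import date
from typing import Optional


def copyright_notice(owner: str, first_year: int, current_year: Optional[int] = None) -> str:
    """Return e.g. '© 2008–2013 Owner', or '© 2013 Owner' if both years are the same."""
    current_year = current_year or date.today().year
    if first_year > current_year:
        raise ValueError("first_year cannot be in the future")
    years = str(first_year) if first_year == current_year else f"{first_year}\u2013{current_year}"
    return f"\u00a9 {years} {owner}"
```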
Another tool for catching duplicate pages before they cause Panda issues is Copysentry from Copyscape. For a small fee, it continuously monitors pages you specify on your site and notifies you when copies are found. If you have pages that tend to be more popular or have been duplicated in the past, I’d prioritize those for monitoring.
All in all, the process takes time. Time to research the infringing pages, time to document, time to contact site owners, time to report to search engines and time to see recovery (even when infringing pages have been removed). It can be frustrating, but it’s a necessary process to protect your content and keep your organic rankings strong.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.