Google Signals Upcoming Algorithm Change, Asks For Help With Scraper Sites

Google is calling for help in identifying a long-running problem: scraper sites in its search results — and particularly scraper sites that are ranking higher than the original page. Matt Cutts, the head of Google’s spam fighting group, put out the call for help on Twitter this morning:

Scrapers getting you down? Tell us about blog scrapers you see: http://goo.gl/S2hIh We need datapoints for testing.

The link leads to a Google Doc form that asks for the exact query where there’s a “scraping problem,” along with the exact URLs of the original and scraper pages. The form explains that Google “may use data you submit to test and improve our algorithms.”

It’s not entirely unusual for Google to call for help like this, but it’s noteworthy because the issue of scraper sites has been particularly prominent in recent months.

Google vs. Scraper Sites

Google has always had critics but, within the last year, many of them grew more vocal about what they perceived as a decline in quality of Google’s search results. Google responded early this year with a blog post in which it disagreed that search results had grown worse, but also promised “new efforts” to fight spam. Among other types of spam, that post called out scraper sites:

And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.

Cutts then announced the algorithm change on his own blog a week later, saying that “slightly over 2% of queries change in some way” after the update. He added that “searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.”

Panda Update vs. Scraper Sites

But the scraper problem didn’t go away. In fact, after the Panda update rolled out in February, many webmasters flooded Google’s help forums with reports that it had gotten worse.

A few months later, during our SMX Advanced conference in Seattle, Cutts confirmed that the newest Panda update would target scraper sites. That update — Panda 2.2 if you’re scoring at home — rolled out in mid-June.

And that pretty much brings us up to today — where Google is “testing algorithmic changes for scraper sites (especially blog scrapers),” and apparently looking for some examples that it thinks it may have missed. If nothing else, it’s a chance for everyone who’s been so vocal about the scraper site problem to report exactly what they’re seeing in Google’s search results.


About The Author: Matt McGee is Editor-In-Chief of Search Engine Land. His news career includes time spent in TV, radio, and print journalism. His web career continues to include a small number of SEO and social media consulting clients, as well as regular speaking engagements at marketing events around the U.S. He recently launched a site dedicated to Google Glass called Glass Almanac and also blogs at Small Business Search Marketing. Matt can be found on Twitter at @MattMcGee and/or on Google Plus. You can read Matt's disclosures on his personal blog.

  • http://thesecularity.com/ T.S.

    The suspense is killing me! OMG…

  • http://xavvy.com Gordon Mohr

    Google has probably already considered these techniques, but two other technical mechanisms that could be used by expert publishers (or baked into authoring tools) when scraping is a known/ongoing threat:

    • embargo content until the moment after Googlebot first visits it. Thus, the order of crawling will always reflect the order of creation. This would need Google’s acceptance to not appear like cloaking.

    • introduce a service for obtaining secure digital timestamps of content blocks by content fingerprint/shingleprint. This service could be by Google or a third party; its output could be automatically embedded as a microformat at the moment of ‘publishing’. In the event of disputes (or later manual reports of problems) this info could prove definitive for proving true order-of-creation.
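
As a rough illustration of the fingerprint idea in the comment above, the sketch below hashes overlapping word windows ("shingles") of a content block and records the earliest time each fingerprint was seen. It is only a sketch under stated assumptions: the names `shingles`, `shingleprint`, `overlap`, and `TimestampRegistry` are hypothetical, the registry is an in-memory stand-in with no cryptographic signing, and nothing here reflects how Google or any real timestamping service works.

```python
import hashlib
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingleprint(text, k=5, keep=8):
    """Fingerprint a content block as its `keep` smallest shingle hashes."""
    hashes = sorted(
        hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles(text, k)
    )
    return tuple(hashes[:keep])


def overlap(a, b, k=5):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


class TimestampRegistry:
    """Hypothetical stand-in for a timestamping service (not secure)."""

    def __init__(self):
        self._first_seen = {}

    def register(self, text, timestamp):
        """Record a sighting and return the earliest timestamp for this content."""
        key = shingleprint(text)
        earliest = min(timestamp, self._first_seen.get(key, timestamp))
        self._first_seen[key] = earliest
        return earliest
```

Registering the original at publication time and a verbatim copy later would return the original's timestamp for both, which is the order-of-creation evidence the comment describes. A lightly reworded copy would produce a different fingerprint, so a real service would presumably compare fingerprints by similarity (as `overlap` does on full shingle sets) rather than by exact match.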

  • http://www.bluesapphirecreations.com ankurchaudhary

    Or maybe Google could rely on websites like Copyscape to put a timestamp on content, so that scraped content gets a later timestamp or none at all. However, making sure the timestamp procedure is well known among webmasters could be a task (unless you roll out something similar through Google Webmaster Tools).

  • P.G.

    It is a fact that Google search quality had deteriorated sharply in recent years.

    Google has bounced back sharply from that.

    And now this initiative to engage users/publishers. Great stuff!

  • Winooski

    Gordon, I’m loving the idea of a neutral 3rd-party authenticating the first time of publication. It could potentially help in copyright complaints as well, let alone issues of scraped content competing in the SERPs.

  • http://corp.lawgical.com Trent Carlyle

    Hi Matt. Do you think Google is only looking for those that copy content verbatim? Sometimes we’ll summarize an article (and rewrite the title), then cite/ link to the original source. Seems like a reasonable practice, but we want to be sensitive to the recent and pending changes. Our posts do seem to rank well.

  • Matt McGee

    Trent – I obviously can’t speak for Google, but I’d say this: If the ONLY thing you do is summarize other people’s articles and link to the original, you might be risking the “scraper” label. If you actually write something original ABOUT the other article in the process of linking to it, that’s probably not a bad sign. And if you also have plenty of your own high-quality, original content being published alongside these shorter pieces that link to other content, that’s even better.

    Does that help?

  • http://jury.google.com/ L.S.

    meh another help us plee like spam reports for nubs

  • http://europeforvisitors.com Durant Imboden

    Trent, this thread from Google’s Webmaster Central help forum covers that very topic. It was started by a guy whose network of sites got penalized for using rewritten content from other sources:

    http://www.google.com/support/forum/p/Webmasters/thread?tid=21e50ed1333526fc&hl=en

  • Kelly

    Good, Goood news.

  • http://www.hantohat.com Hudson

    I think it’s important to protect some re-syndication models (e.g., excerpting and excerpt + commentary models) from discrimination, as they are very valid and well-liked content delivery methods.

    Although, as a provider of automation solutions, I do not think it’s ever fair for a non-owner to outrank an owner for 100% purely duplicated content, or even excerpted content. So safeguards like the ones mentioned in the first couple of comments might be a good idea. But power should not be broken off into 3rd party systems like copyscape, imo, and should instead remain within Google.

    Maybe Google Plus will provide the missing link for tracking and honoring content SERP priorities.

  • Steve Blade

    Very good comments, and Hudson I feel you summed it up nicely. Much better idea to have it within Google Webmaster tools. Thanks Matt for the article!

  • http://TheAverageGenius.net JamestheJust

    All fine and good – and I mean it…but what about when Google is the scraper? Who watches the Watchers?

    http://www.seobook.com/the-doors
