• http://thesecularity.com/ T.S.

    The suspense is killing me! OMG…

  • http://xavvy.com Gordon Mohr

Google has probably already considered these techniques, but here are two other technical mechanisms that expert publishers could use (or that could be baked into authoring tools) when scraping is a known, ongoing threat:

• embargo content until the moment after Googlebot first visits it. That way, the order of crawling would always reflect the order of creation. This would need Google’s acceptance so it doesn’t look like cloaking.

• introduce a service for obtaining secure digital timestamps of content blocks by content fingerprint/shingleprint. This service could be run by Google or a third party; its output could be embedded automatically as a microformat at the moment of publishing. In the event of disputes (or later manual reports of problems), this info could be definitive for proving the true order of creation.
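The fingerprint-plus-timestamp idea above could be sketched roughly as follows. This is purely illustrative: the function names, the `content-timestamp` markup, and the choice of SHA-256 over an actual shingleprint are all assumptions, not any real Google or third-party API.

```python
# Hypothetical sketch of the content-timestamp idea: fingerprint a content
# block, attach a publication time, and emit an embeddable microformat-style
# snippet. Class/attribute names in the markup are invented for illustration.
import hashlib
from datetime import datetime, timezone

def content_fingerprint(text: str) -> str:
    """Normalize whitespace and case, then hash the content block."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def timestamp_microformat(text: str) -> str:
    """Build a snippet recording the fingerprint and the publish time (UTC)."""
    fp = content_fingerprint(text)
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (f'<span class="content-timestamp" '
            f'data-fingerprint="{fp}" data-published="{ts}"></span>')
```

A real service would of course sign the (fingerprint, timestamp) pair so it couldn’t be forged; the normalization step matters because trivial whitespace or case changes by a scraper shouldn’t produce a different fingerprint.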

  • http://www.bluesapphirecreations.com ankurchaudhary

Or maybe Google could rely on websites like Copyscape to timestamp content, so that scraped content gets a later timestamp or none at all. However, making sure the timestamp procedure is well known among webmasters could be a challenge (unless you roll out something similar through Google Webmaster Tools).

  • P.G.

It is a fact that Google search quality had deteriorated sharply in recent years.

Google has bounced back sharply from that.

    And now this initiative to engage users/publishers. Great stuff!

  • Winooski

Gordon, I’m loving the idea of a neutral third party authenticating the first time of publication. It could potentially help in copyright complaints as well, not to mention issues of scraped content competing in the SERPs.

  • http://corp.lawgical.com Trent Carlyle

Hi Matt. Do you think Google is only looking for those that copy content verbatim? Sometimes we’ll summarize an article (and rewrite the title), then cite/link to the original source. That seems like a reasonable practice, but we want to be sensitive to the recent and pending changes. Our posts do seem to rank well.

  • Matt McGee

    Trent – I obviously can’t speak for Google, but I’d say this: If the ONLY thing you do is summarize other people’s articles and link to the original, you might be risking the “scraper” label. If you actually write something original ABOUT the other article in the process of linking to it, that’s probably not a bad sign. And if you also have plenty of your own high-quality, original content being published alongside these shorter pieces that link to other content, that’s even better.

    Does that help?

  • http://jury.google.com/ L.S.

Meh, another “help us” plea, like spam reports for noobs.

  • http://europeforvisitors.com Durant Imboden

    Trent, this thread from Google’s Webmaster Central help forum covers that very topic. It was started by a guy whose network of sites got penalized for using rewritten content from other sources:


  • Kelly

Good, good news.

  • http://www.hantohat.com Hudson

I think it’s important to protect some re-syndication models (e.g., excerpting, and excerpt + commentary) from discrimination, as they are valid and widely enjoyed content delivery methods.

Although, as a provider of automation solutions, I don’t think it’s ever fair for a non-owner to outrank the owner for 100% duplicated content, or even excerpted content. So safeguards like the ones mentioned in the first couple of comments might be a good idea. But power should not be broken off into third-party systems like Copyscape, imo; it should instead remain within Google.

    Maybe Google Plus will provide the missing link for tracking and honoring content SERP priorities.

  • Steve Blade

    Very good comments, and Hudson I feel you summed it up nicely. Much better idea to have it within Google Webmaster tools. Thanks Matt for the article!

  • http://TheAverageGenius.net JamestheJust

    All fine and good – and I mean it…but what about when Google is the scraper? Who watches the Watchers?