• http://seowebmaster.com/ ★ ★ Search Engines WEB ★ ★

Patent or no patent, Google had already been aggressively phasing in their de-duplication algos as of about two years ago.

Most SEOs noticed it when the many directories using DMOZ data began disappearing, and websites that owed much of their link popularity success to their DMOZ listings saw a sudden, sharp drop in the SERPs (some even virtually disappeared).

The second phase of Google’s de-duplication process appeared to drop or severely punish OBVIOUS links pages, and to drop link directories that were OBVIOUS duplicates built from automatic link-upload pages.

    Also, around that time, certain high profile automatic link exchange Web sites were banned.

One very high-profile one actually made a public acknowledgement to their customers and eventually changed their domain, starting all over, because even after changing their strategies the old domain stayed permanently at a PR0.

  • http://www.aaronshear.com/blog/ Aaron Shear

This patent sounds like the shingles conversation that was started a few months back, relating to how a shingle, or in this case a sketch, can be viewed as similar or duplicative. It’s an interesting concept, and it makes it very difficult to scrape and succeed any longer.
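For readers unfamiliar with the shingles idea mentioned above, here is a minimal sketch of it (my own illustration, not the patent’s actual method): break each document into overlapping w-word "shingles" and compare the resulting sets with Jaccard similarity. Real systems sample the shingle sets into compact sketches, but the comparison principle is the same.

```python
# Hypothetical illustration of w-shingling + Jaccard similarity,
# the basic idea behind shingle/sketch duplicate detection.

def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumps over the sleepy dog"
sim = jaccard(shingles(doc1), shingles(doc2))  # near-duplicates score high
```

Documents that share most of their shingles score close to 1.0, which is why lightly edited scraped copies are still easy to flag.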

    Great write up!

  • http://www.adscriptor.com Jean-Marie Le Ray

    Hi Bill,

Nice post, as usual. Anyway, even though I know you’ll have to translate my post, what do you think a similarity engine could do in this case:
and how could it find the original author?

  • http://www.seobythesea.com Bill Slawski

    Hi Jean-Marie,

    Thanks. If I understand correctly, your question is more about which page might appear in search results when a search engine has determined that pages are duplicates or are very similar.

    The best description of how a search engine might behave when filtering out pages to be shown to a searcher is in a patent application from Microsoft – System and method for optimizing search results through equivalent results collapsing.

    I wrote about it at SEO by the Sea, and I’m not going to duplicate that here, so I’ll just point to it – Microsoft Explains Duplicate Content Results Filtering. Chances are very good that what Microsoft describes there is very similar to what Google and Yahoo are doing when deciding which pages to show.

    I don’t think that it makes a difference whether Google is using a similarity engine, as described in this patent, or one of the shingles methods from their other patents, or a phrase-based indexing method to identify duplicates, or some other method. Regardless of what method is being used to identify duplicates, the decision of which pages to show is likely independent of that.
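The separation described above, where detection and display are independent steps, can be sketched roughly like this (my own illustration, not Microsoft’s or Google’s actual code): once some method has grouped pages as near-duplicates, the engine collapses each group down to one representative, chosen by whatever score it prefers.

```python
# Hypothetical sketch of equivalent-results collapsing: keep only the
# highest-scoring page from each group of near-duplicate URLs.

def collapse_results(results, groups):
    """results: dict of URL -> score; groups: list of sets of duplicate URLs."""
    shown = dict(results)
    for group in groups:
        members = [u for u in group if u in shown]
        if len(members) > 1:
            best = max(members, key=lambda u: shown[u])  # pick one to display
            for u in members:
                if u != best:
                    del shown[u]  # filter the rest out of the SERP
    return shown

results = {"a.com/p": 0.9, "b.com/p": 0.7, "c.com/q": 0.8}
shown = collapse_results(results, [{"a.com/p", "b.com/p"}])
```

Note that nothing in the collapsing step depends on how the duplicate groups were found, whether by shingles, a similarity engine, or phrase-based indexing.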

  • http://www.adscriptor.com Jean-Marie Le Ray

    Hi Bill,

    “If I understand correctly, your question is more about which page might appear in search results when a search engine has determined that pages are duplicates or are very similar.”

Yes and no. In this case, maybe it’s more about plagiarism than duplicate content, and I guess no similarity engine or algorithm will be able to determine who the original author is (and thus the one and only result the search engine should show in the SERPs); only a human validator could.
I think that solution alone would be truly healthy for the Web ecosystem.


  • http://www.adscriptor.com Jean-Marie Le Ray

    Bill, hi again

A bit off-topic, but did you read this: http://www.ificlaims.com/press_release012007a.htm
I’ve seen it said somewhere that Microsoft isn’t innovative enough, but it doesn’t seem so! 1,463 patents in 2006, ranked 12th.

  • http://www.seobythesea.com Bill Slawski

    Searching through the granted patents and published patent applications every week, I do see a lot of patent filings from Microsoft.

    Some of them are innovative, and some of them maybe less so. I’m not sure that volume of patent filings by itself is a clear indication of innovation.

    But there is some interesting stuff amongst those patent applications.