• http://feefighters.com Sheel Mohnot

    I like what they are doing in theory – it makes a lot of sense, but Blekko has a LOT of issues… like surfacing a domain that we got rid of 7 months ago, and showing sites that have nothing to do with the intended search (apart from a similar domain name).

    Details: http://feefighters.com/blog/?p=4554

  • Richard Stokes

    “That’s interesting, this idea of blocking spam pages before they are added to a search index. It may have been done before, but if so, I don’t recall by which service. Certainly, it was never something noteworthy enough for me to recall. If you keep the spam out entirely, potentially that makes for cleaner results.”


    This was a head-scratcher to me as well. We’ve been using similar algorithms for detecting spam sites for about 18 months now, but it’s a well-documented fact that the technology goes back much further. Researchers from Microsoft, Yahoo, and Google have written extensively about various aspects of web spam detection since 2005, and the basics go back even further, to 1999 (Andrei Broder et al. covered the basics even before then). It’s a bold (and false) claim.

    What I will buy is that they have their own tweaks. Everyone does.

    Sorry I’m missing you at SMX! I’m currently heading to Maui to speak on advanced webspam detection (and other topics) at Perry Marshall’s annual conference. Would love to connect sometime.

    Richard Stokes

  • http://www.michael-martinez.com/ Michael Martinez

    You must not have gotten the same Google results I did for your example query. I don’t think it really matters how much fluff is filtered out of the search engine. More seems to float to the top because, frankly, most sites that publish serious information have earned relatively few links and don’t use on-page optimization.

    The problem that even Google has masterfully failed to solve is how to surface truly useful, authoritative content. Links won’t help you do that. On-page optimization won’t help you do that.

    We need better indexing and analysis technologies that can dig past all the popularity factors and get to the meat of what people are really searching for. That’s at least a few search generations away.