Every hour, one million spam pages are created. That’s a stat that start-up search engine Blekko has now put out — complete with a new “Spam Clock” showing a count-up of spam pages created since the first of the year.
Currently, the Spam Clock estimates that there have been about 155 million spam pages made since January 1. Blekko CEO Rich Skrenta talks more about the clock on his personal blog here.
Don’t confuse that figure with the total number of spam pages on the web. That figure is probably in the billions.
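For a sense of how the two numbers relate, here is a minimal sketch of the arithmetic, assuming the clock simply multiplies elapsed hours by a constant rate. Blekko has not been shown here to work this way. The function names are hypothetical, and the year (2011) and the midnight UTC starting point are my assumptions, not something the clock states.

```python
from datetime import datetime, timezone

# Rate quoted by Blekko: one million spam pages per hour.
SPAM_PAGES_PER_HOUR = 1_000_000


def spam_clock(now, start):
    """Estimate spam pages created between start and now.

    Assumes a constant rate, which is a simplification for illustration.
    """
    elapsed_hours = (now - start).total_seconds() / 3600
    return int(elapsed_hours * SPAM_PAGES_PER_HOUR)


def hours_for(count):
    """Hours needed to reach a given count at the quoted rate."""
    return count / SPAM_PAGES_PER_HOUR


# Assumed starting point: midnight UTC on January 1, 2011.
start = datetime(2011, 1, 1, tzinfo=timezone.utc)

hours = hours_for(155_000_000)
print(hours, "hours, or about", round(hours / 24, 1), "days")
```

At the quoted rate, 155 million pages works out to 155 hours, or roughly six and a half days into the year.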
How Bad Is Spam? And Is It Killing Google?
Is spam a big problem? Sure. Spam can certainly make it harder for any search engine, including Blekko, to present the best results. But is spam in particular killing Google? That’s the impression you might get if you’ve been reading some of the posts that have been making the rounds in technology circles recently. It’s one reason why I think Blekko also put up its spam clock: to help keep pressure on this issue in general and Google in particular.
It’s sort of a New Year’s tradition now. Last year, in December 2009, Paul Kedrosky wrote about the difficulty of finding good information on Google about dishwashers ahead of a purchase.
This year, we’ve had another round, and Kedrosky’s article is often lumped in with the new posts, despite being a year old. What has the New Year given us?
- Why We Desperately Need a New (and Better) Google from guest author Vivek Wadhwa on TechCrunch covers how apparently only Blekko’s date sorting could handle certain types of research.
- Trouble In the House of Google from Jeff Atwood of Coding Horror covers how scraper sites were apparently making it harder for Google to surface Atwood’s own content.
- Google’s decreasingly useful, spam-filled web search from Marco Arment of Instapaper talks about the problem of Q&A sites flooding Google with horrible content.
Sure, Google Has Problems
Date sorting IS an issue at Google (see Up Close With Google Search Options), though I suspect Bing and Blekko have similar issues. I’ll revisit this eventually.
Scraped content as Atwood describes is definitely a problem, and especially irritating when you understand that Google earns money from it. The Google Sewage Factory, In Action: The Chocomize Story, which I wrote last July, has more about this:
On the other hand, it’s clear how much garbage that Google has caused to be generated, simply by publishing the trends. But that garbage wouldn’t happen, if it didn’t know it was going to be rewarded. It is, both with traffic from Google and from revenue from Google for those carrying its ads.
The Q&A sites are a real problem, and I’ve been compiling examples myself for a future article about how often these pages rank well while purporting to offer answers that they annoyingly fail to deliver.
Moreover, I’ve been increasingly concerned that Google’s results simply don’t seem up to the standards that people might expect. The articles below go into more depth about this:
- Reviewing Some Bad Google Search Results With Sergey Brin, October 2009
- How The “Focus On First” Helps Hide Google’s Relevancy Problems, September 2010
- Google’s “Gold Standard” Search Results Take Big Hit In New York Times Story, November 2010
But No One Really Knows If Relevancy Is Down
But here’s the thing. I don’t know that Google’s relevancy has actually decreased. Nor does anyone above who has posted articles recently. We have feelings about this, but these feelings don’t take into account a number of other factors:
- We expect more from Google than we did in the past, searching for things we might not have in previous years.
- We don’t remember all the successful searches, focusing instead on when things go bad.
- We probably don’t do a comparison check on Bing or Blekko to see if they performed better, nor do we use those services on a regular basis to understand if they’re also “failing” to the degree we might feel Google does.
- Our expectations of Google are higher.
Highs & Lows
Expectations are especially important. Over the years, I’ve seen Google heralded as even godlike for its ability to find information, even though search engines before Google often worked well, and those after it have also worked well and sometimes outperformed it.
There was a press love affair when Google first came out. There continues to be a consumer love affair, in my mind, in which the Google brand on search results can make them seem better. Several past studies have found that just putting the Google logo on someone else’s results makes consumers think the results are superior.
I think we’re finally seeing this slip back on Google. Just as its achievements were inflated into super-greatness, now its results are blown up into huge failures. The reality is that millions of people do millions of successful searches on Google each day. If there were a big problem, Google would be losing share massively. It’s not. Also see Andrew Goodman’s take on this, Search Isn’t Broken Because One Guy Had Trouble Using Google.
Speaking Of Relevancy…
Meanwhile, here’s a quick taste of Blekko, for a search on locksmith in orange county:
Looking at those results, and having done this type of search in the past, I already know what to expect. A bunch of companies not really based in Orange County, California but rather referral services. And the first result seems to deliver that:
That’s not necessarily spam. This company probably will get me to a locksmith in Orange County. But it’s a page created specifically to hopefully win in the search results. All those hyphens in the domain name are a dead giveaway. It’s not a “real” business — and Blekko is rewarding it. So’s Google, by the way — same number one spot. At Bing, it ranks number five – some other referral service gets the number one spot.
Time For Relevancy Metrics?
So, a little perspective. I do think Google needs to improve. I would like to see Google, Bing and perhaps Blekko back an independent third-party group to do regular, industry-accepted relevancy ratings, so that we get past the “I think things are worse” perception and actually know whether they are. It’s something I’ve pushed for since 2002. I’ll have more to say about that in a future post, perhaps reviving the idea.
More About Search Engine Spam
In the meantime, if you want to understand more about search engine spam, see our What Is Search Engine Spam? The Video Edition post. And at our upcoming SMX West search marketing conference in San Jose, we’re taking another look at this in our The Spam Police session.
Postscript from Matt McGee: Pure speculation here: I can’t help but wonder if this is part of a collaborative effort with another small search engine, DuckDuckGo, to hit Google on areas where it may be seen as vulnerable.
Just a few days ago, DuckDuckGo launched DontTrack.us, a direct challenge to Google on privacy/tracking issues. Now, Blekko launches SpamClock.com, an indirect challenge to Google on spam. And Blekko and DuckDuckGo have been formal partners for at least a few months. Is it coincidence that the two would launch these projects just days apart?