Are Manual Solutions The Answer To Content Farms?

It was interesting to see some of the reactions when upstart Blekko recently decided to toss some sites out of its index. For the uninitiated, it looked a bit like a PR play against Google, which has been taking a beating over thin-content results of late. If you hadn’t guessed by now, we’re talking about (Demand Media’s) eHow and the other “top 20 spam sites” that were nuked.

Of course, the question remains: why? It certainly seems like a knee-jerk reaction that almost panders to the search community. Sure, I dislike running into weak content in the SERPs as much as the next guy. But I am pretty sure there are many other sites with equally thin content and, in many cases, content much worse than what they’re churning out. Seriously? There are only 20 sites worth tossing?

The State Of Modern Search

All is not lost, my friends. One of the better developments over the last few years is the arrival of new (potential) signals and the infrastructure to deal with them. To a certain extent, there is every chance for Google (and other engines) to get past the link.

Why now, more so than in the past? The infrastructure (Caffeine) and the motivation (growing grumblings about quality). Let’s consider some areas that might make sense while also helping to combat spam and low-quality results.

Personalization. One of the longest-running goals at Google is deeper personalized search. Add to that mobile, another area of great interest with its own personalization angle, and we might see far more personalized results in the near future. If the new infrastructure enables more granular personalization than is currently in place, it could provide new signals that lessen the spam we see on the web today.

Explicit User Feedback. When a user takes an action to tell the search engine something, it is a type of relevance feedback known as ‘explicit feedback’. Think of the (now defunct) SearchWiki as a good example. Others might include emailing a page, saving it to favorites and so forth. Traditionally, this type of data has been hard for search engines to come by; the noisier implicit feedback is far more readily available. But it would certainly be a great way to deal with spam or unwanted domains in a personalized search environment.

Temporal Data. Another area that has been prominent at Google over the last few years is freshness (in many query spaces). Some of these temporal signals could increase relevancy while also dealing with spam. Even within the link graph, giving more weight to fresh signals might help reduce the power of established authority in many situations.
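One way to picture stronger temporal weighting in the link graph is to discount each link by its age, so a domain coasting on old links loses some of its edge. This is a minimal sketch, assuming an exponential half-life decay; the function name, the 180-day half-life and the (weight, discovery date) link format are all hypothetical, not anything Google has described.

```python
import math
from datetime import date


def decayed_link_score(links, today, half_life_days=180.0):
    """Sum link weights, discounting each one by its age.

    Hypothetical model: links is an iterable of (weight, discovered_on)
    pairs, and a link exactly one half-life old counts for half its weight.
    """
    total = 0.0
    for weight, discovered_on in links:
        # Clamp future dates to zero age so they never count for more than full weight.
        age_days = max((today - discovered_on).days, 0)
        total += weight * math.pow(0.5, age_days / half_life_days)
    return total
```

With these assumptions, a link found today keeps its full weight, while one found a year earlier counts for roughly a quarter of it; the half-life is the knob that decides how quickly authority fades.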

Social Graph. Another obvious area is, of course, social. The social graph and real-time search are two areas Google has also invested in over the last while. This can lead to deeper personalization as well as other potential signals. Once more, a lot of social signals are open to spam unless they’re used in a granular, personalized approach. But in concert with the other elements mentioned here, it seems they could also help root out weak content and sites without opening the entire link graph up to manipulation.

At the end of the day, there needs to be an automated solution that protects not only against spam but also against those who wish to do their competitors harm. Spam ‘votes’ should only affect the individual user who casts them. You can’t spam yourself, and removing a competitor only from your own results is exactly the kind of personalization that would work here.
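The per-user idea above can be sketched in a few lines: a block only ever filters the results of the user who issued it, so there is nothing for a competitor to game. This is a minimal illustration, assuming blocks are stored per user ID and matched by domain (including subdomains); the class and method names are hypothetical, not any search engine's actual API.

```python
from collections import defaultdict
from urllib.parse import urlparse


def _host_matches(host, domain):
    """True if host is the blocked domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


class PersonalBlocklist:
    """Hypothetical per-user blocklist: a 'spam vote' only changes the voter's results."""

    def __init__(self):
        self._blocked = defaultdict(set)  # user_id -> set of blocked domains

    def block(self, user_id, domain):
        self._blocked[user_id].add(domain.lower())

    def filter(self, user_id, result_urls):
        # .get() avoids creating an empty entry for users who never blocked anything.
        blocked = self._blocked.get(user_id, set())
        kept = []
        for url in result_urls:
            host = (urlparse(url).hostname or "").lower()
            if not any(_host_matches(host, d) for d in blocked):
                kept.append(url)
        return kept
```

If one user blocks `ehow.com`, that domain (and `www.ehow.com`) disappears from their results only; every other user’s results are untouched, which is the property that keeps this from becoming a weapon against competitors.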

Some Thoughts From The Geeks

To try and gain some more insight into this and the larger considerations of user feedback, I contacted Rich Skrenta from Blekko and Mark Cramer of Surf Canyon (awesome tool, awesome geek).

On the topic of explicit feedback mechanisms such as Google SearchWiki, Rich says they didn’t work because “there are too many possible queries, effectively an infinite set. How many different queries are there for all possible song lyrics?”

Skrenta then made the case for their approach:

“What we are doing is identifying the top sites per category. The top 100 /health sites collectively have millions of pages and can answer any medical question you have. The top 50 lyrics sites have lyrics for every song.”
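Mechanically, a curated-category approach like the one Skrenta describes amounts to a whitelist filter: for a given category, only results from the hand-picked sites survive. The sketch below assumes a simple mapping from a category tag to a set of domains; the domains are placeholders and the function is hypothetical, not Blekko’s actual slashtag implementation.

```python
from urllib.parse import urlparse

# Hypothetical curated lists; placeholders, not Blekko's real slashtag contents.
CURATED_SITES = {
    "/lyrics": {"lyrics-site-a.example", "lyrics-site-b.example"},
    "/health": {"health-site-a.example"},
}


def restrict_to_category(result_urls, slashtag, curated=CURATED_SITES):
    """Keep only results whose host is (or is a subdomain of) a curated site."""
    allowed = curated.get(slashtag)
    if allowed is None:
        # Unknown category: fall back to the unfiltered results.
        return list(result_urls)
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == site or host.endswith("." + site) for site in allowed):
            kept.append(url)
    return kept
```

The filter is trivial; the hard, human part is maintaining the lists, which is exactly where the scaling and subjectivity concerns come in.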

That makes some sense, but I am also leery of ‘human-powered’ solutions. Rich countered that it’s “(..) disingenuous to pretend that ‘the algorithm’ drives the results. The algorithm gets changed on a day-to-day basis in response to new material appearing on the web.”

OK, yes, there are people constantly tweaking the algorithms at Google, which means they are also making subjective judgments of their own. Also, for the unfamiliar, Google does have human raters who score results on perceived relevance as part of its search quality testing.

Mark Cramer, for his part, as someone familiar with user feedback mechanisms, feels that “the implicit feedback approach is always the best. In most all cases, people are not interested in providing explicit feedback.”

Google SearchWiki is the glaring example. Surf Canyon did follow suit by offering explicit feedback as an option to its users. “We figured it wouldn’t hurt to throw it in there,” said Cramer, referring to a new option in the application.
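To make Cramer’s point about implicit feedback concrete, here is one common way such signals get used: blend a result’s base score with its click-through rate, shrinking the rate toward a prior so a result with three impressions isn’t over-trusted. This is a minimal sketch under stated assumptions (click and impression counts are available per URL, and the prior, prior weight and blend factor are hypothetical values); it is not Surf Canyon’s method.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.1, prior_weight=20):
    """Click-through rate shrunk toward a prior, since raw clicks on sparse data are noisy."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


def rerank_by_clicks(results, stats, blend=0.5):
    """Re-order results using implicit feedback.

    results: list of (url, base_score) pairs.
    stats: dict of url -> (clicks, impressions); unseen URLs fall back to the prior.
    """
    def score(item):
        url, base = item
        clicks, impressions = stats.get(url, (0, 0))
        return (1 - blend) * base + blend * smoothed_ctr(clicks, impressions)

    return [url for url, _ in sorted(results, key=score, reverse=True)]
```

The smoothing step is the design choice worth noting: implicit data is plentiful but noisy, so it has to be damped before it is allowed to move rankings.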

To further clarify Blekko’s more subjective approach, Skrenta again turned to the lyrics SERP example:

“Rather than rolling the dice with a 200-weight algorithm that’s been trained by a bunch of minimum wage web contractors, you could actually just pick the top lyrics sites. They have the lyrics to every song ever published. And they won’t download malware or spyware onto your computer.”

This is once more a seemingly logical approach, but I can’t see it being something a search engine such as Google would consider. It fits better in a more personalized environment, such as the one we looked at earlier.

As of this week, even Google is getting back into explicit user feedback with a Chrome extension for removing sites from your results. Will this fare any better than previous attempts? It is highly unlikely. Setting Chrome’s market share aside for a moment, users simply aren’t that interested. Just give them good results to start with.

Dear Blekko

While we can give kudos to the gang at Blekko for speaking up about the need for higher quality search results, there are limits. This approach doesn’t scale well and would be a PR nightmare for any major search engine. Do eHow or Mahalo really have the worst result for everything they publish? It seems a slippery slope to venture onto.

Where will it end and what safeguards are in place?

Until it can be proven in some larger implementations that users will not only engage with explicit feedback but do so honestly, I don’t believe arbitrary, non-algorithmic actions are the answer. It’s certainly not the answer for Google, I know that much.

One thing is certain; producing high quality relevant results ain’t easy.

Finding An Algorithmic Solution

So let us consider: what if the shoe were on the other foot?

Imagine that Google had made such a move. It most certainly wouldn’t be hailed; in fact, I am pretty sure people would be screaming from the mountaintops that Google was biased, that it had made itself the Internet’s judge and jury, and so on. I guarantee you that much.

This is one of the reasons Google (like many other search engines) tends to prefer an algorithmic solution to the problem. The other obvious reason is that constantly updating the index based on subjective judgments would be massively resource-intensive and would cause far more grief than poor results and search neutrality debates have of late.

This approach is not the answer.

What needs to be done is to find better filters and dampeners that can limit the ranking benefits low-quality results currently enjoy. Now, this isn’t even close to being as easy as it sounds.

One element that is certainly a potential stumbling block is authority. Often, these types of sites have the link equity, age and trust that make ranking for many long-tail terms fairly easy. If you’ve ever worked on a strong (authority) domain, you know what I mean. But what if that dampening also affects the authority of your site?
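A toy scoring model shows why dampening authority cuts both ways. Assume, purely for illustration, that a page’s score is its relevance plus a weighted domain-authority term; turning that weight down lets a relevant page from a small site overtake a weak page on a strong domain, but the same knob demotes every authority domain’s weaker pages, including yours. The function, the linear blend and the scores below are all hypothetical.

```python
def rank(pages, authority_weight):
    """Order pages by relevance plus a weighted domain-authority boost.

    Hypothetical model: pages is a list of (name, relevance, domain_authority)
    with both scores in [0, 1]; authority_weight is the 'dampener' knob.
    """
    def score(page):
        _, relevance, authority = page
        return relevance + authority_weight * authority

    return [name for name, _, _ in sorted(pages, key=score, reverse=True)]


pages = [
    ("content_farm_page", 0.4, 0.9),  # weak page on a strong domain
    ("niche_site_page", 0.7, 0.2),    # better page on a small domain
]
```

With `authority_weight=1.0` the content farm page wins (1.3 vs. 0.9); dampened to `0.2`, the niche page moves ahead (0.74 vs. 0.58). There is no setting that demotes only the pages we dislike.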

See? Not so easy, is it? There will always be winners and losers when the goal posts are moved. You may be one of the losers. Be careful what you ask for.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.


About The Author: is an avid search geek with a penchant for all things IR (information retrieval) related. He's best known for his writing on the technical aspects of SEO. He is an SEO consultant (to the stars?) who runs the SEO Training Dojo and can also be found on his personal blog, the Fire Horse Trail.

  • http://www.bluesapphirecreations.com ankurchaudhary

    Great article, David! A question, though: what if Google also identified the top offenders (for want of a better word) for certain search categories and reduced the domain or authority impact for pages coming from them? I am sure the algorithmic changes somehow try to take care of this, but wouldn’t having a small (hence manageable) index of sites like eHow and Mahalo help the search engine provide better results? That said, the question of web neutrality is again daunting.

    On second thoughts, isn’t almost every objective parameter grounded in some heuristic subjectivity?

  • dstiehr

    Getting rid of the “top 20 spam sites” doesn’t fix the problems by a long shot. Search “dishwasher repair” on Blekko, and the very first result that comes up is http://www.dishasher-repair.org – a link farm!

    Try it for just “dishwasher” and you get http://www.dishwasherpete.org, a parked domain! Other results include a Wikipedia listing (in case I’m confused about what a dishwasher is, I guess), a HowStuffWorks article (shouldn’t this be lumped in with eHow?), food recipes, a recall article, something from a video game site, and, well, I’m going to stop typing because you can run the query yourself.

    I took the dishwasher theme from the famous Paul Kedrosky article about how poor Google’s results were when he needed a new dishwasher, but I’d imagine this could be replicated across many different search themes. What is manually curated search doing to help someone with these two very common queries?

    How are those results any better than eHow and the other spam sites that have been removed? And if Blekko wants to say these results are a work in progress, then it should stop the PR attacks until it has its own house in order.

  • http://www.pbm.com/~lindahl/ Greg Lindahl

    @dstiehr Thanks for pointing out those spam sites. Did you try the suggested slashtag /diy? What do you think of those results?

  • http://seotrainingdojo.com David Harry

    @ankurchaudhary: I wonder how one decides who the ‘top offenders’ are. In the web spam world, these are generally the more blatant offenders: cloaking, dodgy link building, automation, etc. When it comes to content, however weak, it’s more about search quality. There are lots of sites out there with weak content, but we’ve never called that spam; it’s just a bad result. Weakening an entire (legitimate?) domain’s authority because of a subjective judgement that ALL its content is weak seems troublesome. And yes, lol, as Rich pointed out, there is going to be some subjectivity regardless.

    @dstiehr: once more, this is what I feel is the never-ending problem. Instead of dealing with relevancy issues, we keep seeing band-aid solutions. Compounding this is the difficulty of getting users to interact or give explicit signals in general.

    I spent some time looking around some of these ‘content farms’, and not all the information is useless. It’s just apparent that the collective domain authority is boosting the scores of pages that likely don’t warrant the ranking from a relevance perspective.

    @Greg – nice of U to drop in. I’d love to talk sometime about implicit/explicit signals. I love what you guys are trying to do, but it seems a shift in searching behaviour might be a problem (large scale). Hopefully everyone (search engines) can work towards better relevance.

  • http://www.bluesapphirecreations.com ankurchaudhary

    @David: I completely agree with your assertion, but wouldn’t the threat of reduced impact/authority in the search results for weak content coerce websites like eHow and Mahalo into working harder on quality management? :-)

    Maybe just a bluff by Google would clean up the web in unprecedented ways ;-)

  • http://www.pbm.com/~lindahl/ Greg Lindahl

    @David: Sure, I’d love to talk about implicit/explicit signals – a very important topic in search these days!
