Yes, Bing Has Human Search Quality Raters & Here’s How They Judge Web Pages

A web page that definitively satisfies a searcher’s intent is “Perfect,” and should appear at the top of Bing’s search results. On the other end of the scale, spammy web pages and pages that almost no searcher would find useful are deemed “Bad.”

That’s a bit of how Bing instructs the people in its Human Relevance System (HRS) project to grade web pages. It’s explained in a 52-page document that Bing calls the “HRS Judging Guidelines.”

The HRS project is similar to the Quality Rater program that Google uses. Microsoft’s version has been around in some form since shortly after MSN Search began generating its own search results in late 2004. Like Google, Microsoft uses testing services (like Lionbridge and others) to hire human search evaluators and administer the program. (Microsoft often refers to the evaluators as “judges,” and I’ll do the same in this article.)

Very little, if anything, has been written about Microsoft’s HRS project, and the company’s communications team was understandably reluctant to discuss it with Search Engine Land when we contacted them recently. But, when we shared a copy of the guidelines document that was given to us by a former judge, a Bing spokesperson did confirm that it’s the current version of the HRS guidelines. The document is dated March 15, 2012.

What’s inside? How does Bing ask its human search quality judges to grade web pages? Read on for details.

Searcher Intent & Landing Pages

The document goes into detail about the three primary query intents (Navigational, Informational and Transactional) and offers suggestions for how to determine user intent based on the search query. Human judges are instructed to consider these four questions related to intent when they judge landing pages (the “LP” referred to below):

1. Intent: Does the LP content address a possible intent for the query?
2. Scope: Does the range and depth of the LP content match what the user wants?
3. Authority: Is the trustworthiness of the content on the LP appropriate to the expectations of the user?
4. Quality: Do the appearance and organization of the LP provide a satisfying experience?

Ultimately, judges are told to identify if a landing page satisfies searcher intent on a scale from “strongly” to “poorly,” with additional categories for obscene and inaccessible content.

The guidelines document explains that “A strongly satisfying page will closely match the user’s intent and requirements in scope and authority, while a poorly satisfying result will be useful to almost no users.”

The Rating Matrix

The HRS Judging Guidelines ask judges to rely on a Rating Matrix to grade web documents. The matrix combines A) likely searcher intent with B) how well the document satisfies that intent. A document that “strongly” satisfies the “most likely” intent is graded Excellent/Perfect, while a document that “poorly” matches the most likely intent is graded Bad.

[Image: Bing’s Rating Matrix]
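To make the matrix concrete, here is a rough sketch of it as a lookup table. This only encodes the intent/satisfaction combinations that the article itself spells out for each rating; the names and structure are my own illustration, not Bing’s actual implementation.

```python
# Sketch of Bing's Rating Matrix as described in this article.
# Keys are (intent likelihood, satisfaction level); combinations the
# document doesn't mention fall through to "Bad".

RATING_MATRIX = {
    ("most likely", "strongly"): "Excellent/Perfect",
    ("very likely", "strongly"): "Excellent",
    ("most likely", "moderately"): "Good",
    ("very likely", "moderately"): "Good",
    ("likely", "strongly"): "Good",
    ("most likely", "weakly"): "Fair",
    ("very likely", "weakly"): "Fair",
    ("likely", "moderately"): "Fair",
    ("unlikely", "strongly"): "Fair",
}


def rate(intent_likelihood: str, satisfaction: str) -> str:
    """Look up a rating; anything not in the matrix falls through to Bad."""
    return RATING_MATRIX.get((intent_likelihood, satisfaction), "Bad")
```

For example, `rate("likely", "strongly")` returns "Good", matching the definition quoted later in the article.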

Rating Options

The five rating options that judges can use are shown in the matrix above, but the guidelines offer a more detailed explanation. This is really the heart of the document — the section that reveals what Bing looks for in grading (and likely in ranking) web pages/documents.

Here’s how Bing explains the five possible ratings:

1.) Perfect

“The LP is the definitive or official page that strongly satisfies the most likely intent.”

The document says that a Perfect landing page “should appear as the top search result.” It also says that only one landing page will typically deserve a Perfect rating, but for some generic queries (such as “loans” or “insurance”) there will not be a Perfect landing page. A Perfect page should address the intent of at least 50 percent of searchers.

2.) Excellent

Bing describes this as a landing page that “strongly satisfies a very likely or most likely intent” and “closely matches the requirements of the query in scope, freshness, authority, market and language.” Users finding an Excellent landing page “could end their search here and move on.” An Excellent page should address the intent of at least 25 percent of searchers.

An example in the document is that Barnes & Noble’s home page is an “Excellent” result for the search query “buy books.”

3.) Good

A Good landing page “moderately satisfies a very likely or most likely intent, or strongly satisfies a likely intent.” Bing says most searchers wouldn’t be completely satisfied with one of these pages and would continue searching. A Good page should address the intent of at least 10 percent of searchers.

4.) Fair

This rating applies to pages that are only useful to some searchers. A Fair page “weakly satisfies a very likely or most likely intent, moderately satisfies a likely intent, or strongly satisfies an unlikely intent.” A Fair page addresses the intent of at least one percent of searchers.

5.) Bad

In addition to being useful to almost no one and not satisfying user intent, this rating applies to a web page that “uses spam techniques” or “misleadingly provides content from other sites,” as well as to parked domains and pages that attempt to install malware. A Bad page addresses the intent of less than one percent of searchers.
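The percentage floors attached to each of the five ratings can be summarized in a small function. This is my own sketch of only the “at least X percent of searchers” thresholds quoted above; in the actual guidelines the final rating also depends on the intent/satisfaction matrix, not on a percentage alone.

```python
def rating_from_intent_share(pct_searchers_satisfied: float) -> str:
    """Map the share of searchers whose intent a page addresses to the
    minimum-percentage tier the guidelines attach to each rating:
    Perfect >= 50%, Excellent >= 25%, Good >= 10%, Fair >= 1%, else Bad."""
    if pct_searchers_satisfied >= 50:
        return "Perfect"
    if pct_searchers_satisfied >= 25:
        return "Excellent"
    if pct_searchers_satisfied >= 10:
        return "Good"
    if pct_searchers_satisfied >= 1:
        return "Fair"
    return "Bad"
```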

The document goes into some detail on additional ratings like “Detrimental,” which applies (in part) to web documents that display adult-only content, and “No Judgment” for pages that can’t be accessed for a variety of reasons.

Freshness

There’s a fairly detailed section on freshness. It explains why judges should take freshness into account when reviewing web documents and suggests situations when fresh content is more valuable and others when it’s not as important. The document explains that there are “essentially” three categories of freshness-related queries — Fresh Not Important, Very Likely Fresh and Most Likely Fresh — and offers this chart with example search queries to distinguish them.

[Image: Bing’s chart of the three freshness query categories, with example searches]

Additional Considerations

There are also sections addressing queries where the search term is a URL, how to judge misspelled queries and how to judge local queries. For example, the home page of the Arizona Hispanic Chamber of Commerce is considered “Perfect” for the query hispanic chamber of comerce glendale az because Glendale is a suburb of Phoenix and there’s no Hispanic Chamber of Commerce office in Glendale.

As I said above, very little has ever been written about Microsoft’s Human Relevance System project for rating search results. From reading through the guidelines doc, I’d say it’s not all that different from Google’s handbook for its raters, which we first wrote about in 2008.




About The Author: Matt McGee is Editor-In-Chief of Search Engine Land. His news career includes time spent in TV, radio, and print journalism. His web career continues to include a small number of SEO and social media consulting clients, as well as regular speaking engagements at marketing events around the U.S. He recently launched a site dedicated to Google Glass called Glass Almanac and also blogs at Small Business Search Marketing. Matt can be found on Twitter at @MattMcGee and/or on Google Plus. You can read Matt's disclosures on his personal blog.




  • Gary Bisha

    They could instead use min freelance and ask the workers to rate a search page result for satisfaction level.

  • http://www.brickmarketing.com/ Nick Stamoulis

    ” A Perfect page should address the intent of at least 50 percent of searchers.”

    I find it interesting that a “perfect” page could still be the wrong information for 1/2 of the searchers. I would have thought that percentage would have been higher but at least Bing recognizes that it’s hard to nail user intent perfectly 100% of the time.

  • ChristianKunz

    I really would like to know the percentage of websites that are rated this way. If the approach is to check a representative share, then good luck, Bing!

  • http://www.facebook.com/john.beagle John Beagle

    And how do you go about looking at a billion search result pages and rating them individually for each set of keywords? 

    One thing that might help is if you had a reporting tool for bad SERPs.

  • Touseef Hussain

    I think there should be no page 2, page 3 feature. Bing should be different. Instead of browsing through pages, there should be some jQuery kind of thing which slides the page; it would be a better experience for the user. We are still using the same method we were using in 1999.

  • totnuckers

    So basically they are just following Google’s lead

  • http://twitter.com/fiend4house internet boss

    They need to focus more on strategic partnerships and effective marketing before this. Their search share continues to tumble against Google. What is a good set of search results worth if nobody ever sees them?

  • http://twitter.com/WesleyLeFebvre Wesley LeFebvre

    Some great insight into how Bing’s search team thinks. Thanks for sharing it with us, Matt!

  • Iblis Bane

    A 25% chance of the page matching my intent doesn’t sound very excellent to me… (Ok, 25-50%, but at best, that’s still a failure rate of 1 in 2…)

    Are people such sloppy searchers?

  • http://www.facebook.com/the.nathaniel.bailey Nathaniel Bailey

    I get what you’re saying, but I would have thought there are some good reasons behind that “at least” 50% being lower than you might think at first.

    Maybe it says “at least 50 percent” for a reason, such as searches that may have two or more meanings, meaning that not all people would find the result helpful or what they are searching for.

  • http://twitter.com/nondisclosure1 non disclosure

    I worked on both Google’s and Bing’s search quality ratings teams. All the stuff you get from these self-proclaimed professional raters you interviewed is beyond basic.

    The stuff you described above consists of the most basic core principles shared by both teams, and I dare say all similar systems. You don’t even need to be officially hired to get to this information and the chart you showed (copyright?). At both Bing and Google, once you are in the interview process, you’d be sent a handbook that includes all this, but before that, you have to sign a non-disclosure.

    The Google rater you interviewed earlier sounds so incompetent, and misunderstood or failed to correctly explain things so much, that the whole article is just misleading.

    Don’t waste your time (and money) interviewing these people in the future. Anyone good enough to understand a non-disclosure agreement would not be dumb enough to talk to you about this. 

  • http://blog.clayburngriffin.com/ Clayburn Griffin

    Sites that unnecessarily delay content loading on purpose, usually to have totally awesome navigation animations, should be marked as bad.

  • DanHigson

    Great post! It amazes me how many people are using Bing. At least it’s not AOL!

  • seoword

    This makes sense. Thank you. 
    My dumb question of the day is what the raters are measuring. Are they rating individual sites or the accuracy of the algorithm for producing quality search results?

  • http://www.netmagellan.com/ Ash Nallawalla

    Here is another way Microsoft evaluates a SERP – Evaluating Search Systems Using Result Page Context – http://www.cs.albany.edu/~ashwin/BaileyIIiX2010.pdf or Learning to Rank for Freshness and Relevance – http://research.microsoft.com/pubs/150747/dai2011.pdf etc. Microsoft researchers share their research in various academic forums regularly.

  • http://twitter.com/nondisclosure1 non disclosure

    This is what SEL should introduce their readers to, not the “interviews” of self-proclaimed raters, who may not have actually worked for the companies.

    What the raters do is all this theoretical stuff put into practice. There really is no secret.
