Bing Slams “Freakonomics” Bing It On Challenge Critique


Yesterday, we reported on a study appearing on the “Freakonomics” blog that disputed the “Bing It On” claim that people prefer Bing to Google in a blind comparison of search results. Study author Ian Ayres sought to replicate the Bing It On challenge methodology and argued that Bing’s claims were false and its messaging deceptive.

Bing pushed back hard yesterday in several ways. Matt Wallaert, a behavioral scientist at Bing, posted a lengthy point-by-point refutation of the Ayres report in the comments on the story I wrote at Search Engine Land. Wallaert also responded to a similar story about the Ayres study at Search Engine Roundtable.

Microsoft later issued a formal statement from Wallaert:

The professor’s analysis is flawed and based on an incomplete understanding of both the claims and the Challenge. The Bing It On claim is 100% accurate and we’re glad to see we’ve nudged Google into improving their results.  Bing it On is intended to be a lightweight way to challenge peoples’ assumptions about which search engine actually provides the best results. Given our share gains, it’s clear that people are recognizing our quality and unique approach to what has been a relatively static space dominated by a single service.

Later in the day, Microsoft published a blog post about the Ayres study. It echoed the points Wallaert made in his comments on the blog posts. Below is most of Wallaert’s post:

A couple of notes are important before I talk about Ayres’ claims. There are two separate claims that have been used with the Bing It On challenge. The first is “People chose Bing web search results over Google nearly 2:1 in blind comparison tests”. We blogged about the method here and it was used back in 2012. In 2013, we updated the claim to “People prefer Bing over Google for the web’s top searches”, which I blogged about here. Ayres frequently goes back and forth between the two claims in his post, so I wanted to make sure both were represented. Now, on to Ayres’ issues and my explanations.

First, he’s annoyed by the sample size, contending that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, Ayres then links to a paper he put together with his grad students, in which they also use a sample size of 1,000 people. They then subdivide the sample into thirds for different treatments condition and yet still manage to meet conventional statistical tests using their sample.

If you’re confused, you’re not alone. A sample of 1,000 people doing the same task has more statistical power than a sample of 300 people doing the same task. Which is why statistics are so important; they help us understand whether the data we see is an aberration or a representation. A 1,000 person, truly representative sample is actually fairly large. As a comparison, the Gallup poll on presidential approval is just 1,500 people.
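As a rough aside on the sample-size point, the usual textbook margin of error for a proportion gives a feel for how 300, 1,000 and 1,500 respondents compare. This is a minimal sketch, not anything from Wallaert’s post: the function name is made up for illustration, and it assumes a simple random sample, a 95% confidence level and the worst-case proportion of 0.5.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case
    and z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare the sample sizes mentioned in the post.
for n in (300, 1000, 1500):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under those assumptions the margin is roughly 5.7 points at 300 people, 3.1 points at 1,000 and 2.5 points at 1,500, which is consistent with the claim that a truly representative 1,000-person sample is fairly large.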

Next, Ayres is bothered that we don’t release the data from the Bing It On site on how many times people choose Bing over Google. The answer here is pretty simple: we don’t release it because we don’t track it. Microsoft takes a pretty strong stance on privacy and unlike in an experiment, where people give informed consent to having their results tracked and used, people who come to BingItOn.com are not agreeing to participate in research; they’re coming for a fun challenge. It isn’t conducted in a controlled environment, people are free to try and game it one way or another, and it has Bing branding all over it.

So we simply don’t track their results, because the tracking itself would be incredibly unethical. And we aren’t basing the claim on the results of a wildly uncontrolled website, because that would also be incredibly unethical (and entirely unscientific).

Ayres’ final issue is the fact that the Bing It On site suggests queries you can use to take the challenge. He contends that these queries inappropriately bias visitors towards queries that are likely to result in Bing favorability.

First, I think it is important to note: I have no idea if he is right. Because as noted in the previous answer, we don’t track the results from the Bing It On challenge. So I have no idea if people are more likely to select Bing when they use the suggested queries or not.

Here is what I can tell you. We have the suggested queries because a blank search box, when you’re not actually trying to use it to find something, can be quite hard to fill. If you’ve ever watched anyone do the Bing It On challenge at a Seahawks game, there is a noted pause as people try to figure out what to search for. So we give them suggestions, which we source from topics that are trending now on Bing, on the assumption that trending topics are things that people are likely to have heard of and be able to evaluate results about.

Which means that if Ayres is right and those topics are in fact biasing the results, it may be because we provide better results for current news topics than Google does. This is supported somewhat by the second claim; “the web’s top queries” are pulled from Google’s 2012 Zeitgeist report, which reflects a lot of timely news that occurred throughout that year.

To make it clear, in the actual controlled studies used to determine what claims we made, we used different approaches to suggesting queries. For the first claim (2:1), participants self-generated their own queries with no suggestions from us. In the second claim (web’s top queries), we suggested five queries of which they could select one. These five queries were randomly drawn from a list of roughly 500 from the Google 2012 Zeitgeist, and they could easily get five more if they didn’t like any queries from the five they were being shown.
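The query-suggestion procedure described for the second claim can be sketched in a few lines. This is only an illustration under stated assumptions: the pool below is a placeholder list of 500 made-up labels rather than the actual Zeitgeist queries, `draw_suggestions` is a hypothetical name, and the post does not say whether a redraw excludes the five queries already shown (this sketch does not exclude them).

```python
import random

def draw_suggestions(pool, k=5, rng=random):
    """Draw k distinct suggested queries uniformly at random from the pool."""
    return rng.sample(pool, k)

# Placeholder pool standing in for the roughly 500 Zeitgeist queries.
pool = [f"query {i}" for i in range(500)]

rng = random.Random(0)  # seeded only so the sketch is repeatable
first_five = draw_suggestions(pool, rng=rng)

# A participant who dislikes all five could simply ask for another draw.
next_five = draw_suggestions(pool, rng=rng)
```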

Google’s Matt Cutts reacted to the Ayres study on Google+:

Freakonomics looked into Microsoft’s “Bing It On” challenge. From the blog post: “tests indicate that Microsoft selected suggested search words that it knew were more likely to produce Bing-preferring results. …. The upshot: Several of Microsoft’s claims are a little fishy.  Or to put the conclusion more formally, we think that Google has a colorable deceptive advertising claim.”

I have to admit that I never bothered to debunk the Bing It On challenge, because the flaws (small sample size; bias in query selection; stripping out features of Google like geolocation, personalization, and Knowledge Graph; wording of the site; selective rematches) were pretty obvious.

Regardless of whether Bing or its critics are right about whose study methodology is more flawed, the thing that was most interesting to me about the Ayres findings was the fact that Bing won 41 percent of the time. That suggests, in the context of an arguably antagonistic study, Bing did very well and is almost at parity with Google.

It would also seem to support what Microsoft has been claiming — that the Google brand and not necessarily search quality is now what sustains Google’s dominance in search.


About The Author: is a Contributing Editor at Search Engine Land. He writes a personal blog Screenwerk, about SoLoMo issues and connecting the dots between online and offline. He also posts at Internet2Go, which is focused on the mobile Internet. Follow him @gsterling.

  • josephjrobison

    Every time I test out Bing searches for everyday queries that I’m actually looking for, and not the currently trending ones, I’m reminded why I still use Google. The simple fact is that if industry leaders started realizing that Bing results were actually better, they would push for it in blog posts and announce it from the mountain tops, and that would trickle down to the general population. But that really isn’t happening much. The majority of people keep using Google because for the most part, it’s just better.

  • Jeff

    Wallaert fails to address the biggest flaw in Bing It On. Specifically, that “10 blue links” is not representative of how either of the two search engines operate. I remember when I first heard about the challenge and took it home for the missus, she tried a few of the keywords that “Bing won” on the actual search engine pages and had an overwhelming preference for Google. If I recall correctly, it was because of local results.

  • willyj

    People still using the retro Google? You must be bored all your life courtesy of white space subliminal message. lol

  • Colin Guidi

    “The Bing It On claim is 100% accurate and we’re glad to see we’ve nudged Google into improving their results.”

    so when did this happen? No mystery that Bing has always been playing catch up.

  • Ryan Aslett

    “the thing that was most interesting to me about the Ayres findings was the fact that Bing won 41 percent of the time.” – well, yes, except you aren’t really comparing bing to google in that test. You’re comparing bing to a version of google that’s missing “geolocation, personalization, and Knowledge Graph” – all things that google uses to improve relevancy and therefore its results. Not even remotely an apples to apples comparison.

  • JadedTLC

    I have to disagree with you on this one. Brand influence is a huge barrier, even for industry experts. If you’ve only consumed Coke your entire life and love it, switching to Pepsi (even if it was the better product) would be very, very difficult, unless somehow it came out that Coke was putting drugs in their drink.

  • Alice Lee

    How did bing not store the results but still figure out they won 2:1?

  • http://www.tweedsolutions.com/ Tweed Solutions

    I still use Google for 99.9% of my searches

  • http://www.v2interactive.net/ Josh

    This article and source article are both jokes.

  • http://www.mattwallaert.com/ matt wallaert

    I don’t address it because I’m responding to Ayres, who doesn’t address it. I think that’s a valid critique, although to do good science, sometimes you have to narrow down to a single apples-to-apples comparison (which would be hard, given that we have things like the social sidebar that become more relevant when you log in, etc.)

    (Note: I work at Microsoft)

  • http://www.mattwallaert.com/ matt wallaert

    Please read my comments; the claims were made using 3rd party research in a controlled lab setting.

  • http://www.mattwallaert.com/ matt wallaert

    And a version of Bing that is lacking all of those features (which we also have, if you consider Snapshot and Knowledge Graph to be similar) and no social sidebar from us, etc. See my earlier comment about the difficulty of doing comparisons on non-aligned features, hence the emphasis on the algo block.

    (Note: I work at Microsoft)

  • Ryan Aslett

    Therefore the analysis of the survey results presented in the blog post and subsequent marketing is making dubious claims based off of evidence from a study with a dubious structure.

    Doing comparisons on only the *aligned* features, and leaving out the non-aligned features, and then claiming a 2:1 preference in “web search results” is disingenuous.

    I believe that most would consider a web search result to be the entire page delivered by both bing and google, including ads, maps, and other ‘non aligned features’, and not *just* the algorithmic block. With css you could present the whole page in a non-branded neutral manner.

    The “New York Hotels” search used as the example search in your post is a great example. The map would probably be a very relevant result on both pages for many people, and where their click would go next.

    For a search that includes the phrase “near me” in it, Bing actually includes the map in the algorithm block, whereas it’s on the side on google. (That’s the behavior on the ‘bing it on’ site anyhow). What did people see as ‘algorithm block’ in this study?

    Perhaps to dispel some doubt microsoft would be willing to share the actual data and research from the independent agency. It’s probably unfair of Ian Ayres to sharpshoot the validity of your study without actually seeing it.

  • Sudeep Chakravarty

    The holidays are coming and eCommerce sites are heavily dependent on organic search results for their sales. By analysing one of our major eCommerce site I can say that Google has already started playing with the organic results, simply Google wants you to spend in ads if you are not, you are doomed. So there is no better time to shift to Bing and support their good work.

  • Jeff

    I’m failing to see the “good science” behind Bing It On or the study that launched it. It’s nontrivial that features from both search engines were stripped from the results page because these features (presumably) mean to enhance the quality and usefulness of the results. This would be like doing the Pepsi Challenge but both colas have to withhold parts of their formula. Even if a preference is being determined, it’s a meaningless one.

    But maybe more to the point, when was the verbiage switch made between people “choosing” Bing and people “preferring” Bing? That’s quite a leap if you ask me. If I understand the methodology of the study correctly, a participant choosing Bing 6/10 times is being labeled as having a preference for Bing, whereas a more accurate label might be “no preference.” I’m left wondering on a single search basis, how many times was Google chosen over Bing? If all of Bing’s 57.4% wins were 6/10 and all of Google’s 30.2% wins were 10/10, then people actually chose Google 59.4% of the time (if I’ve done the math right). I doubt the true results were so dramatic, but hopefully my point isn’t missed.
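Jeff’s 59.4% figure can be reproduced with a short calculation. This is a sketch of his hypothetical, not of the real study data: it takes his 57.4% and 30.2% preference shares and his 6/10 and 10/10 splits as given, and it additionally assumes that the remaining 12.4% of participants split their ten searches evenly, which he does not state but which appears to be what makes the number work. The function name is invented for illustration.

```python
def google_search_share(bing_pref, google_pref,
                        bing_split=0.6, google_split=1.0, tie_split=0.5):
    """Share of individual searches won by Google in the hypothetical.

    bing_pref / google_pref: fraction of participants labeled as preferring each.
    bing_split: fraction of searches Bing wins among Bing-preferring participants.
    google_split: fraction of searches Google wins among Google-preferring ones.
    tie_split: Google's share among everyone else (assumed to be an even split).
    """
    tie = 1.0 - bing_pref - google_pref
    return (bing_pref * (1 - bing_split)
            + google_pref * google_split
            + tie * tie_split)

share = google_search_share(0.574, 0.302)
print(f"Google wins {share:.1%} of individual searches")
```

With those inputs the result is about 59.4%, matching the comment, so a per-participant “preference” can point the opposite way from the per-search tally.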

  • neotrope

    HI, sorry … I can’t help post a link to my “humorous” meme I did when the Bing Challenge ads started (moderator: I won’t be offended if you decide to quash this, just thought it funny to this discussion) :-) I think I also included this in retweet of article, so please don’t think I’m spamming here. For those who need a chuckle: http://ga-ga.com/christophers-memes-bing-it-on-the-bing-challenge/

  • DaymonH

    People use Google search, because their friends do, and don’t know any better.
    Microsoft is superior to Google on almost every level, and Google would
    not be where it is if it weren’t for the webmasters of the world who
    embraced Google for their “don’t do evil” approach. That didn’t last
    long, did it? They’re as evil as they come, and the same webmaster
    community that embraced Google is now turning to Bing’s search engine. It’s only a matter of time before Bing is on top, and Google may even have to contend for the 2nd spot with Marissa Mayer now at the helm for Yahoo! search. You heard it here 1st folks. Google’s best days are behind them.

  • http://www.telusplanet.net/public/stonedan Doug Pederson

    For personal search neither can touch Swamp search.
    Randomly pick a video then a random start point, plays for a preset number of seconds then the next.
    All my favorite bookmarks, passwords are placed in the clipboard for pasting.
    My emails since 1996 are searchable at 20,000,000 characters per second.
    Random groupings of text video audio pictures are playable as a question answer or marketing tool in storefront window displays.
    If you have a cause that you are fighting for. You need good access to your facts.
    See “nobody shares knowledge better than this” That’s the challenge to the rest of the 2nd place pack.
    Doug Pederson AKA SpectateSwamp

  • haertelnr04

    The true reason why you use a search engine is the quality of the links provided, not the look/feel/layout of the links on the serp. Bing It On is pointless because you cannot visit the websites of the links provided, which is how one should judge a search engine. Beauty isn’t serp deep, Bing.

  • Dave

    You say that 41% success for Bing means parity with Google, but I respectfully disagree.

    First, the actual, meaningful result is “55-57% preferred Google while only 35-39% preferred Bing”, because it’s from more realistic user queries.

    Second, the difference is still huge in terms of productivity. If we give Bing the benefit of the doubt and go with the 53% vs 41% numbers, then someone using Bing would experience a 29% improvement in productivity by switching to Google. That’s huge. If you could do that for every task across an 8 hour work day, you could leave work 1.8 hours early, every day. (In Bing’s worst case numbers, the productivity gain from going Google is 62%; you could leave work 3 hours early.)

    (I do realize that the leap to productivity is assuming that people are making a correct judgement of the search results, but I think it’s reasonable to make.)

    Third, Bing is in a come-from-behind position. And, unfair as it may be, you have to be better than the market leader in some way to win significant market share. Because of that, I read these numbers as evidence that Bing could currently be losing marketshare from people making a conscious decision about the search engine they use, as Bing is still behind on quality. The only thing they really have going for them is people who aren’t aware of what search engine they use…in that regard, advertising increases awareness, so Microsoft’s advertising could actually be a bad thing!
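Dave’s productivity arithmetic in his second point can be checked with a few lines. This sketch follows his own premise that the preference percentages translate directly into task speed, and it assumes his “worst case” pairs the 57% Google figure with the 35% Bing figure; both function names are invented for illustration.

```python
def relative_gain(google_pct, bing_pct):
    """Fractional improvement from switching, treating the shares as task speed."""
    return google_pct / bing_pct - 1

def hours_saved(google_pct, bing_pct, workday_hours=8.0):
    """Hours freed in a workday if the same work is done at the faster rate."""
    return workday_hours * (1 - bing_pct / google_pct)

for google_pct, bing_pct in ((53, 41), (57, 35)):
    gain = relative_gain(google_pct, bing_pct)
    saved = hours_saved(google_pct, bing_pct)
    print(f"{google_pct}% vs {bing_pct}%: gain {gain:.1%}, saves {saved:.2f} h")
```

The 53 versus 41 case gives a gain of about 29% and roughly 1.8 hours, as stated. The 57 versus 35 case comes out nearer 63% and about 3.1 hours, so the 62% in the comment looks like a truncated figure or a slightly different pairing.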

  • Caio Japiassu

    I don’t know, but it seems that nowadays people are becoming more varied, multitasking and embracing more content, tastes and positions; there are fewer narrow, selective tastes in music, political positions, etc. So if people are getting more “eclectic”, maybe even a sample of 1,500 results wouldn’t be enough to say something.
