• skydive.ny

    Interesting concept – thinking back to my DOX statistics course; what premise is being violated then, which in effect violates the process of randomization…independence I believe?

    Is another way to describe this that the trials are not independent?

    On 1 search, there’s a 50% chance they will see a given ad.
    With 2 searches, the probability rises to 75%.
    After 3 searches, the probability becomes 87.5%.
    After 4 searches, the probability reaches 93.75%.
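
    The figures quoted above match 1 − 0.5^n, which is the chance of seeing one particular ad at least once in n searches. A minimal sketch of that arithmetic follows, assuming two ads in an even rotation and fully independent impressions (the very assumption being questioned here). The function names are made up for illustration. For comparison, the chance of having seen both of the two ads is 1 − 2·0.5^n under the same assumptions.

```python
def p_seen_given_ad(n, p_show=0.5):
    """Chance that one particular ad is shown at least once in n searches.

    Assumes every impression is independent and the ad is shown with
    probability p_show each time (0.5 = two ads rotated evenly).
    """
    return 1 - (1 - p_show) ** n


def p_seen_both_ads(n):
    """Chance that both of two evenly rotated ads are shown at least once.

    The only way to miss one of the ads is to see the same ad n times in a row,
    which can happen in two ways (all A or all B).
    """
    if n < 1:
        return 0.0
    return 1 - 2 * 0.5 ** n


for n in range(1, 5):
    print(n, f"{p_seen_given_ad(n):.2%}", f"{p_seen_both_ads(n):.2%}")
```

    If the rotation is weighted rather than even, p_show can be changed, but the independence assumption still has to hold for the formula to mean anything.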

    Jeff

  • Stupidscript

    To put it as non-confrontationally as I can: How can any Google ad test be a valid A/B test when Google modifies ad delivery depending on their projected return?

    Examine your log files and you’ll also note that many times your ad is shown as a result of an irrelevant query … and clicked on! There is so much noise in Google clicks that it is almost impossible to trust the A/B ad data Google provides.

    For example, if I get 5 clicks for my “widget coloring” business, I might see queries something like this in my logs:

    1) coloring book
    2) widget coloring
    3) hair coloring
    4) customized widgets
    5) “why bother coloring widgets”

    Of those, how many should actually be considered to be providing valid data for my little A/B test? Right: 3 out of 5, which leaves 40% irrelevant clicks. Useful for the A/B ad test? Not very, unless you’re testing for irrelevant clicks. Included in the A/B statistics? Yep, without a whisper.
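
    A rough sketch of that tally, for anyone who wants to run the same check on their own logs. The relevance flags below are hand-assigned guesses for illustration (an assumption, not something Google reports), counting three of the five example queries as relevant.

```python
# Each entry: (search query from the log, hand-labelled relevance flag).
# The True/False labels are an illustrative assumption.
clicks = [
    ("coloring book", False),
    ("widget coloring", True),
    ("hair coloring", False),
    ("customized widgets", True),
    ("why bother coloring widgets", True),
]

# Keep only the queries judged relevant to the business.
relevant = [query for query, is_relevant in clicks if is_relevant]

# Share of clicks that should arguably be excluded from the A/B comparison.
irrelevant_share = 1 - len(relevant) / len(clicks)

print(f"{len(relevant)} of {len(clicks)} relevant, {irrelevant_share:.0%} irrelevant")
```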

    All I’m saying is that any system that purports to generate statistical data while at the same time manipulating the activities that produce the data can not be trusted to provide more than the faintest hint of a trend. So in addition to your excellent thesis about why multiple ads perform better than single ads, don’t forget to toss in that *all* statistics gathered from Google should be considered to be preliminary, at best, and not accurate without additional analysis on your part.

  • http://www.findmefaster.com Matt Van Wagner

    Thank you, Jeff. This is a great question and I’ll admit I don’t yet have a great answer.

    We don’t know what the ad rotation rules are. Only Google, Yahoo and Microsoft have this information. However, we do know that the influence of quality score on the rotation prevents it from being strictly random, and that other economic factors mean the trials (ad impressions) are not truly independent either. So you are 100% correct that there has to be a better way to describe the probability of whether or not an ad will be shown over the course of multiple search sessions. For now, I am assuming randomness is a good starting point for guessing at the probabilities and that adding in some Bayesian logic will help tune our guesses.

    If you have any alternative ideas/theories/formulas on how to describe this probability, please write back. Would be most grateful for your thoughts.

    Matt

  • http://www.findmefaster.com Matt Van Wagner

    Yep, match types definitely pollute the A/B test results, Stupidscript, and make evaluating them perilous. It’s dicey even if you use only exact match. Using broad match can render A/B tests useless, as you politely point out.

    Thanks – hope other readers take your comment seriously and start looking at their logs.

    Matt

  • Andrew Goodman

    Interesting debate.

    Pollution aside, it’s possible to assess the principles somewhat in isolation, without match types overly entering into the equation. One way would be to run an ad group on an exact match or a phrase match without too much skew in the profiles of search queries from one to the other.

    In terms of ad delivery, Google’s choices used to include “rotate evenly” along with the CTR/QS-happy “optimize” setting. Today, they’ve essentially admitted to wresting some testing control from our hands, as that option in the interface now reads “… *more* evenly”. More evenly? Ouch!

    Look, I don’t see that as detracting from the general principle, which is that you can run A/B (I don’t generally advocate these as the most precise way, BTW; I believe more ads is better, MV testing if possible on mature accounts) tests and often find statistically significant winners.

    Long story short, even though he gives us stuff, I don’t know if I’m buying what my friend Matt is selling re: winning ads “degrading” or the superiority of rotating multiple ads that might appeal to different personas. The data seem sketchy to me. We’d ideally show the right ad to the right persona, of course, and arguably, Google is working on that type of thing with Conversion Optimizer.

    In current practice, I don’t believe entirely in this “ad sets” principle, but I do see the logic of the ad set in theory, as it would offer a more fertile, genetically diverse if you will (or multiple-persona-friendly) set of ads to potentially pair up with the “right” searcher. (That might point to a tactic if you were using an extremely smart Conversion Optimizer type of product: don’t manually reduce your ‘ad set’ to one or two winning ads, when the technology would do a better job of pairing up the right ad with the right searcher, if you maintained a sufficiently diverse, larger set of highly effective ads… the one or two ad formula might harm performance. So yes Lobster Man, I can see how the logic works.)

    IMHO, that’s currently mostly pie in the sky and needs to take a back seat to more rigorous and creative testing in general.

    We do know this much: ads do “win and lose” and we need to choose winners and cull losers. The vast majority of folks in the industry have virtually no strategy and no insight into this process, either on the creation or the analysis side. Fortunately for many, it’s not too far off the mark to go with the best-CPA ad creatives when there are major disparities in CPA.

    Through all of this, one thing we’ve learned from experience is (whether you owe it to the “degradation effect,” randomness, or other reasons that could be explained with a slightly more sophisticated version of the classroom statistics we try to hang on this as naive theorists) – you need a lot more volume to confirm statistically significant, real impacts than you think, especially if you’re not controlling for all kinds of volatile variables, or variables as simple as ad position or competitor messaging. Significant differences in performance should speak to you; small differences tend to say very little and are often reversed.
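
    To make the volume point concrete, here is a hedged sketch of the classroom-style check referred to above: a pooled two-proportion z-test on click-through rate. The click and impression counts are invented for illustration, and the test assumes independent impressions and a stable CTR, which this thread gives good reason to doubt.

```python
import math


def two_proportion_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Pooled two-proportion z-test for a difference in CTR between two ads.

    Returns (z, two_sided_p_value). Assumes independent impressions.
    """
    p_a = clicks_a / imps_a
    p_b = clicks_b / imps_b
    # Pooled CTR under the null hypothesis that both ads perform the same.
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: the same 3% vs 4% CTR at two different volumes.
z_small, p_small = two_proportion_z_test(30, 1_000, 40, 1_000)
z_large, p_large = two_proportion_z_test(300, 10_000, 400, 10_000)

print(f"1,000 impressions each:  z = {z_small:.2f}, p = {p_small:.4f}")
print(f"10,000 impressions each: z = {z_large:.2f}, p = {p_large:.4f}")
```

    With these made-up counts, the one-point CTR gap does not clear the usual 0.05 threshold at 1,000 impressions per ad but does at 10,000. Uncontrolled factors such as ad position or competitor messaging would push the volume needed higher still.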

  • http://www.findmefaster.com Matt Van Wagner

    Hi Andrew,

    Thanks for stopping back in to your old neighborhood, and weighing in with your trademark well-considered opinion.

    (For those of you new to this space, Andrew Goodman authored this monthly column for the past 3 years, and is now sharing his PPC experience biweekly over at ClickZ in his new column, Paid Search Strategies.)

    The test you propose is worth a try, though I wonder how much ad-group-level QS would skew the results towards the ad group with the solo winning ad, since it would have a higher CTR. Worth testing, certainly.

    I am not sure how much Conversion Optimizer would resolve the problem of divergent search populations. More than persona matching, it would have to match to actual real people, and how much does Conversion Optimizer know about who is searching? Does it know man from woman, rich from poor, emotional from rational searchers?

    Further to your point on Conversion Optimizer, as far as I am aware, the Ad Optimizer algorithms and the Conversion Optimizer algorithms work independently. If you have Ad Optimizer turned on, then it does its job of selecting the next ad up before Conversion Optimizer decides where to place the ad on the page.

    (Anyone from Google care to weigh in on this?)

    I’ll admit that there’s more work to do on the Ad Sets Optimization Model and I am still laying the foundations for it using hypothetical data and scenarios until we can find a way to test it practically.

    My invitation still stands to any and all to help test and poke at the theory with real-world tests and data that demonstrate how it works (or doesn’t). Would love to hear from you.

    Thanks again for stopping by, Andrew!

    (p.s – Your first book is what got me started in this business in the first place, and I continue to appreciate your pleasantly provocative musings.)