• http://www.clickequations.com Craig Danuloff

    Matt – Great articles (both of them):

    Another key factor is the search query variation. If the ad group has multiple keywords, and especially if those keywords are not all exact match, the ad group will attract a wide range of different search queries. Finding the ‘winning’ ad for the group is an average – it could be that for one search query one ad kills and the other fails, but for other search queries the opposite is true. Only by looking at query-to-ad specific metrics can you see this pattern. By killing an ‘underperforming’ ad, you may be killing the only ad that performs for certain queries.

    This issue suggests that your idea of keeping several ads running is a good one, but with a slightly different rationale – to find which queries react to which copy, and then break the keywords apart into different ad groups so that each query gets the ad that is best for it.

    It’s not a keyword world, it’s a query world.

    BTW: Your articles are exactly the kind of deeper look at the complex reality that the market really needs, so we aren’t all fooled by false simplicity.

  • http://www.findmefaster.com Matt Van Wagner

    Thank you, Craig.

    I agree with you 100%, and meant to mention this influence on ad group performance. I took note of this point at your presentation last week at SES in NY, where you mentioned seeing as many as 220+ variations of queries that were triggered by phrase matches.

    I suspect that even when exact match is used, ad set optimization will work because of multiple searches by the same people, and also because of audience skews. For example, if you know your bigger audience is women, but you also sell to men, focusing on ads that only appeal to women will be to the detriment of almost half your audience. Assuming ad set theory works, then you could write ads that appeal to, and will likely be seen by, your other audiences.

    BTW – you are officially in the hunt to win the lobster dinner!

  • http://www.autoquake.com Clement

    Can’t agree more with Craig – With “Expanded Broad Match”, the keyword definitely isn’t equivalent to the “search query”. In the same vein, it is also worth noting that the same ad can be triggered by the same *query* but in a very different context, and still count as the “same” impression. Google’s search network in general, and “Adsense for search” in particular, can create a lot of noise in the data, especially since Google’s QS only takes Google’s properties into account.
    Ad A/B testing relies on the assumption that the user intention stays relatively “stable” throughout the experiment, but those different websites can come and go and they can have different user intention on the exact same query (e.g. someone typing “ipad” on engadget vs. someone typing “ipad” on Google).

  • http://rebel-seo.com Rebel SEO

    I suggest that you use advanced filtering in Google Analytics to display the exact search queries, and then cross-reference those with the ad text to find which ad/keyword combinations are the best, and then separate out the top performers into their own ad groups.
    So in other words, find the keywords that the “winning ad” is winning the biggest with, and create a new ad group with only those keywords and only that ad. Remove the keywords from the original ad group but leave the original ad. Then see what happens!
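    The cross-referencing step above can be sketched with a small script. This is a minimal illustration, assuming a query-level report loaded into a DataFrame with hypothetical column names (`query`, `ad`, `impressions`, `clicks`) – adapt it to whatever your actual export looks like:

    ```python
    import pandas as pd

    # Hypothetical query-level report: one row per (search query, ad) pairing.
    # Column names are illustrative, not from any particular export format.
    data = pd.DataFrame({
        "query":       ["red shoes", "red shoes", "buy shoes", "buy shoes"],
        "ad":          ["Ad A", "Ad B", "Ad A", "Ad B"],
        "impressions": [1000, 1000, 800, 800],
        "clicks":      [50, 20, 8, 40],
    })

    data["ctr"] = data["clicks"] / data["impressions"]

    # For each query, keep the ad with the highest CTR -- these query/ad
    # pairings are the candidates to split into their own ad groups.
    winners = data.loc[data.groupby("query")["ctr"].idxmax(), ["query", "ad", "ctr"]]
    print(winners)
    ```

    In this toy data, “red shoes” wins with Ad A and “buy shoes” with Ad B – exactly the pattern Craig described, where killing the group-level “loser” would kill the only ad that works for some queries.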

    Another thing you can do is duplicate the winning ad 2x so you have 4 ads total, with your “winning ad” showing 75% of the time and the “loser” showing only 25% of the time. When I do this, I often find that performance varies greatly between the three copies of the winning ad.

    One last thing… I have noticed that when I first launch an ad group with two very different ads, Google will sometimes show one of the ads exclusively at first. Eventually they will give the other ad some love and it evens out 50/50, but perhaps that initial favoritism has to do with Google matching the most relevant ad to the specific query. So does each keyword really have a hidden quality score for each ad in its ad group? hmmm

  • Stupidscript

    A quick question about random v. even ad rotation:

    How can results be reliable if the test itself is truly random? If they are *truly* random, there is a 50/50 chance that only one ad will *ever* be displayed. But aside from that, randomness would seem to be detrimental to ad A/B testing given the broad array of other factors in determining why CTR moves the way it does.

    Testing until 42 total impressions is reached:

    Test 1 Random Ad Displays (ad #A and #B):

    Test 1 Average Results:
    Ad A averaged 50% CTR out of 34 impressions
    Ad B averaged 50% CTR out of 8 impressions

    Test 2 Even Ad Displays:

    Test 2 Average Results:
    Ad A averaged 50% CTR out of 21 impressions
    Ad B averaged 50% CTR out of 21 impressions

    It is true that one would wait until the impressions threshold (say, 21) is reached before concluding a test, however as news events swirl and the focus of public activity shifts, how can one have confidence in the results of any randomized A/B test? By the time the threshold is reached, ad B may have become irrelevant due to the changing face of the public interest.

    Why isn’t the consistency of even ad rotation of benefit to an A/B split test? Am I missing a basic statistical principle? Thanks for the insight.
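    One way to check the “truly random” worry above is a quick simulation. This is a minimal sketch, assuming each impression is independently assigned 50/50 between two ads (an idealization – Google does not guarantee any particular rotation scheme); it shows how rarely pure chance produces a split as lopsided as the 34/8 example:

    ```python
    import random

    random.seed(1)

    TRIALS = 10_000      # number of simulated 42-impression tests
    IMPRESSIONS = 42

    extreme = 0  # trials at least as lopsided as a 34/8 split
    for _ in range(TRIALS):
        # Count how many of the 42 impressions land on Ad A by chance.
        ad_a = sum(random.random() < 0.5 for _ in range(IMPRESSIONS))
        if ad_a >= 34 or ad_a <= 8:
            extreme += 1

    print(f"Splits as lopsided as 34/8: {extreme / TRIALS:.4%} of trials")
    ```

    Under this assumption a 34/8 split is a small fraction of a percent of trials, and the chance that one ad is *never* shown in 42 impressions is about 2⁻⁴¹ – so random assignment does not make one-sided serving likely, though it also cannot be assumed to describe how AdWords actually rotates ads.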

  • http://www.findmefaster.com Matt Van Wagner

    Great points, Clement, thank you.

    Even if you had 100% control over ad placements on the search partner sites (or content network sites), you still have diversity in the demography and shifting motivations of your audience, so A/B testing falls short there as an optimization tool.

    You’re now in the hunt for a lobster dinner, too!

  • http://www.findmefaster.com Matt Van Wagner

    Thank you for weighing in, Rebel SEO.

    Focusing on queries, not just keywords, is a great idea, and it amplifies Craig’s point.

    Even if you tighten down your ad groups so you know the exact queries that drive traffic, you are still left with a heterogeneous audience that a single ad cannot fully address, so this is where ad set optimization may boost your ad group performance.

    Interesting data from your launch. I am curious to know if the impressions eventually evened out. Also, did you notice that the fast starter in that ad group become the eventual winner? Would love it if you could share the daily CTR and impression history of those two ads, either by posting another comment or emailing me your data. I don’t need to see your actual ads.

    My point about even rotation, which Stupidscript also commented on below, is that Google does not guarantee a predefined rotation scheme; they only say they will attempt to give each ad a similar number of impressions.

    Count yourself in on the lobster dinner bounty for your contributions to this discussion.

  • http://www.findmefaster.com Matt Van Wagner

    Good questions, Stupidscript.

    This may be a semantic problem, where I characterized even rotation incorrectly. The rotation is not literally even, as in your ABABAB example. Google defines rotate this way:

    “Rotated ad serving delivers ads more evenly into the auction, even when one ad has a lower CTR than another. The impression statistics and ad served percentages of the ads in the ad group will be more similar to each other than if you had selected the optimization option. However, these statistics still may differ from each other, since ad position may vary based on Quality Score and CPC.”

    Source: Adwords Help.
    AdWords › Help articles › Your Ad Performance › Improving Your Performance › Ad rotation settings

    Notice that they are careful to explain that the ad rotation will serve ‘similar percentages’, and that other factors such as position and CTR keep the rotation from being truly random.

    My point is that any assumption of random sampling in the test is not supportable. A truly random sample could behave as you suggest, but that is not likely over a reasonable sample size. Dust off your old stats books and take a look at the coin-flip example. To your point, a sufficiently large sample is needed for a valid test.
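    The coin-flip point can be made exact with the binomial distribution. A minimal sketch, assuming each of the 42 impressions is independently assigned to one of two ads with equal probability (an idealization, not a description of how AdWords actually serves ads):

    ```python
    from math import comb

    n = 42  # total impressions in the test

    # Probability that a single ad gets *every* impression
    # (nowhere near the feared 50/50 -- it is 2 * (1/2)^42).
    p_one_ad_only = 2 * 0.5 ** n
    print(f"P(one ad serves all {n}): {p_one_ad_only:.2e}")

    # Probability of a split at least as lopsided as 34/8,
    # counting both directions (Ad A or Ad B on the heavy side).
    p_lopsided = 2 * sum(comb(n, k) for k in range(34, n + 1)) * 0.5 ** n
    print(f"P(split >= 34/8): {p_lopsided:.2e}")
    ```

    Under this assumption, extreme splits are vanishingly rare – which is exactly why the concern about truly random serving fades as the impression count grows.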

    Thank you for taking the time to contribute to the discussion. You are now eligible to win the lobster dinner from lobster.com, too. Check back next month to see…

  • MMantlo

    Are we thinking about ad copy too simplistically here? How can you measure the performance of an ad without taking into account conversion rate, cost per acquisition, or ROI? Isn’t that the bottom line of achievement anyway?

    CTR is great, but if your ad with the highest CTR also has the poorest CONV or costs too much to be beneficial to your account, I think those are the instances in which you eliminate an ad in favor of another ad or group of ads. This helps qualify your traffic, and gives you the ability to do what Google says: “win customers, not just visitors.” Ads that pre-qualify traffic are essential in the marketplace because they help the user connect with what they are looking for before they cost the marketer money.

    I like a blend of ads in my ad groups. I try to select one or two with a high conversion rate and pair them with one or two with an efficient CPA or ROI – trying to balance the best of both worlds. The idea is not to go for strict A/B testing but cluster testing, in which you weed out ads that don’t meet your predetermined goals for achievement.

  • http://www.findmefaster.com Matt Van Wagner

    Hi MMantlo,

    Your point is absolutely correct: CVR is the right place to measure. We limited the discussion to CTR just for purposes of simplicity in this article, and because it precedes conversion actions in the funnel.

    Your remarks also seem to indicate a preference for optimizing to a set of ads.

    A couple of questions for you:

    – Have you tested your ad cluster (which I call an ad set) against a solo ad?

    – In your experience do the ads which remain online represent a variety of copy styles, offers, or calls to action?

    My model seems to suggest that a set of wildly divergent messages could outperform ads that are merely minor variations on the same text.

    Please let us know. You can reply here, or if you want, we can take this offline and discuss via email or phone.


    (… and, as a contributor to the discussion, you also have a shot at a lobster dinner delivered to your door, thanks to Lobster.com)

  • MMantlo

    Hi Matt,

    Sorry it took so long to get back to you on this, but after I last wrote I plunged myself into case studies of my current account for several days.
    With a base requirement for my examples of A/B testing being substantial traffic, and ad copy testing resulting in a single ad, the results were as follows: 13 ad groups studied over 13 months yielded 42 instances of an ad group running only one ad copy (just about 25% of the time). In 64% of those instances conversions fell, while at the same time 60% of them saw an increase in conversion rate. This all led me to wonder if we were sacrificing conversions to the conversion rate. All of this led to a rather inconclusive set of case studies.
    For that reason I decided to conduct a larger study across the account using data from the past 4 months. The findings: having one ad often produced the best CTRs and CPCs, but having 3 ads produced the best conversion rates and CPAs while having CPCs and CTRs that were in the top 3 (out of 8).
    However, this is just one account. It would be interesting to see an agency do a study like this to see if the results would remain consistent, or to see if there is variation among verticals, or among clients using different metrics to measure success.
    More importantly we have to acknowledge that search does not exist in a vacuum. Bid changes, position changes and landscape fluctuations all have a dramatic impact not only on our accounts, but on the outcome of our testing.
    I’ve heard it said that people who work in SEM walk around with blinders on with regards to what goes on in the advertising world. But I find that most people tend to have a myopic view of everything we do in search. I’ve seen people destroy order volume in search of lower CPCs. In the quest to prevent keyword mapping I’ve seen people give up on broad match regardless of whether or not it converts for them. And, in search of the perfect ad people lower overall conversion rates and increase CPAs. Maybe we need to stop throwing the baby out with the bathwater, and start viewing our accounts with a more holistic eye.

  • http://www.findmefaster.com Matt Van Wagner

    Wow – thanks for digging in deep!

    Your data seems to support the idea that ad groups with multiple ads can outperform ad groups with a single champion ad.

    I am hoping more people will examine this same issue in their accounts, look for similar results, and get curious about the concept of ad set optimization.

    Looking at data in hindsight is one thing, but trying to create a test that proves this conclusively is difficult, because it requires you to simultaneously test an ad group that has only your best ad against an ad group that has multiple ads, including your best ad.

    I am still looking for ways to set up a true set of experiments that would demonstrate the result one way or the other. I’ve got some ideas on how to isolate some variables, but latency to sale, seasonality, ad ranking variations, and other factors make this very difficult.

    Great analysis – thanks!