The Pitfalls Of A/B Ad Split Testing, Part 1

While channel surfing a while back, I came across the “National Dog Show,” a TV program where dogs of all shapes and sizes are paraded around by their handlers, and then poked and prodded by judges who select one of these purebreds to win the top prize.

It was great fun to see these beautiful canines compete to become top dog, but there’s an unfortunate downside to this sort of competition. Evidence suggests that selective breeding of dogs for looks alone can lead to a thinning of their genetic diversity and the emergence of genetically-inherited defects. The purest of the purebreds are often less healthy and don’t age as well as their more genetically diverse cousin, the less attractive, but very lovable, mutt.

As I turned off the TV and got back to writing ad copy for PPC A/B testing, a question popped into my mind. Is it possible that we are over breeding, or in our PPC vernacular, over-optimizing our text ads to the detriment of the overall health of our campaigns?

The answer, surprisingly enough, is yes! A/B testing, taken to its extreme, can actually cause PPC campaign performance to degrade.

Creating best-of-breed ads through testing

A/B ad testing seems simple enough.

You run two ads in an ad group and rotate them evenly, so impressions are split equally between the two. After a while, you evaluate the results and declare a winner based on the higher click-through rate (CTR), conversion rate (CVR) or the blended product of those two metrics, CTR × CVR. The losing ad is tossed out, and the winning ad moves on to the next round of testing.
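As an illustration, the winner-picking step above can be sketched in a few lines of Python. The impression, click and conversion counts here are made-up numbers, and a real test should also check statistical significance before declaring a winner:

```python
# Illustrative A/B winner selection on the blended CTR x CVR metric
# (hypothetical counts for two ads in one ad group).

def blended_score(impressions, clicks, conversions):
    """Return CTR, CVR and the blended CTR * CVR metric for one ad."""
    ctr = clicks / impressions      # click-through rate
    cvr = conversions / clicks      # conversion rate per click
    return ctr, cvr, ctr * cvr

ad_a = blended_score(impressions=10_000, clicks=300, conversions=15)
ad_b = blended_score(impressions=10_000, clicks=250, conversions=20)

winner = "Ad A" if ad_a[2] > ad_b[2] else "Ad B"
print(f"Ad A: CTR={ad_a[0]:.2%}, CVR={ad_a[1]:.2%}, blended={ad_a[2]:.4%}")
print(f"Ad B: CTR={ad_b[0]:.2%}, CVR={ad_b[1]:.2%}, blended={ad_b[2]:.4%}")
print("Winner:", winner)
```

Note that with these numbers the lower-CTR ad wins on the blended metric, which is exactly why the choice of winning metric matters.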

The next round in the A/B ad testing process looks a lot like the first, except that it involves testing the current champion ad against a new challenger ad, letting those two ads battle it out for the best CTR and CVR, until a winner emerges. This process is repeated, ad infinitum, if you’ll pardon the pun, with the hope that eventually you will end up with the best PPC ad ever written in the history of online advertising. In reality, what usually happens is that you get bored with the test or run out of copywriting ideas, and so you just end the test, set the champion ad as the default, and then move on to optimize other parts of your PPC campaigns.

No one will deny that A/B ad testing is valuable and that it enables a rational, scientific approach to campaign optimization that can yield practical improvements in your PPC campaigns, especially early on in the testing cycle.

A/B testing is so simple and easy to understand that it’s hard not to like it and use it all the time. It has become something of a sacred cow in that regard. But as with many other PPC campaign optimization tactics, A/B ad split testing can be misapplied. What’s worse, it can actually degrade your campaign performance over time. Why?

Simply put: the over-optimization of ads can lead to a decline in overall ad group performance.

To understand how and why A/B testing can cause performance declines, let’s assume you own the Blue Widgets store down on Main Street and a customer walks in the front door. You know nothing about the customer except the fact that they just walked into your store.

Here are the unique selling features and benefits of your store:

  • Selection: Blue widgets available in any shade of blue
  • Savings: Save 20% on blue widgets this month
  • Quality: Eco-friendly blue widgets
  • Availability: Blue widgets in stock for fast delivery
  • Brand: We carry ACME blue widgets

Of these benefits, you know that saving money is on most people’s minds, and generally appeals to a wide cross-section of people.

So the key question: will you have a better chance of selling to this person if you help them understand all of your store’s unique selling features and benefits, or if you tell them only about the 20% off sale, repeating yourself five times?

Subjectively, I think most of us would say that you’d have a better chance of connecting with a customer if you used all five benefits, since you have up to a 5x greater chance of hitting one of your potential customer’s hot buttons.
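A quick back-of-the-envelope calculation shows why message diversity helps. Assuming, purely for illustration, that each of the five benefits independently resonates with about 20% of shoppers:

```python
# Back-of-the-envelope sketch of the "hot button" argument (assumed numbers).
# Suppose each of the five benefits independently resonates with 20% of shoppers.
p = 0.20

# Repeating one message five times only ever tests one hot button.
one_message = p

# Five distinct messages: the shopper connects if at least one resonates.
five_messages = 1 - (1 - p) ** 5

print(f"One message repeated:   {one_message:.0%} chance of connecting")
print(f"Five distinct messages: {five_messages:.1%} chance of connecting")
```

Under these assumed numbers, five distinct messages connect with roughly 67% of shoppers versus 20% for one message repeated — a big gain, though less than a literal 5x, because the messages’ appeal overlaps.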

In a PPC campaign an ad is your sales pitch. It is what brings customers in the door. The challenge for PPC ads, however, is that you can’t fit all of your top benefits into a single ad. Instead, you break the messages up individually, or group a few together, and then start the process of A/B testing (or multivariate testing if you are advanced) to determine the best message.

After a few rounds of testing, you find that the “Save 20%” ad is your best performer, so you pause all the other, lower-performing ads and congratulate yourself on optimizing your campaign.

However, this is where A/B ad testing can lead to erroneous conclusions. The tests do not optimize your ad groups; they simply identify your best ad. While A/B testing can do a great job of that, narrowing your ad groups down to a single winner can be a huge mistake. An ad group sometimes requires two, three or more ads to perform optimally, not just the champ from your A/B testing.

Here’s why. When people are looking for products and services, they make multiple searches. Sometimes they refine their queries, but often they type the same or a nearly identical search query over and over again.

What this means is that someone searching on “blue widgets” five times will see the “Save 20%” ad five times. If they search ten times, they’ll see it ten times. The only thing they will learn about your blue widgets is that they can save money.

Just like in our example above, in which a person walking into your store is presented with five great selling benefits, PPC ad groups that offer more than one great ad should outperform ad groups limited to a single “best of breed” ad.

When it comes to optimizing ad groups, give me five lovable mutts over one blue ribbon winner anytime.

Next month, I’ll take a look at the math behind A/B testing and demonstrate how an ad group can outperform the best single ad in that ad group that emerges from A/B tests. I’ll also take a look at the challenges of A/B testing at the ad group level, and how to maintain healthy message diversity in your ad campaigns.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.



About The Author: Matt Van Wagner is President and founder of Find Me Faster, a search engine marketing firm based in Nashua, NH. He is a member of SEMNE (Search Engine Marketing New England) and of SEMPO, the Search Engine Marketing Professionals Organization, for which he is a contributing courseware developer for the SEMPO Institute. Matt writes occasionally on internet, search engine and technology topics for IMedia, The NH Business Review and other publications.






  • allenkristina

    Very interesting take on A/B testing, Matt. I’m looking forward to part two in the series to learn more.

  • Matt Van Wagner

    Thank you, Kristina

    Is there anything in particular you’d like to see addressed in the next column?

    Have you observed any strange phenomena such as ad group performance declining after pausing lower CTR ads?

    Maybe we’ll go to a Part 3 after I present a more objective analysis in Part 2.


  • jesse223

I have definitely noticed the phenomenon of pausing lower-performing ads and seeing a subsequent decrease for the well-performing ad.

    I have wondered if this may be the result of the inaccurate data population between ads. And wondered if there really needs to be a much longer window to see a true top-performing ad. That being said, I don’t know what that time would be. Longer than 2-3 months? Depending on the amount of traffic, I guess.

But your article makes sense in that users will see different messages on the variations. I think that is a good approach. I come across several ads that perform within target CPA, and it is hard to make a decision. Then again, I am always worried about the fact that I could pause one ad and the ad group will suddenly take a turn for the worse. Strange.

  • Matt Van Wagner

    Hi Jesse

    Thank you for commenting. I suspect that many PPC managers have observed exactly what you are describing, couldn’t find a rational explanation for it, and just walked away from it scratching their heads.

The tough part about your performance result is that there is no easy way to isolate the root cause. There are many suspects to implicate, including rotation of ads, length of the consideration cycle, keyword match-type / ad group interaction, seasonality and so on.

I’ll dig into this and other issues in more depth in next month’s column. Stay tuned…

