Be Careful When Evaluating Paid Search Tests

Interpreting test results for paid search campaigns can be surprisingly difficult. One reason is order latency: today’s clicks don’t all generate orders today. Instead, sales trickle in over time, which makes analyzing new launches and tests tricky. Two ways to address this complication are described below.

Problem: Successful tests can look bad initially because of order latency

For example, let’s say the order latency for a particular advertiser with a 14-day cookie window looks like this:

[Figure: Rimm-Kauffman 1 (order latency by time since click)]

56% of the orders come in within 24 hours of the click, 10% come during the next 24-hour period, etc.

So, on that first day you see 100% of the clicks on your ads, but not nearly all of the orders those clicks will drive. In fact, you see quite a bit less than 56%, because clicks late in the day have less time to “mature.” For the sake of simplicity, let’s ignore that detail. Doing so allows us to map out what a tremendously successful test might look like.

Let’s say an advertiser launches a new product category and develops new keyword ads for it. The advertiser’s efficiency target is a 25% cost to sales ratio, and their brilliant PPC firm nails the bids right out of the gate. The clicks generated on day 1 cost $1,000 and will eventually drive $4,000 in sales. Here’s what the results look like as they unfold, with the advertiser spending $1,000 every day at perfect efficiency:

[Figure: Rimm-Kauffman 2 (daily cost and observed sales as the test unfolds)]

This yields a day-to-day apparent cost to sales (A/S) ratio that looks like this:

[Figure: Rimm-Kauffman 3 (apparent A/S ratio by day)]

For the first few days of the test, the apparent A/S ratio is well above the advertiser’s comfort threshold, so the ads look far less efficient than they really are. It takes the full duration of the cookie window for the observed efficiency to match the actual efficiency of the advertising. Advertisers who don’t recognize this effect may cancel tests or pull back too quickly.
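The effect can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the first two latency values (56% and 10%) come from the example above, while the remaining twelve are made-up numbers chosen so the curve sums to 100% over a 14-day window. The function names are hypothetical, and, as in the text, same-day maturation is ignored.

```python
# Share of a click cohort's eventual sales arriving N days after the click.
# ASSUMPTION: only 0.56 and 0.10 come from the article; the other values
# are invented so that the 14 entries sum to 1.0.
LATENCY = [0.56, 0.10, 0.06, 0.05, 0.04, 0.04, 0.03,
           0.03, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01]

DAILY_COST = 1000.0              # spend per day, from the example
EVENTUAL_SALES_PER_DAY = 4000.0  # what each day's clicks eventually drive


def observed_sales(day, latency=LATENCY, per_cohort=EVENTUAL_SALES_PER_DAY):
    """Sales booked on `day` (0-based) across every cohort launched so far.

    The cohort launched k days ago contributes latency[k] of its eventual
    sales today, so today's total is the sum of the first day+1 shares.
    """
    return per_cohort * sum(latency[: day + 1])


def apparent_as(day, cost=DAILY_COST):
    """Cost-to-sales ratio a same-day report would show on `day`."""
    return cost / observed_sales(day)


if __name__ == "__main__":
    for day in range(len(LATENCY)):
        print(f"day {day + 1:2d}: sales ${observed_sales(day):7.0f}  "
              f"apparent A/S {apparent_as(day):.1%}")
```

With these numbers the first day reads about 45% ($1,000 against $2,240 in booked sales), and the report only settles at the true 25% once the first cohort's full window has elapsed.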

Indeed, this suggests that every launch and every bid increase will appear less helpful to the top line and more harmful to the bottom line than it really is. On the flip side, every pullback on bids will appear more helpful to the bottom line and less harmful to the top line than it really is, because orders from the higher click volumes that preceded the test are still coming in.

The greater the order latency, the bigger the impact. We typically find that advertisers with more considered purchases and higher average order values have greater-than-average latency, which in turn affects the proper length of the cookie window.

However, no one wants to wait 14 or 30 or 90 days to read the results of a test. In the example above, the PPC agency hit the bids right on the head from day 1. When that doesn’t happen, it’s good to find out sooner rather than later that you’re undershooting or overshooting.

Two methods for evaluating tests

Shorten the sales window. Instead of evaluating the test based on the full cookie window, study the data based on a same-session or one-hour sales interval. In the example above, if 35% of the eventual orders normally come within the first hour of the click, extrapolate the results from the first few days based on that number. If the ratio of cost to (observed 1-hr sales/0.35) is on target, the test is probably on target.
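As a minimal sketch of that extrapolation, assuming the 35% one-hour share from the example holds for the traffic being tested (the function and parameter names here are hypothetical):

```python
def projected_as(cost, one_hour_sales, one_hour_share=0.35):
    """Estimate the eventual cost-to-sales ratio from one-hour sales.

    one_hour_share is the fraction of eventual sales that historically
    arrives within an hour of the click (0.35 in the article's example).
    """
    if not 0 < one_hour_share <= 1:
        raise ValueError("one_hour_share must be in (0, 1]")
    if one_hour_sales <= 0:
        raise ValueError("need some one-hour sales to extrapolate from")
    projected_sales = one_hour_sales / one_hour_share
    return cost / projected_sales


# $1,000 of clicks with $1,400 in one-hour sales projects to roughly
# $4,000 in eventual sales, i.e. an A/S ratio of about 25%.
estimate = projected_as(cost=1000.0, one_hour_sales=1400.0)
```

The estimate is only as good as the historical share it divides by, so it is worth recomputing that share from recent data rather than hard-coding it.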

If an advertiser is attempting to learn the top-line vs. bottom-line trade-off of bidding to a 30% A/S target rather than a 25% A/S target, compare the % increase in 1-hour sales to the % increase in cost. That should be a pretty good proxy for the A/S ratio on the incremental sales.
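To make that comparison concrete, here is a rough sketch with invented numbers: a baseline period at a 25% A/S and a test period that comes out at 30% overall. The dollar figures and function name are assumptions for illustration, and the calculation presumes the incremental traffic has the same 35% one-hour share as the baseline.

```python
def incremental_as(base_cost, base_1hr_sales, test_cost, test_1hr_sales,
                   one_hour_share=0.35):
    """A/S ratio on only the extra sales bought by the higher bids."""
    extra_cost = test_cost - base_cost
    # Scale the extra one-hour sales up to projected eventual sales.
    extra_sales = (test_1hr_sales - base_1hr_sales) / one_hour_share
    if extra_sales <= 0:
        raise ValueError("test produced no incremental sales")
    return extra_cost / extra_sales


# HYPOTHETICAL figures: cost rises 32% ($1,000 -> $1,320) while one-hour
# sales rise 10% ($1,400 -> $1,540).
ratio = incremental_as(1000.0, 1400.0, 1320.0, 1540.0)

# Same answer from the percentages alone: baseline A/S times the ratio of
# the % increase in cost to the % increase in sales.
shortcut = 0.25 * (0.32 / 0.10)
```

In this made-up case the blended A/S moves only from 25% to 30%, but the extra $320 buys about $400 in sales, an 80% A/S on the increment. That gap between the blended and incremental numbers is what the percentage comparison is meant to surface.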

Tie orders to the time of the click. Most reports show the PPC costs for the day, and the PPC sales taken that day. It’s entirely likely that half of the sales taken that day came from earlier clicks. By running reports tying the sales to the time of the click, rather than the time of the order, you get a much clearer picture of what your actions on that day did for you over the long haul. This is particularly useful for studying past tests and anticipatory bidding at the holidays to see whether you anticipated the improvement in traffic quality appropriately.
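A small sketch of the difference between the two report types follows. It assumes each order record carries both the date of the click that led to it and the date the order was taken. The field names and sample data are hypothetical, and real tracking exports will look different.

```python
from collections import defaultdict
from datetime import date


def total_sales(orders, date_field):
    """Sum order amounts grouped by the given date field."""
    totals = defaultdict(float)
    for order in orders:
        totals[order[date_field]] += order["amount"]
    return dict(totals)


def as_ratio_by_day(cost_by_day, sales_by_day):
    """Cost-to-sales ratio per day; None where no sales were credited."""
    return {
        day: (cost / sales_by_day[day] if sales_by_day.get(day) else None)
        for day, cost in cost_by_day.items()
    }


# HYPOTHETICAL data: two days of $100 spend, each of which eventually
# drives $400 in sales, with half of day one's sales arriving a day late.
ORDERS = [
    {"click_date": date(2024, 3, 1), "order_date": date(2024, 3, 1), "amount": 200.0},
    {"click_date": date(2024, 3, 1), "order_date": date(2024, 3, 2), "amount": 200.0},
    {"click_date": date(2024, 3, 2), "order_date": date(2024, 3, 2), "amount": 400.0},
]
COSTS = {date(2024, 3, 1): 100.0, date(2024, 3, 2): 100.0}

by_order_day = as_ratio_by_day(COSTS, total_sales(ORDERS, "order_date"))
by_click_day = as_ratio_by_day(COSTS, total_sales(ORDERS, "click_date"))
```

Grouped by order date, the first day shows a 50% A/S and the second about 17%. Grouped by click date, both days show the 25% that the spend actually earned.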

The problem with the first method is that it assumes the latency for the new product category, or incremental traffic, will be the same as it’s been in the past. Not a bad guess, but potentially misleading. The problem with the second method is that you can’t use it fully until the cookie windows have elapsed.

By using method 1 during the early phases of the test and method 2 after the test is “complete,” a good analyst can avoid missing opportunities and overspending during the test, and get a dead-eye accurate read on the results after the fact.

Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.



About The Author: is Co-Founder and Chief Marketing Scientist of RKG, a technology and service leader in paid search, SEO, performance display, social media, and the science of online marketing. He also writes for the RKG Blog. Follow him on Twitter at @georgemichie1.

  • Erez PPCisme

    Hi George,

    You talked about running reports tying the sales to the time of the click. Most of the time your client won’t have this option, at least this is where most of my clients stand, so it leaves us with the problem of not evaluating tests correctly. I think the best solution is to look at the wider picture: instead of comparing a certain day vs. the day before it, compare an entire month of running the test vs. the month before that. Of course, here you will need to rule out seasonality effects on your campaign.
    It’s important to explain this to your client before starting a test that might really affect your performance.

