Designed To Fail: Why Many Tests Give You Meaningless Results
You built out your new ad copy, tested a bidding strategy and tracked web and store sales to measure the online-to-offline effect. In the end, however, you got the worst outcome possible: inconclusive results.
A negative result would have been better; at least you would have known that your hypothesis was wrong or that your strategy was not effective. But an inconclusive result tells you nothing, which can be incredibly frustrating as a marketer.
There are many reasons why a well-designed test might fail. For example, seasonal effects might be ignored, the dataset might be too small or the marketplace might change during the test.
However, a very common error in test design is not accounting for volatility – fluctuations in performance due to unpredictable events in the marketplace.
In this post, I shall delve into the issue of volatility, how it might lead to a test with inconclusive results and finally, how you can mitigate its effect on your test.
A Thought Experiment
To understand the issue better, let us assume that you want to test the hypothesis that online SEM spending leads to offline store sales. To test this hypothesis, you ramp up your online budgets in increments every week.
Your plan is to run the test for 5 weeks, collect the data, do a regression analysis and answer the question, “What does one dollar spent online lead to in offline sales?” Now let us put some real numbers into this thought experiment.
Daily offline store revenue = $550,000
Daily baseline online SEM spend = $10,000
Your plan is to spend $10,000, $15,000, $20,000, $30,000, $40,000 per day on SEM in weekly increments, i.e. you spend $10,000 per day on week 1, $15,000 per day on week 2 and so on.
Taking the thought experiment further, let us assume that one dollar spent online leads to $3 in offline revenue. If this were the case, then your offline store revenue would look as follows:
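|Week|Daily Store Revenue|Daily Online Attributable Revenue|Daily Total Store Revenue|
|1|$550,000|$30,000|$580,000|
|2|$550,000|$45,000|$595,000|
|3|$550,000|$60,000|$610,000|
|4|$550,000|$90,000|$640,000|
|5|$550,000|$120,000|$670,000|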
If we were to plot daily total store revenue against daily SEM spend, we would get the following graph. The slope of the line is 3, telling us that one dollar spent online leads to $3 in offline revenue.
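The noise-free case can be checked with a few lines of Python. This is a minimal sketch, assuming the $550,000 baseline excludes any online contribution, so the attributable revenue is simply added on top; the function names are illustrative and not from the original analysis.

```python
# Inputs taken from the thought experiment above.
BASE_DAILY_STORE_REVENUE = 550_000
DAILY_SEM_SPEND_BY_WEEK = [10_000, 15_000, 20_000, 30_000, 40_000]
OFFLINE_REVENUE_PER_DOLLAR = 3  # the assumed "true" effect


def expected_rows():
    """Return (week, spend, base, attributable, total) with no volatility."""
    rows = []
    for week, spend in enumerate(DAILY_SEM_SPEND_BY_WEEK, start=1):
        attributable = OFFLINE_REVENUE_PER_DOLLAR * spend
        total = BASE_DAILY_STORE_REVENUE + attributable
        rows.append((week, spend, BASE_DAILY_STORE_REVENUE, attributable, total))
    return rows


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rows = expected_rows()
for week, spend, base, attributable, total in rows:
    print(f"week {week}: spend={spend:,} base={base:,} "
          f"attributable={attributable:,} total={total:,}")

slope = ols_slope([r[1] for r in rows], [r[4] for r in rows])
print(f"slope: {slope:.2f}")
```

With no volatility, every point sits exactly on the line, so the regression recovers the assumed effect of 3.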
Volatility & Its Effect On Your Experiments
Volatility means that your store revenue would never be exactly $550,000 every day. Instead, it will be a number close to the $550,000 average and will fluctuate daily.
It also means that the online contribution will never be exactly $3 for every dollar spent but a number that fluctuates close to $3. Let us assume that the daily volatility in the offline store revenue is 15% of the average. The experimental results will now look something like this:
It is unclear from the graph if there is any relationship at all. Further, the R-squared of the regression is only 5.25%, meaning that online spend explains about 5% of the day-to-day variation in store revenue. We cannot be confident that the regression is meaningful.
So why did this happen? The 15% volatility in offline store revenue masked any effect that the online SEM spend had.
For instance, if the SEM spend contributed $60,000 in store revenue but the store revenue was $60,000 lower on the same day due to volatility, the two effects would cancel each other out and you would see no change in the total revenue.
Clearly this would be an expensive, time-consuming experiment that would lead to inconclusive results. Moreover, this could happen for any experiment, including an ad copy test, a landing page test or a promotion.
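The masking effect can be reproduced with a small simulation. The sketch below makes several assumptions that are not in the original analysis: the volatility is modeled as independent, normally distributed daily noise on the $550,000 baseline, the online effect is held at exactly $3 per dollar, and each spend level runs for 7 days. The function names are illustrative.

```python
import random

BASE = 550_000
SPENDS = [10_000, 15_000, 20_000, 30_000, 40_000]  # daily spend, one level per week
TRUE_EFFECT = 3          # offline dollars per online dollar (assumed)
VOLATILITY = 0.15        # daily std dev as a fraction of the baseline (assumed normal)
DAYS_PER_WEEK = 7


def simulate_experiment(rng, volatility=VOLATILITY):
    """Return one simulated 5-week test as (daily spends, daily revenues)."""
    spends, revenues = [], []
    for weekly_spend in SPENDS:
        for _ in range(DAYS_PER_WEEK):
            base_today = rng.gauss(BASE, volatility * BASE)
            spends.append(weekly_spend)
            revenues.append(base_today + TRUE_EFFECT * weekly_spend)
    return spends, revenues


def fit_line(xs, ys):
    """Return (slope, r_squared) of a simple least squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared


rng = random.Random(42)  # fixed seed so the runs are repeatable
for run in range(1, 6):
    slope, r_squared = fit_line(*simulate_experiment(rng))
    print(f"run {run}: estimated slope = {slope:5.2f}, R-squared = {r_squared:.1%}")
```

Each run uses the same true effect, yet the estimated slope and R-squared change from run to run, which is the point: a single 5-week test is one draw from this spread.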
What Can Advertisers Do To Prevent This?
- Before running the experiment, measure the volatility of the variable you plan to measure. In our example, we would measure the volatility of total store revenue.
- Check to see the minimum impact your test would need to have to be measurable. In our example, we would need to estimate the minimum impact the online spend can have on store revenue in order for us to measure the effect.
- Another parameter to experiment with is the number of days you want to run the test. Experiment duration is always a trade-off, and there are always conflicting issues to consider. Running the experiment for a longer time period might give you a more robust answer, but would you be willing to wait longer for the result? Further, would marketplace forces such as CPC inflation and seasonality lead to more or less volatility over the longer duration?
- Test your assumptions before running an experiment. You can build a simple experimental simulation in Excel or a statistical package like R and check to see if your test will give you meaningful results.
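The steps above can be sketched in Python rather than Excel or R. This is a rough illustration, not a full power analysis: it assumes independent, normally distributed daily noise with a known standard deviation, and it calls an effect "detectable" when the estimated slope exceeds two standard errors. All function names and the two-standard-error threshold are assumptions made for the example.

```python
import math
import random
import statistics


def volatility(daily_values):
    """Volatility as std dev divided by mean (coefficient of variation)."""
    return statistics.stdev(daily_values) / statistics.mean(daily_values)


def build_plan(spend_levels, days_per_level):
    """Expand spend levels into a day-by-day spend schedule."""
    return [s for s in spend_levels for _ in range(days_per_level)]


def slope_standard_error(daily_spends, noise_std):
    """Standard error of the regression slope for a given spend schedule."""
    mean_spend = statistics.fmean(daily_spends)
    sxx = sum((s - mean_spend) ** 2 for s in daily_spends)
    return noise_std / math.sqrt(sxx)


def minimum_detectable_effect(daily_spends, noise_std, z=2.0):
    """Smallest slope that clears z standard errors (a rule of thumb)."""
    return z * slope_standard_error(daily_spends, noise_std)


def detection_rate(daily_spends, base, noise_std, true_effect,
                   runs=500, z=2.0, seed=0):
    """Share of simulated tests whose estimated slope exceeds z standard errors."""
    rng = random.Random(seed)
    threshold = minimum_detectable_effect(daily_spends, noise_std, z)
    mean_spend = statistics.fmean(daily_spends)
    sxx = sum((s - mean_spend) ** 2 for s in daily_spends)
    detected = 0
    for _ in range(runs):
        revenues = [rng.gauss(base + true_effect * s, noise_std)
                    for s in daily_spends]
        mean_rev = statistics.fmean(revenues)
        sxy = sum((s - mean_spend) * (r - mean_rev)
                  for s, r in zip(daily_spends, revenues))
        if sxy / sxx > threshold:
            detected += 1
    return detected / runs


SPEND_LEVELS = [10_000, 15_000, 20_000, 30_000, 40_000]
BASE_REVENUE = 550_000
NOISE_STD = 0.15 * BASE_REVENUE  # 15% volatility from the example

for days in (7, 14, 28):
    plan = build_plan(SPEND_LEVELS, days)
    mde = minimum_detectable_effect(plan, NOISE_STD)
    rate = detection_rate(plan, BASE_REVENUE, NOISE_STD, true_effect=3)
    print(f"{days:>2} days per level: minimum detectable effect = {mde:.2f}, "
          f"detection rate = {rate:.0%}")
```

In practice, `volatility` would be applied to your own historical daily revenue to replace the assumed 15%. Because the standard error shrinks with the square root of the number of days, quadrupling the duration only halves the minimum detectable effect, which is the trade-off described in the third step.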
Following these steps will help you avoid the heartache of expensive, time-consuming and inconclusive tests.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.