The concept of “statistical significance” is probably one of the most misunderstood phrases in search marketing. People sometimes ask me to assess whether the difference between two clickthrough rates is “statistically significant” or not with the same look on their face as if they are asking if a particular rash looks infected.
“The clickthrough rate (CTR) was 2% on Friday, but 3% on Saturday. That’s a 50% increase. 50% is a lot, right?” they ask. Well, it certainly is for income tax rates, but not necessarily for differences in clickthroughs. What if Friday saw 2 clicks from 100 impressions and Saturday saw 3 clicks from 100 impressions? Doesn’t sound so impressive anymore, does it?
Part of the problem is that it’s simply impossible to tell from so few impressions whether both days have an inherent CTR of 2.5% (and you just happened to see 2 clicks for one and 3 clicks for the other) or whether they legitimately have different underlying CTRs.
Imagine a more extreme case: one ad has a CTR of 2% and one has a CTR of 100%. We see four impressions and all get clicks. How likely is this to be the data for the 2%-CTR ad?
Well, if it is the 2%-CTR ad’s data, then there’s a 2% chance that the first impression would generate a click. That’s about the same as the chance of randomly drawing the ace of spades from a well-shuffled deck of cards. And there’s a 2% chance that the next impression will generate a click, which is about the same as reshuffling that deck and then randomly drawing the ace of spades again (without any sleight-of-hand trickery).
So, the chance of seeing 4 clicks from 4 impressions for a 2%-CTR ad must be very, very small, but (please take a minute to convince yourself of this, if you need to) it’s not absolutely zero. Even an ad with only a 2% CTR still might possibly generate 4 clicks from 4 impressions. It’s improbable, but not impossible.
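To put a number on “very, very small,” here is a minimal sketch of the arithmetic. It assumes each impression is an independent event with the same click probability, and the function name is made up for illustration.

```python
def prob_all_clicks(ctr: float, impressions: int) -> float:
    """Chance that every one of `impressions` independent impressions is clicked."""
    return ctr ** impressions

# Four clicks from four impressions for an ad whose true CTR is 2%.
p = prob_all_clicks(0.02, 4)
print(f"probability: {p:.2e}")          # about 1.6e-07
print(f"roughly 1 in {round(1 / p):,}")  # about 1 in 6 million

# The card analogy: drawing the ace of spades four times in a row,
# reshuffling the full 52-card deck before each draw.
print(f"four aces of spades: {prob_all_clicks(1 / 52, 4):.2e}")
```

The result is tiny but still greater than zero, which is the whole point.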
That is why statisticians rarely seem to give a straight answer to whether two ads’ CTRs are different or not. “Statistical significance” is not really a yes-or-no situation; it is a statement about how unlikely it would be to see a certain sequence of events (like four aces of spades in a row) purely by chance. Every new impression increases the certainty in our answer, but there is no specific amount of information that seals the deal.
By convention, statisticians often set an arbitrary cut-off of “5% chance of being explained purely by randomness” for classifying a difference as “statistically significant.” That’s why when a magician declares that he’ll pull a certain card from a deck, and then actually does so, the average geek in your life will joyously exclaim, “That’s statistically improbable!” We know that there’s less than the 5% cut-off chance that that card appeared purely by luck.
Imagine now that we have two ads, Ad A, for which we have observed a 2% CTR, and Ad B, whose observed CTR is shown on the x-axis of the graph below. The graph shows the number of impressions (per ad) we must see to be 95% certain that the two ads have different CTRs.
If Ad A has seen 2 clicks from 100 impressions (2% CTR) and B has seen 14 clicks from 100 impressions (14% CTR), then we can be more than 95% certain that Ad B’s CTR is higher than A’s. If the observed CTR of Ad B is only 3%, then we actually need nearly 4000 impressions each to be 95% certain that Ad B performs better. That’s why the difference in observed CTRs between the Friday and Saturday ad performance wouldn’t look so impressive if they only had 200 impressions between them.
As the CTR of Ad B approaches 2%, it takes staggeringly more data to differentiate the two ads. Trying to tell a 2.00% CTR ad from one with a CTR of 1.95% (or 2.05%) takes more than a million impressions each. And, if the two ads perform identically, with exactly a 2% CTR, obviously even an infinite amount of data couldn’t tell them apart.
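The article does not say how these impression counts were calculated. One calculation that lands close to the quoted figures is to require that the 95% confidence intervals around the two observed CTRs no longer overlap, using the normal approximation. That rule, the 1.96 multiplier, and the function names below are assumptions for illustration. The rule is also more conservative than a standard two-proportion test, so treat this as a sketch rather than the method behind the graph.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def half_width(ctr: float, impressions: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation confidence interval for a CTR."""
    return z * math.sqrt(ctr * (1 - ctr) / impressions)

def intervals_overlap(ctr_a: float, ctr_b: float, impressions: int) -> bool:
    """True if the two intervals still overlap at this many impressions per ad."""
    low, high = sorted((ctr_a, ctr_b))
    return low + half_width(low, impressions) >= high - half_width(high, impressions)

def impressions_needed(ctr_a: float, ctr_b: float, z: float = Z_95) -> float:
    """Impressions per ad at which the two intervals stop overlapping."""
    if ctr_a == ctr_b:
        return math.inf  # identical CTRs can never be told apart
    spread = math.sqrt(ctr_a * (1 - ctr_a)) + math.sqrt(ctr_b * (1 - ctr_b))
    return math.ceil((z * spread / abs(ctr_a - ctr_b)) ** 2)

print(intervals_overlap(0.02, 0.14, 100))  # False: 100 impressions each is enough
print(intervals_overlap(0.02, 0.03, 100))  # True: 100 impressions each is not
print(impressions_needed(0.02, 0.03))      # a bit under 4,000 per ad
print(impressions_needed(0.02, 0.0205))    # more than a million per ad
```

Because the required count grows with the inverse square of the gap between the two CTRs, halving the difference roughly quadruples the data needed.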
Though the concepts I’ve described above are (hopefully) now very clear, unfortunately some of the web-based tools for differentiating CTRs seem to have disregarded them completely.
For example, if one ad got 1 click with a 25% CTR (that is, 4 impressions) and a second ad got 2 clicks with a 100% CTR (that is, 2 impressions), Splittester.com by Brian Teasley and Perry Marshall says: “You are approximately 99% confident that the ads will have different long term response rates.” 99% confident from just 6 impressions?! No, I’m not. If I flip one coin 4 times and get 1 “heads” and another coin 2 times and get 2 “heads,” I wouldn’t be 99% certain that either coin’s chance of landing heads deviates from 50% at all.
Supersplittester.com, a similar site by Dr. Glenn Livingston (I presume), has similar deficiencies. For the case of Ad A, with 4 impressions, 1 click (25% CTR) and 1 conversion (100% CR), and Ad B with 2 impressions, 2 clicks (100% CTR) and 1 conversion (50% CR), the site tells me both that “Ad B has a higher CTR than ad A (99% Confidence Level)” and that “Ad A has a higher conversion rate than ad B (80% Confidence Level).”
Frankly, the only thing I have 99% confidence about is that Teasley, Marshall and Livingston should have a second look at their computer code to see what’s going wrong.
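For tiny samples like these, one standard way to sanity-check such claims is Fisher’s exact test, which needs no large-sample approximation. The sketch below is a swapped-in technique, not the method either site uses (their code is not shown here), and the function name is made up for illustration.

```python
from math import comb

def fisher_exact_two_sided(hits_a: int, trials_a: int,
                           hits_b: int, trials_b: int) -> float:
    """Two-sided Fisher's exact test p-value for two observed rates."""
    total_hits = hits_a + hits_b
    total = trials_a + trials_b

    def prob(k: int) -> float:
        # Hypergeometric chance that ad A gets exactly k of the total hits.
        return (comb(total_hits, k) * comb(total - total_hits, trials_a - k)
                / comb(total, trials_a))

    k_min = max(0, trials_a - (total - total_hits))
    k_max = min(total_hits, trials_a)
    observed = prob(hits_a)
    # Sum every outcome that is no more likely than the one observed.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= observed * (1 + 1e-9))

# CTR: Ad A has 1 click from 4 impressions, Ad B has 2 clicks from 2.
print(fisher_exact_two_sided(1, 4, 2, 2))  # about 0.4
# Conversion rate: Ad A has 1 conversion from 1 click, Ad B has 1 from 2.
print(fisher_exact_two_sided(1, 1, 1, 2))  # about 1.0
```

A p-value of about 0.4 means a gap this large would show up by luck roughly 40% of the time even if the two ads were identical, which is nowhere near the 5% cut-off. The same function applied to the Friday and Saturday numbers (2 of 100 versus 3 of 100) also returns a p-value far above 5%.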
In the McKinsey Quarterly, Google’s chief economist Dr. Hal Varian said: “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”
He’s absolutely right. In 1990, only a handful of geeks knew what a “homepage” or an “email” was. Ten years later, few people didn’t know. Likewise, in search marketing, even basic concepts like determining a confidence interval to identify statistical significance can still seem esoteric. But the industry is quickly realizing that being able to do these calculations is not just for geeks anymore.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.