How To Make Your Statistics Skimpy Yet Meaningful

“Statistics are like a bikini – what is revealed is interesting but what is hidden is crucial.” – Aaron Levenstein, Associate Professor Emeritus of Business, Baruch College.

Statistics tell us about general trends and properties of our data, but they often hide information that would give key insights.

The goal of every analyst, then, is to make statistics skimpy: as revealing as possible, without overwhelming the decision maker with unnecessary numbers.

In this post, I will focus on working with the average and median statistics. While widely used to describe trends and data, they have a few limitations that can give you the wrong impression of what is going on.

Consider the revenue from three campaigns shown below:

[Figure: Daily revenue of Campaigns 1, 2 and 3]

The following patterns are readily apparent:

  1. The average revenue for Campaign 1 is close to $1,000 a day.
  2. For Campaign 2, revenue appears to be increasing over time, and the campaign has also had a few exceptional days.
  3. Campaign 3’s performance appears to fluctuate a lot.

Taking the average and median revenue in the time period for all three campaigns, we get the following graph:

[Figure: Average and median revenue of each campaign]

The average revenues of the three campaigns are broadly comparable, although Campaign 2 appears to have the highest.

However, when we look at the median revenue (i.e., the middle value of the daily revenue), we see that Campaign 3 has the best performance, while Campaign 2 has the worst.

Because we have the daily data, we know that Campaign 2’s apparently superior average has two primary causes:

  • Its performance has gradually improved over time.
  • It has had a few exceptional days.

Meanwhile, Campaign 3’s apparently superior median is due to its volatile performance. Clearly, the mean and the median each provide only a partial picture of performance and can lead you to mistaken conclusions.
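To make the effect concrete, here is a minimal Python sketch using made-up numbers (not the data behind the charts above): a steady campaign and a hypothetical "spiky" one with two exceptional days. The helper name `summarize` is illustrative only.

```python
from statistics import mean, median


def summarize(revenue):
    """Return the (mean, median) of a list of daily revenue figures."""
    return mean(revenue), median(revenue)


# Hypothetical daily revenue in dollars, NOT the data behind the charts above.
steady = [980, 1010, 995, 1005, 990, 1020, 1000]  # no unusual days
spiky = [700, 720, 690, 710, 705, 3500, 4000]     # two exceptional days

for name, revenue in [("steady", steady), ("spiky", spiky)]:
    avg, mid = summarize(revenue)
    # The two exceptional days pull the mean up; the median barely notices them.
    print(f"{name}: mean={avg:.0f}, median={mid:.0f}")
# steady: mean=1000, median=1000
# spiky: mean=1575, median=710
```

The spiky campaign "wins" on the mean and "loses" on the median, which is the same tension the two statistics create for Campaign 2.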

The simplest solution that comes to mind is to skip the statistics and provide the time series data instead, like the first graph.

Unfortunately, this defeats the purpose of providing statistics: to compress large volumes of data into simple, digestible metrics that give clear and concise information about the nature of your data.

One of the simplest ways to overcome the limitations of the mean and median is to use box plots. I have drawn a box plot of the three campaigns below.

In a box plot:

  • The left end of the left line (whisker) marks the minimum value
  • The left edge of the darker box marks the 25th percentile
  • The line separating the two shaded boxes marks the median
  • The right edge of the lighter box marks the 75th percentile
  • The right end of the right line (whisker) marks the maximum value
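The five numbers in this list are easy to compute. Below is a minimal Python sketch, assuming made-up daily revenue and the standard library's `statistics.quantiles`; `five_number_summary` is a hypothetical helper name. Tools differ slightly in how they interpolate percentiles, and many plotting tools draw whiskers to 1.5 times the box width by default rather than to the minimum and maximum as described here.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum).

    method="inclusive" interpolates linearly between data points, treating the
    minimum as the 0th percentile and the maximum as the 100th percentile.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Hypothetical daily revenue with two exceptional days (illustrative only).
revenue = [700, 720, 690, 710, 705, 3500, 4000]
print(five_number_summary(revenue))  # (690, 702.5, 710.0, 2110.0, 4000)
```

The long stretch from the 75th percentile to the maximum is where exceptional days show up on the plot.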

[Figure: Box plots of daily revenue for Campaigns 1, 2 and 3]

This box plot shows:

(a) While the median value of Campaign 2 is the lowest, its spread is high, as indicated by the wide box between the 25th and 75th percentiles. The high maximum hints at the days of exceptional performance (outliers).

(b) Campaign 1 has the steadiest performance, as its box is the narrowest of the three campaigns.

(c) Campaign 3 has moderate volatility, but its poor days fall well below its median, as the dark shaded box (25th percentile to median) is wider than the lighter shaded one (median to 75th percentile). Since each shaded box covers the same number of days, the wider dark box means the below-median days are more spread out, not more numerous.
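Readings (b) and (c) compare box widths, which can also be checked numerically. A minimal sketch with hypothetical numbers: the box width is the interquartile range (75th minus 25th percentile), and comparing its two halves shows which side of the median is more spread out. `box_shape` is an illustrative name, not part of any tool mentioned in this post.

```python
from statistics import quantiles


def box_shape(values):
    """Measure a box plot's box: its total width and the widths of its two halves."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "iqr": q3 - q1,          # box width: spread of the middle 50% of days
        "lower_half": med - q1,  # darker box: 25th percentile to median
        "upper_half": q3 - med,  # lighter box: median to 75th percentile
    }


# Hypothetical volatile campaign whose weaker days sit far below the median.
revenue = [200, 400, 900, 1150, 1200, 1250, 1300]
print(box_shape(revenue))  # {'iqr': 575.0, 'lower_half': 500.0, 'upper_half': 75.0}
```

A smaller `iqr` corresponds to reading (b), a steadier campaign; a `lower_half` much larger than `upper_half` corresponds to reading (c).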

Thus, the key performance characteristics of each campaign have been captured in this chart. An analyst reading the chart will draw the right inferences and analyze the data to confirm her hypothesis. Also note that a box plot occupies the same real estate as a bar graph, so it also achieves the goal of conciseness.

Making box plots in Excel is not straightforward, but various online resources show how to make them relatively easily. These plots complement the mean statistic and ensure that you will seldom miss key trends and insights in your data.
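If a charting tool is not at hand, a box plot is really just the five summary numbers mapped onto a shared scale. As a rough illustration (not a substitute for a proper chart), here is a minimal Python sketch that draws horizontal box plots as text. The function `text_box_plot`, the characters used, and the campaign figures are all hypothetical; the "inclusive" percentile rule should match the linear interpolation used by Excel's QUARTILE.INC.

```python
from statistics import quantiles


def text_box_plot(values, lo, hi, width=50):
    """Draw one horizontal box plot as a line of text, scaled to [lo, hi].

    '-' = whiskers (minimum to 25th percentile, 75th percentile to maximum)
    '#' = darker box (25th percentile to median)
    '=' = lighter box (median to 75th percentile)
    '|' = median
    Assumes hi > lo and that every value lies within [lo, hi].
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")

    def col(x):
        # Map a data value to a character position between 0 and width - 1.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(min(values)), col(max(values)) + 1):
        row[c] = "-"
    for c in range(col(q1), col(med)):
        row[c] = "#"
    for c in range(col(med), col(q3) + 1):
        row[c] = "="
    row[col(med)] = "|"
    return "".join(row)


# Hypothetical daily revenue for two campaigns (illustrative numbers only).
campaigns = {
    "Campaign A": [980, 1010, 995, 1005, 990, 1020, 1000],
    "Campaign B": [700, 720, 690, 710, 705, 3500, 4000],
}
# A shared scale keeps the plots comparable, as in a single chart.
lo = min(min(v) for v in campaigns.values())
hi = max(max(v) for v in campaigns.values())
for name, revenue in campaigns.items():
    print(f"{name:<12}{text_box_plot(revenue, lo, hi)}")
```

Putting every campaign on the same scale matters: plotted separately, a steady campaign and a volatile one could look equally wide.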

Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.

About The Author: is Director, Business Analytics at Adobe. He leads a global team that manages the performance of over $2 billion of ad spend on search, social and display media.

  • http://www.biketoworkbarb.blogspot.com Barb Chamberlain

    Interesting post. You start out by talking about the need to put statistics into context for decision makers. But at the end, referring to the boxplot, you say, “An analyst reading the chart will draw the right inferences.”

    Yes, an analyst will–but that’s not the decision-maker audience you described initially. I wonder if the boxplot is too unfamiliar to convey meaning to many people. (Blame PowerPoint for the proliferation of bar and pie charts.)

    Another way of reading that chart at a glance–without reference to labels–is to say that Campaign 2 must be the “biggest” campaign because it occupies the most real estate on the chart.

    Bigger box size is a positive indicator in the “size does matter” world, not an indicator of variance. It also has those nice long whiskers, which must mean that it’s reaching more, right? (to the non-analyst)

    I wouldn’t show the boxplot to my boss, as it requires too much explanation. For your own insights it would work fine.

    Not an analyst with a better solution, just an Edward Tufte fan.

  • http://blog.efrontier.com sidshah

    Hi Barb,
    Thanks for your feedback. I agree with much of what you say. However…

    In my experience I have found that box plots need to be explained once, but once explained they are very intuitive to decision makers. Box plots have several “Tuftian” characteristics. They are compact and compress a lot of information into little real estate. They don't waste much ink either (you don't have to shade the boxes). And they are quite intuitive once explained (I know you disagree with me on this, but try explaining them to your boss once).

    The one BIG virtue they have over classic Tufte plots is that they are not exceptionally custom and impossible to reproduce like most of Tufte’s famous charts (can you program the Napoleon chart into a computer?). Even his bump charts are a pain to reproduce in Excel. Sparklines seem useful, but in real business situations I haven't found them as useful as I thought. There is far too much information compression and you often miss out on short-term trends.

    I like Bill Cleveland’s (http://www.amazon.com/Visualizing-Data-William-S-Cleveland/dp/0963488406) chart drawing philosophy more than Tufte’s because his charts can be easily done and scale naturally to multivariate data. Moreover, they have most of the positive characteristics of Tufte’s plots. The *amazing* R plotting packages lattice and ggplot are based on Cleveland’s work.

    A final point. The average is the most commonly used statistic because it is easy to understand. But as a descriptor of data’s characteristic it is, well…. average :)

  • http://www.outsidethecurve.com Ryan Bruss

    “An analyst reading the chart will draw the right inferences and analyze the data to confirm her hypothesis.”

    I’m not so sure about that. These graphs give no information about why Campaign 2 increased performance over time or why it was the only one with a few exceptional days. If there is some property of Campaign 2 that makes it more likely than the others to have exceptional days or increased performance over time, that may change your conclusions. Also, I see no measures of statistical significance making it impossible to say anything about a hypothesis.

  • http://www.rimmkaufman.com George Michie

    Good discussion!

    This is one of the reasons we believe in the fundamental importance of smart analysts having access to and the ability to manipulate raw data. Canned reports limit the number of ways you can look at data.

    The box charts are really cool, but I agree with Barb that some folks will not grok them. While I love Tufte’s work, I agree with Sid: it’s just impossible for us mortals to reproduce that stuff. Great if you have a team of graphic artists at your disposal, but something easier to produce in Excel or R is the most useful, even if it’s less visually appealing.

 
