How Scattergraphs Can Be Your Best Friends
Recently, I was on an in-house SEO panel at SMX with REI’s Jonathon Colman. Most of the audience’s questions centered around explaining and reporting relevant metrics to upper management.
Turns out, while search has come a long way, many execs still use terms like “Google Juice” and define success as launching a PPC campaign to “rank number 1 for our competitor’s name”. This issue is even more pronounced in larger, established companies where search makes up a smaller portion of the marketing mix.
Jonathon’s primary recommendation centered around “data visualization” – explaining and reporting on search concepts (and progress) through pictures instead of technical jargon and theory.
To the extent that you can translate your SEO efforts into picturebooks for MBAs via PowerPoint, you can successfully focus those with limited search understanding on the correct tactics.
What we are all really trying to do is develop a clear understanding of “if I do X, then Y is going to happen”.
In mathematical terms, the strength of that relationship is measured by a correlation coefficient – i.e. the extent to which two series of datapoints are interrelated. Correlation coefficients range from +1 (perfect positive correlation) to -1 (perfect negative correlation).
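For readers who do want the Greek symbols, here is the standard (Pearson) form of the correlation coefficient. The notation is an assumption for illustration, not something from the panel: x_i and y_i are the paired values of the two series, and the barred symbols are their averages.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

When the two series rise and fall together, the numerator is large and positive and r approaches +1; when one rises as the other falls, r approaches -1.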
Things get considerably more complex when you add more than two datapoints. That analysis is a statistical methodology called multiple regression analysis, in which you try to determine the extent to which multiple data points impact a variable.
This is the process undertaken by some search consultancies and tool providers who try to use data to backdoor their way into search engine algorithms. Multiple regression analysis is a hairy process, involves words like heteroscedasticity and requires an advanced degree in either statistics or econometrics to do with any degree of accuracy. I stay away.
One note of caution: correlation does not mean causation. Just because the two datapoints have a similar pattern, doesn’t mean one influences the other. An obvious example of this is sunrise and eating breakfast . . . while these things often happen in synch, eating your Cheerios at 4 am will not make the sun rise any earlier.
Simple regression, in which we are just looking at the fit between two series of datapoints, is, in fact, pretty easy stuff. The concept is fairly simple – calculate the straight line that best fits the points when the two series are plotted against each other on a graph. If you want to geek out on the math behind this, try the Simple Linear Regression page on this awesome site I just found called Wikipedia.
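If you would rather see the best-fit line as code than as a formula, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `fit_line` and the sample numbers are made up for illustration; they are not real Urbanspoon data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; no line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: a tactical variable (x) and search penetration (y)
tactic = [1.0, 2.0, 3.0, 4.0]
penetration = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(tactic, penetration)
```

The slope tells you how much the vertical value changes, on average, for each unit of movement along the horizontal axis.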
Here’s a visual explanation of correlation coefficients and simple regression:
(Obviously, this is not my graphic – do you think I’d deliberately highlight a negative correlation between hair and time?)
If you’d actually like to do it instead of remembering the Greek symbols behind math formulas . . . use good old Excel. Here’s how:
1. Select Two Datapoints
While you can calculate correlation between all sorts of things, may I suggest starting with inbound natural search traffic and some variable that theoretically impacts that?
To get multiple datapoints, you’ll need to segment your data – in the case of Urbanspoon, it’s pretty easy – we can look at traffic by city, cuisine type, or entry categories (restaurant pages instead of city pages for example).
Now, normalize that data: if you are looking at differences by geography, calculate penetration by dividing your entry sessions by population; if you are looking at differences by product category, calculate penetration by dividing by overall search impressions. (Depending on your data sources, this normalization process can be persnickety and tricky.)
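The normalization step is just a division per segment. Here is a rough sketch in Python, assuming you have entry sessions and a matching denominator (population or search impressions) keyed by segment. The function name, city names and numbers are hypothetical.

```python
def penetration(entry_sessions, denominator):
    """Divide entry sessions by the normalizing figure for each segment.

    Segments with a missing or zero denominator are skipped rather than guessed.
    """
    return {
        segment: sessions / denominator[segment]
        for segment, sessions in entry_sessions.items()
        if denominator.get(segment)
    }


# Hypothetical figures for illustration only
sessions_by_city = {"city_a": 5000, "city_b": 1200, "city_c": 300}
population_by_city = {"city_a": 1_000_000, "city_b": 150_000}  # city_c unknown

rates = penetration(sessions_by_city, population_by_city)
```

In this made-up example, city_b has far fewer sessions than city_a but a higher penetration, which is exactly the kind of difference raw traffic numbers hide.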
2. Open Excel
Put your two datapoints into two Excel columns.
3. Correlation Coefficient
Calculate the correlation coefficient between the two columns using Excel’s CORREL function (for example, =CORREL(A2:A20,B2:B20) if your data happens to sit in rows 2 through 20). This will give you the mathematical correlation coefficient indicating the extent to which those two datapoints are correlated – the closer to 1, the tighter the positive correlation; the closer to -1, the tighter the negative correlation. Correlation coefficients close to zero indicate little or no linear correlation.
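If your data lives outside a spreadsheet, the same number can be computed in a few lines of Python. This is a sketch of the standard Pearson calculation, not Excel’s actual implementation; the function name `correl` and the sample columns are assumptions for illustration.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A flat column has no variation to correlate against
        raise ValueError("one of the series is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical columns: a tactical variable and two penetration series
tactic = [1, 2, 3, 4]
rising = [2, 4, 6, 8]    # moves in lockstep with the tactic
falling = [8, 6, 4, 2]   # moves in the opposite direction

r_up = correl(tactic, rising)
r_down = correl(tactic, falling)
```

Real data will land somewhere between the two extremes, which is where the picture in the next step earns its keep.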
4. Turn this Number Into a Picture
Use Excel to create a scattergraph of these two columns like the ones above. I like to put the natural search penetration on the vertical axis and the tactical variable on the horizontal axis. Assuming there is a correlation . . .
5. Impact the Variable
Engage in whatever tactic you are analyzing by selecting a few of the datapoints that are underperforming (i.e. for positive correlation, these datapoints will sit in the bottom left-hand quadrant of your scattergraph). This tactic can be linkbuilding or social mentions, for example. Your goal is to move the datapoint along the horizontal axis and see if it also moves up the vertical (penetration) axis.
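One way to pick test candidates without eyeballing the chart is to split it into quadrants in code. The sketch below uses the median of each axis as the quadrant boundary. That choice is my assumption, and the function name and sample points are hypothetical.

```python
from statistics import median


def underperformers(points):
    """Return names of points below the median on both axes (bottom-left quadrant).

    points maps a segment name to (tactic_value, penetration).
    """
    x_mid = median(x for x, _ in points.values())
    y_mid = median(y for _, y in points.values())
    return sorted(
        name for name, (x, y) in points.items() if x < x_mid and y < y_mid
    )


# Hypothetical segments: (tactical variable, search penetration)
segments = {
    "a": (1, 0.001),
    "b": (2, 0.002),
    "c": (8, 0.009),
    "d": (9, 0.008),
}

candidates = underperformers(segments)
```

Pick a few of these candidates for the test and leave the rest alone, so you have something to compare against when you redraw the graph.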
6. Wait
How long you wait depends on what tactic you are using and how quickly (theoretically) you think it’s going to take for the tactic to have an impact.
7. Redraw the Scattergraph
Now, after you have a new set of data, redraw your scattergraph. Highlight those variables in a before and after comparison of the scattergraphs and demonstrate to your MBAs the extent to which movement along the horizontal axis is reflected in movement up the vertical axis. Highlight this movement with arrows or different colors for your test datapoints. You can even redraw both data grabs using different colors on the same graph, or show a simple before and after.
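Before drawing the arrows, it can help to compute how far each test datapoint actually moved. A minimal sketch, assuming you kept both data grabs keyed by segment name; the function name and all values here are hypothetical.

```python
def movement(before, after, test_names):
    """Return name -> (horizontal change, vertical change) for the test datapoints.

    before and after map a segment name to (tactic_value, penetration).
    """
    return {
        name: (
            after[name][0] - before[name][0],
            after[name][1] - before[name][1],
        )
        for name in test_names
    }


# Hypothetical data grabs taken before and after the test
before = {"a": (1, 0.001), "b": (2, 0.002), "c": (8, 0.009)}
after = {"a": (4, 0.003), "b": (5, 0.002), "c": (8, 0.009)}

deltas = movement(before, after, ["a", "b"])
```

In this made-up example, segment "a" moved right and up, while "b" moved right without gaining any penetration. Those two pairs of numbers are the arrows you would draw on the chart.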
8. Declare Success or Failure of Tactic
The result is a decision: either roll out your effort more broadly or abandon the tactic altogether.
This gives you a real way of calculating the impact of your tactics. If you have cost metrics (and you should), you can transcend discussion of GoogleJuice (yummy, I like mine on ice) and make ROI driven investments in search.
Some opinions expressed in this article may be those of a guest author and not necessarily Search Engine Land. Staff authors are listed here.