Google made a significant event out of Leap Day 2012 by releasing a new version of Google Analytics featuring Data Sampling. The idea behind data sampling is commonplace in any statistical analysis: in order to get results faster, you analyze a sub-set of data to identify trends and extrapolate aggregate results based on the percentage of overall traffic represented in the sub-set.
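
To make the idea concrete, here is a minimal Python sketch of sample-and-extrapolate. It is an illustration only and not Google's actual algorithm: the function name `estimate_total`, the toy traffic mix, and the use of simple random sampling are all assumptions made for this example.

```python
import random

def estimate_total(visits, sample_size, predicate, seed=0):
    """Estimate how many visits match `predicate` by inspecting only a sample.

    Assumes simple random sampling; the seed is fixed so runs are repeatable.
    """
    rng = random.Random(seed)
    sample = rng.sample(visits, sample_size)
    matches = sum(1 for visit in sample if predicate(visit))
    # Each sampled visit stands in for (total / sample_size) real visits.
    scale = len(visits) / sample_size
    return matches * scale

# Hypothetical traffic: 100,000 visits, of which 30,000 are organic.
visits = ["organic"] * 30_000 + ["paid"] * 10_000 + ["direct"] * 60_000

# Look at only 5% of the visits and scale the count back up.
estimate = estimate_total(visits, 5_000, lambda v: v == "organic")
print(f"Estimated organic visits: {estimate:,.0f} (actual: 30,000)")
```

The estimate lands near the true figure but rarely on it, which is the trade-off the rest of this article is about.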

While I’m not a huge fan of sampling data when it isn’t necessary, larger data sets put a significant load on servers, and sampling becomes a necessary evil when trying to deliver quick insights on high-volume data sets. As a result, I’m a fan of how the GA team has integrated data sampling into reporting.

On the custom reporting tab there is a new button resembling a checkerboard. Below the button is the sample size.

To adjust the sample size, click on the checkerboard button to reveal a sliding scale going from “Faster Processing” on the left to “Higher Precision” on the right.

Faster processing uses a smaller sample size, delivering results more quickly. Higher precision uses a larger sample size for more accurate reporting.

As with any data sampling process, the smaller the sample size the greater the margin of error due to the assumption that the sub-set of data reflects the trends of the aggregate data set.
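
As a rough guide to how quickly that margin of error shrinks, the sketch below uses the textbook formula for a proportion estimated from a simple random sample. The 95% confidence level (z = 1.96), the worst-case proportion of 0.5, and the function name `margin_of_error` are assumptions for illustration; Google's sampling method may behave differently.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    z=1.96 corresponds to roughly 95% confidence; proportion=0.5 is the worst case.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Sample sizes spanning the slider's range discussed in this article.
for n in (1_000, 50_000, 250_000, 500_000):
    points = margin_of_error(n) * 100
    print(f"{n:>7,} visits sampled: +/- {points:.2f} percentage points")
```

Because the error falls with the square root of the sample size, doubling the sample from 250,000 to 500,000 visits buys far less precision than moving from 1,000 to 50,000.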

This is a significant change in how we can read into data sets: 1) it gives analysts a mechanism for more real-time insights, since full data aggregation takes several hours before being made available in the interface; and 2) the integrity of the data can be called into question when sample sizes are statistically insignificant.

When Does Sampling Occur?

As noted in the “Learn More” link under the sliding scale, sampling automatically occurs “By default (until you use the slider to change your sampling preference) … when report data exceeds 250,000 visits. However, you can use the slider to increase this threshold to as high as 500,000 visits … Sampling in Multi-Channel Funnel reports automatically occurs when the data includes more than 1 million conversion paths, regardless of your sampling preference setting.”
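
The rules quoted above can be restated as a small Python check. This is only a paraphrase of the help text, not an API that Google Analytics exposes; the constant and function names are hypothetical.

```python
DEFAULT_VISIT_THRESHOLD = 250_000   # default, until the slider is moved
MAX_VISIT_THRESHOLD = 500_000       # highest threshold the slider allows
MCF_PATH_THRESHOLD = 1_000_000      # Multi-Channel Funnel conversion paths

def is_sampled(visits=0, conversion_paths=0,
               slider_threshold=DEFAULT_VISIT_THRESHOLD,
               multi_channel_funnel=False):
    """Return True if a report would be sampled under the quoted rules."""
    if multi_channel_funnel:
        # The slider setting is ignored for Multi-Channel Funnel reports.
        return conversion_paths > MCF_PATH_THRESHOLD
    # The slider cannot raise the threshold beyond 500,000 visits.
    threshold = min(slider_threshold, MAX_VISIT_THRESHOLD)
    return visits > threshold

print(is_sampled(visits=300_000))                            # default setting
print(is_sampled(visits=300_000, slider_threshold=500_000))  # slider at maximum
print(is_sampled(conversion_paths=900_000, multi_channel_funnel=True))
```

A 300,000-visit report is sampled at the default setting but not once the slider is moved to its maximum.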

How Will This Affect My Data?

In short, you will lose clarity, and there is the potential for misleading insights if the sample size is too small. If you have an account and run reports with more than 500,000 visits, your data sets will be truncated and assumptions may be made.

Is this cause for alarm? Yes and no. So long as Google delivers statistically significant data samples, there is no cause for alarm. However, this feature is new and it’s unclear how statistically significant the data samples are. If you require unsampled data, you’ll need to run reports with fewer than 500,000 visits.

As noted above, one exception to the 500,000 visit limit is in multi-channel funnel reporting. Multi-channel funnels will be sampled once the number of paths exceeds 1,000,000.

To be frank, 1,000,000 conversion paths is a lot of conversions and there aren’t that many companies out there who are pulling in more than 1,000,000 conversions in a relevant time frame (given seasonality; if you are one of these companies, I suggest the paid version of Google Analytics or another solution to mitigate the issue).

As a result, I don’t expect the relevancy of multi-channel conversion funnel reporting to be impacted by data sampling.

If anyone has already analyzed the effects of data sampling on the relevance of their reporting please feel free to comment below. I will be following up once I have enough relevant data to share.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: is in charge of client strategy at Fusion Tree, a performance marketing and analytics consulting group, and is based in the San Francisco bay area.

  • http://www.analyticspros.com/ Caleb Whitmore

    Benny,

    Great article – I’m glad you’re raising attention to this new release in GA.

    Sampling is, however, not new to GA at all. It’s been around for years. What’s new here is the ability to *control* sample size from 1,000 visits to 500,000 visits.

    Sampling doesn’t generally impact top-line metrics at all – the algorithm seems to be quite accurate for main numbers like visits, total pageviews, etc…

    Where sampling goes south is with granular data. We’ve long seen this in GA reports when looking at something granular. Let’s say you have a site with 10 million visits a month and you’re looking at 6 months of data. You go to the Organic Keywords report and filter for branded terms, then apply a Secondary Dimension for Landing Page, so you see keyword+landing page combinations. At this granular level, you’re dealing with a 500,000 visit sample out of 60 million visits over 6 months. You’re GOING to see weird things happen, like “63 visits from keyword A + landingpage.html” – drilling to that combo you’ll see weirdness such as 6 days over the 6 months that each had exactly 63 visits. What?? Sampling.

    So, just watch out for the granular data. If you’re a big site and need granular data, then you’re going to have to consider GA Premium to get past the sampling issues. For most, though, it’s not that big of a deal. With 500k visits and a short date range, even sites with millions of visits can still be analyzed easily enough.

    Best,

    -Caleb Whitmore

  • http://www.esearchvision.com Benny Blum

    Thanks for the insight, Caleb. To your point, I just ran a top-line analysis for a site which gets a couple hundred thousand visits per day, using several different sample sizes for page-level data, and the results were pretty wonky. A 1,000-visit sample told me that I had 184,000 total visits on 37 pages, while both the 250,000 and 500,000 samples told me I had 104,000 visits – a 76% variance! As I adjusted the bar towards ‘higher precision’, things started to normalize once I had about 50,000 visits in the sample. In this case, 50,000 visits represented just over 2% of data.

    The takeaway remains the same: be wary of sample size. A larger sample generally yields more accurate data.

  • http://www.danmozgai.com/ Dan Mozgai

    I thought they were coming out with a premium version. If they do that, hopefully we can get both speed and accuracy.

  • http://www.facebook.com/people/Robert-Miller/1811341397 Robert Miller

    Hey Benny, nice post on the new feature of allowing users to adjust their sampling scale. However, GA, like pretty much every other web analytics platform, has always sampled analytics data once the account’s visit threshold has been met. Now they are just giving users the ability to customize how much their data is sampled.

  • Galoredk

    There are analytics platforms out there that do not sample data. Webtrends is one of them. You can look at 10 billion visits and still get the accurate number.

 
