• http://www.analyticspros.com/ Caleb Whitmore


    Great article – I’m glad you’re raising attention to this new release in GA.

    Sampling is, however, not new to GA at all. It’s been around for years. What’s new here is the ability to *control* sample size from 1,000 visits to 500,000 visits.

    Sampling generally doesn’t impact top-line metrics much – the algorithm seems to be quite accurate for main numbers like visits, total pageviews, etc.

    Where sampling goes south is with granular data. We’ve long seen this in GA reports when looking at something granular. Let’s say you have a site with 10 million visits a month and you’re looking at 6 months of data. You go to the Organic Keywords report and filter for branded terms, then apply a Secondary Dimension for Landing Page, so you see keyword + landing page combinations. At this granular level, you’re dealing with a 500,000-visit sample out of 60 million visits over 6 months. You’re GOING to see weird things happen, like “63 visits from keyword A + landingpage.html” – drilling into that combo, you’ll see weirdness such as 6 days over the 6 months that each had exactly 63 visits. What?? Sampling.
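
    Here’s the basic arithmetic behind Caleb’s example under a deliberately simplified model: assume the sample is a uniform random draw of visits and every report row is scaled up by the same factor (total visits ÷ sample size). GA’s actual sampling and scaling logic isn’t described in this thread, so treat this as a sketch; the helper names are hypothetical.

```python
# Simplified model (assumption, not GA's documented algorithm): a uniform
# random sample of visits is tallied, and every tally is multiplied by the
# same scale factor to estimate the full-population number.

def scale_factor(total_visits: int, sample_size: int) -> float:
    """How many real visits each sampled visit stands in for."""
    return total_visits / sample_size


def scaled_estimate(sampled_count: int, total_visits: int, sample_size: int) -> float:
    """Estimated full-population count for one report row (hypothetical helper)."""
    return sampled_count * scale_factor(total_visits, sample_size)


total = 10_000_000 * 6  # 10 million visits a month over 6 months
sample = 500_000        # largest sample size mentioned in this thread

print(f"sampling rate: {sample / total:.2%}")               # 0.83%
print(f"scale factor:  {scale_factor(total, sample):.0f}")  # 120

# A keyword + landing page combination seen 0, 1 or 2 times in the sample
for seen in range(3):
    print(seen, "->", scaled_estimate(seen, total, sample))  # 0.0, 120.0, 240.0
```

    Under this model each sampled visit stands in for about 120 real visits, so a rare combination either vanishes or jumps in steps of roughly 120, while top-line totals average out over hundreds of thousands of sampled visits.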

    So, just watch out for the granular data. If you’re a big site and need granular data, then you’re going to have to consider GA Premium to get past the sampling issues. For most, though, it’s not that big of a deal. With 500k visits and a short date range, even sites with millions of visits can still be analyzed easily enough.
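
    What counts as a “short date range” depends on traffic. A rough sketch, assuming visits are spread evenly across days and that a report is left unsampled whenever the range contains no more visits than the chosen sample size (both simplifying assumptions; the function names are made up for illustration):

```python
import math

# Assumptions for illustration: traffic is spread evenly across days, and a
# report stays unsampled when the range holds no more visits than the sample size.

def max_unsampled_days(sample_size: int, visits_per_day: float) -> int:
    """Longest whole-day range whose visits fit inside a single sample."""
    return math.floor(sample_size / visits_per_day)


def sampled_share(sample_size: int, visits_per_day: float, days: int) -> float:
    """Fraction of the range's visits that end up in the sample (capped at 100%)."""
    return min(1.0, sample_size / (visits_per_day * days))


daily = 10_000_000 / 30  # roughly 10 million visits a month
for days in (1, 7, 30, 180):
    print(f"{days:>3} days: {sampled_share(500_000, daily, days):.1%} of visits sampled")
print("longest unsampled range:", max_unsampled_days(500_000, daily), "day(s)")
```

    For a site with 10 million visits a month, only about a day of data fits in a 500k sample under these assumptions; a week already drops to roughly a fifth of visits.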


    -Caleb Whitmore

  • http://www.esearchvision.com Benny Blum

    Thanks for the insight, Caleb. To your point, I just ran a top-line analysis for a site that gets a couple hundred thousand visits per day, using several different sample sizes for page-level data, and the results were pretty wonky. A 1,000-visit sample told me that I had 184,000 total visits on 37 pages, while both the 250,000 and 500,000 samples told me I had 104,000 visits – a variance of roughly 77%! As I adjusted the bar towards ‘higher precision’, things started to normalize once I had about 50,000 visits in the sample. In this case, 50,000 visits represented just over 2% of the data.
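
    The variance figure above treats the large-sample total as the baseline. A quick check of that arithmetic (the helper name is just for illustration):

```python
def relative_difference(estimate: float, reference: float) -> float:
    """Difference between two estimates as a fraction of the reference value."""
    return (estimate - reference) / reference


small_sample_total = 184_000  # visits reported with the 1,000-visit sample
large_sample_total = 104_000  # visits reported with the 250,000 and 500,000 samples

print(f"{relative_difference(small_sample_total, large_sample_total):.1%}")  # 76.9%
print(f"{relative_difference(large_sample_total, small_sample_total):.1%}")  # -43.5%
```

    So the 1,000-visit sample overstated the larger samples’ figure by about 77%; measured the other way round, the large-sample total is about 43% lower. Stating which number is the baseline avoids confusion.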

    The takeaway remains the same: be wary of sample size. A larger sample generally yields more accurate data.
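
    To put a rough number on “be wary of sample size”: if you assume the sample is a simple random draw of visits (an assumption; GA’s actual method may differ), the standard error of a scaled-up count shrinks roughly with the square root of the sample size. The population and page share below are made-up illustration values, not data from this thread.

```python
import math

# Assumption for illustration: the sample is a simple random draw of visits
# (without replacement). GA's real sampling method may differ.

def count_standard_error(population: int, sample_size: int, share: float) -> float:
    """Standard error of a scaled-up count, population * (hits / sample_size).

    `share` is the true fraction of visits that belong to the row being measured.
    The finite-population correction accounts for sampling without replacement.
    """
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return population * math.sqrt(share * (1 - share) / sample_size) * fpc


population = 2_000_000  # hypothetical number of visits in the date range
share = 0.05            # hypothetical share of visits landing on the pages of interest

for n in (1_000, 50_000, 250_000, 500_000):
    se = count_standard_error(population, n, share)
    print(f"n={n:>7,}: expected {population * share:,.0f} visits, +/- {se:,.0f}")
```

    In this model, moving from a 1,000-visit to a 500,000-visit sample cuts the expected error by more than a factor of 20. That is a statement about expected error: any single sample can still land far from the true value.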

  • http://www.danmozgai.com/ Dan Mozgai

    I thought they were coming out with a premium version. If they do that, hopefully we can get both speed and accuracy.

  • http://www.facebook.com/people/Robert-Miller/1811341397 Robert Miller

    Hey Benny, nice post on the new feature of allowing users to adjust their sampling scale. However, GA, like pretty much every other web analytics platform, has always sampled analytics data once the account’s visit threshold has been met. Now they are just giving users the ability to customize how much their data is sampled.

  • Galoredk

    There are analytics platforms out there that do not sample data. Webtrends is one of them. You can look at 10 billion visits and still get an accurate number.