Messy, incomprehensible analytics make my stomach churn. Just knowing that I’m going to spend the next several hours cleaning up sloppy data puts the kibosh on my day.
The problem with Google Analytics, or any analytics package for that matter, is that even if my site is properly tagged and I’ve developed a systematic inbound link tagging notation, it’s almost certain that my Source/Medium/Referral URL reports will still include inconsistencies that need to be resolved to provide accurate reporting:
Software can’t categorize what hasn’t been logically defined. Google does a decent job of auto-tagging referring traffic and sorting it into logical Source/Medium buckets.
However, a bunch of stuff slips through the cracks into the catch-all “Referral” bucket. Over time, the bucket grows big enough to constitute a very significant percentage of overall traffic and needs to be addressed, cleaned up, and recategorized in order to provide accurate Source/Medium reporting.
A classic example in the table above is bingiton.com. Google Analytics can’t tell what the site is, so it’s tagged as a referral.
However, by visiting the site, we can easily see that it compares Google and Bing organic results. The Source can therefore be either Bing or Google (although the goal of the site is to show that Bing’s organic results are better than Google’s), and the Medium should be Organic, since the results clearly come from a search engine, as shown below.
The reality is that Google Analytics will never be perfect at tagging inbound traffic. Requiring partners to adhere to a clean inbound link format can prevent part of this problem, but at some point it can and will get out of our control. That said, GA reports are a perfect starting point for your in-house analytics team to work from in producing a truly accurate performance report.
The good news is that there is a solution, and it’s not too difficult to put in place. At my company, we call it Clean Source Tagging. The idea is that there’s a raw source and a clean source. We take what GA gives us (the raw source), manually review it for accuracy, and compile an Excel database that pairs each traffic source with its reviewed and approved clean source.
If you don’t recognize a source, visit the site to determine what it actually is. We can even get a little fancy with the output and include primary source, secondary source, and medium to gather all relevant information:
We then take the standard GA Source/Medium output and run a simple VLOOKUP against the clean source table to associate each referral with the correct source/medium.
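For readers who would rather script the lookup than run it in a spreadsheet, here is a minimal Python sketch of the same idea. It is an illustration, not the workflow we use in Excel: the table entries, the `UNREVIEWED` marker, and the function name `clean_row` are all hypothetical, and assigning bingiton.com to Bing is just one of the two choices discussed earlier.

```python
# Hypothetical clean source table:
# raw GA source -> (primary source, secondary source, medium).
# The entries are illustrative only; a real table comes from manual review.
CLEAN_SOURCES = {
    "bingiton.com": ("Bing", "Bing It On", "organic"),
    "google": ("Google", "Google", "organic"),
}


def clean_row(raw_source, raw_medium):
    """Return (primary, secondary, medium) for one GA Source/Medium row.

    Behaves like an exact-match VLOOKUP: a source missing from the table
    keeps its raw values and is marked so it can be reviewed by hand.
    """
    key = raw_source.strip().lower()
    if key in CLEAN_SOURCES:
        return CLEAN_SOURCES[key]
    return (raw_source, "UNREVIEWED", raw_medium)


# Made-up sample rows in the shape of a GA Source/Medium export.
rows = [("bingiton.com", "referral"), ("example-blog.com", "referral")]
cleaned = [clean_row(source, medium) for source, medium in rows]
```

Normalizing the key with `strip()` and `lower()` is a design choice in this sketch; Excel’s VLOOKUP is already case-insensitive, so the lowercase step simply keeps the two approaches aligned.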
If it sounds like a lot of work, it’s because it is. But it’s a one-time project with a small ongoing component to keep the database fresh and up to date. The bulk of referring URLs are static week-over-week.
As a result, the initial project may take a while, but if you keep the database handy, run the VLOOKUP, and address the new unknown referrals, it’s a quick job even an intern can handle in under an hour per week.
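The weekly upkeep boils down to finding the sources that are not in the database yet. A rough sketch of that check in Python might look like the following; the function name `new_unknown_sources` and the sample domains are made up for illustration.

```python
def new_unknown_sources(weekly_sources, known_sources):
    """Return this week's raw sources that the clean source table lacks.

    Comparison ignores case and surrounding whitespace; the result is
    de-duplicated and sorted so it can be used as a review checklist.
    """
    known = {s.strip().lower() for s in known_sources}
    seen = {s.strip().lower() for s in weekly_sources}
    return sorted(seen - known)


# Hypothetical data: sources already reviewed vs. this week's GA export.
known = ["google", "bingiton.com"]
this_week = ["google", "Bingiton.com", "new-partner.example", "new-partner.example"]
to_review = new_unknown_sources(this_week, known)
```

Whatever ends up in `to_review` is the list someone still has to visit and classify by hand before adding it to the clean source table.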
So, now that we’ve got a clean source table, what’s next? Dashboards. The clean source table gives us the ability to create the simple, intuitive and insightful dashboards executives crave:
With a little extra work up front, we can create a beautiful Excel template, drop new data in, update the clean source table, and deliver source performance reports in no time. What was previously an irrelevant and unusable report in GA, or a mountain of work to get right, becomes a near-automated solution that is a staple of weekly reporting.
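The roll-up behind such a report can also be sketched in a few lines of Python. This assumes each exported row carries a visit count; the `summarize` function, the table entries, and every number below are invented for the example rather than taken from real data.

```python
from collections import defaultdict


def summarize(rows, clean_sources):
    """Sum visits per (clean source, medium).

    rows: iterable of (raw_source, raw_medium, visits) tuples.
    clean_sources: dict mapping raw source -> (clean source, medium).
    Sources missing from the table are reported under their raw values.
    """
    totals = defaultdict(int)
    for raw_source, raw_medium, visits in rows:
        key = raw_source.strip().lower()
        source, medium = clean_sources.get(key, (raw_source, raw_medium))
        totals[(source, medium)] += visits
    return dict(totals)


# Hypothetical clean source table and made-up weekly numbers.
clean_sources = {
    "bing": ("Bing", "organic"),
    "bingiton.com": ("Bing", "organic"),
}
rows = [
    ("bing", "organic", 120),
    ("bingiton.com", "referral", 30),
    ("unknown.example", "referral", 5),
]
report = summarize(rows, clean_sources)
```

In this sample, the bingiton.com visits fold into the Bing organic total instead of sitting in the Referral bucket, which is exactly the recategorization the clean source table is meant to deliver.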
Now you can hit snooze one more time on Monday morning. You earned it.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.