Analytics news and expert advice every Thursday.
Deduplicate Your Google Analytics Reports in Excel
One of the steps in my last post on finding 404 pages worth saving involved determining if any of your404 pages received traffic in the past year. This can be accomplished by pulling a landing page report and using VLOOKUPs in Excel to see any of your broken pages used to receive traffic. However, Google Analytics (GA) content reports are case sensitive, so you might have multiple versions of what should be a single line item because of this.
For example, /default.aspx and /Default.aspx are considered two different pages in Google Analytics. If you’re used to exporting your content or landing page reports without taking this extra step, you’re probably reporting inaccurate data.
Deduplicating With Pivot Tables
A simple solution to this would be to create a pivot table. This will automatically merge your data — and isn’t case sensitive.
However, that won’t help you in a situation where you need to do VLOOKUPs to stitch your data together; so I’m going to tell you about a cool tool in Excel that few seem to know about but is very useful. It’s the Consolidate command, and you can find it under the Data tab.
Common Use For Consolidate Command
The most common use of the Consolidate command is to merge datasets from different worksheets or even workbooks (i.e., completely different files). So let’s say you have a reporting dashboard that has a separate worksheet for each month of data. You could use the Consolidate command to combine all of the monthly datasets into one aggregated dataset. You can even create links so that the consolidated dataset is updated whenever its individual elements are.
However, it also very useful in deduplicating datasets without losing any of the data. If you just have a list of, let’s say, keywords you’re bidding on, and you want to deduplicate that list, you’d use the Remove Duplicates command. This command is also found under the Data tab. But if that list also includes traffic and revenue metrics, you do not want to merely dedupe the list; you want to merge those values. See the difference?
Consolidate Landing Page Reports
What I’m going to use in this demonstration is a list of landing pages from Google Analytics that includes visit and revenue data. You can see below that we have two instances of duplicated data. The second landing-page01 was actually the result of removing all query parameters from URLs. (Then I, of course, anonymized the data.) If your data is littered with unnecessary query parameters (the values that follow a question mark in a URL), you can use Text to Columns to remove everything after the “?” symbol. (However, you should really be filtering out any query parameters you don’t need to report on in your profile settings.)
Step 1: Copy your column headings over to where your new, deduplicated dataset will ultimately live. Then put your mouse cursor where the dataset will begin.
Step 2: Choose Data > Data Tools > Consolidate, which will bring up the Consolidate menu.
Step 3: Since we’re going to want to add the visits and revenues for the duplicate rows, we’ll choose Sum from the Function drop-down menu.
Step 4: For the Reference, click-and-drag over your dataset. Or, if you’re working with a formatted table (which you should always be), just select any cell inside your dataset and press Ctrl-A (Mac: Command-A) to select all of the cells. Lastly, you could use a named range and reference that.
Step 5: Since we’re only dealing with one dataset, we won’t need to add more references. But if you were merging multiple datasets, you would select the Add button to add them.
Step 6: Since the values we’re using to deduplicate the list with are in the left column, we’re going to use that option under “Use labels in.” Here’s what the dialog should look like when you’re finished. Press OK to let Excel do its magic.
Step 7: And voila! Your consolidated data.
I use this command most frequently to deduplicate content reports in GA. If your URLs don’t automatically redirect to lowercase, or you don’t have a filter in place in GA to force URLs to lowercase, it’s really important that you take this extra step to make sure you’re presenting accurate data.
Some opinions expressed in this article may be those of a guest author and not necessarily Search Engine Land. Staff authors are listed here.