How do you convince a bunch of publishers to buy ads on your service? If you’re Google, how about withholding data from those publishers about how people are finding their sites, as a way to drive them into your ad system? I don’t think Google planned for this to happen. But that’s the end result, in what could be called the “Not Provided” scheme.
“Dark Google” Nearly Two Years Old
In October 2011 — nearly two years ago now — “Dark Google” was born. Google began holding back information it previously gave to publishers for free, “referrer data” that let publishers understand the exact terms people used when they searched on Google and then clicked to publisher sites.
Did your site get found on Google when someone searched for things like “iphone cases,” “reasons why the US might attack Syria” or “help for a bullied child”? In the past, you could tell. Google provided that data using an industry-standard mechanism that analytics tools could easily tap into.
Originally, people found that only a low percentage of terms were being withheld, or showing up as “not provided,” to use the term that Google Analytics shows in these cases. But over time, the percentage for many sites has risen, making some wonder if we’ll eventually be living in a 100% not provided world.
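For readers who never looked under the hood, here is a rough sketch of how an analytics tool could pull a search term out of a referrer. It assumes the term travels in a `q` query parameter on a Google URL and that a withheld term shows up as a missing or empty `q`. The function name and the host check are illustrative only, not how any particular analytics product works.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referrer(referrer):
    """Return the search phrase in a Google referrer URL.

    Returns None for non-Google referrers and "(not provided)" when the
    referrer is from Google but carries no usable search term.
    """
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    # Loose illustrative check: matches www.google.com, google.co.uk, etc.
    if ".google." not in "." + host:
        return None
    # parse_qs drops blank values by default, so an empty "q=" yields no entry.
    values = parse_qs(parsed.query).get("q")
    if not values:
        return "(not provided)"
    return values[0]


print(search_terms_from_referrer("http://www.google.com/search?q=iphone+cases"))
print(search_terms_from_referrer("https://www.google.com/url?sa=t&q=&esrc=s"))
```

The first call carries the term in the open. The second stands in for a referrer with the term stripped, which is the case that lands in the “not provided” bucket.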
Withheld For Privacy, Despite Deliberate Loopholes
Why did Google break a system that had been in existence even before Google itself? Google said it was done to better protect the privacy of searchers. People might search for sensitive information, so by withholding search terms, Google felt it was preventing any eavesdropping or leakage that might happen.
It’s a good reason, one I agree with in general. But it was also a flawed move, because Google still allows sensitive search terms to potentially leak in three different ways. These are:
1) Search terms that get suggested by Google Instant autocomplete
2) Search terms that Google provides to publishers through its Google Webmaster Central service
3) Search terms that Google continues to transmit across the open web to its advertisers
The last loophole is especially noteworthy. Google expressly went out of its way to ensure that its advertisers could still receive referrer information the traditional way, with no need to log in to some closed Google system.
When Google first shifted to this new system, I wrote that the third loophole was putting a price on privacy. Google appeared willing to protect privacy up to the point where it got too pricey for itself. Having a bunch of irritated, angry advertisers might have proved too expensive.
Historical Data Not Archived
Google’s biggest defense against such accusations, that this was all done to increase ad revenues, has been the second loophole on the list. Publishers can log in to Google Webmaster Central and see the terms driving traffic to their sites, all for free.
There are caveats, however. You’ll only see the last 90 days’ worth of data, and only for the top 2,000 queries for any particular day that are sending traffic to your site. The number of terms used to be smaller, but Google expanded this in early 2012. Because the exact terms potentially change each day, publishers can see many thousands of different queries.
Personally, I think the “depth” of queries is great. For many sites, seeing only the top 100 or 200 queries sending them traffic might encompass a huge chunk of their visitors. But the historical data for many sites has been lost, and continues to be, because of that 90-day window.
Want to know how your top terms compare today to a year ago, or what traffic from those terms is like? You can’t do that in Google Webmaster Central, because Google won’t store it for longer.
I’ve repeatedly asked Google why it doesn’t expand the period of time that this data is retained. After all, it was more than happy to store that data when it was transmitted the old way, for anyone who wanted to capture it via Google Analytics.
Google’s usual response is to point to a Python script it created, for those who want to download the data programmatically. That’s a bad solution for many publishers, in my opinion. It’s like telling them they can only use Google Analytics if they set up a routine that will automatically forward their server logs each day. It’s not easy. It’s not how Google usually aims to serve its users, of which publishers are a key constituency.
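To give a sense of what that do-it-yourself route involves, here is a minimal sketch of the archiving half of the job. It assumes the day’s queries have already been downloaded somehow, and it leaves that step out entirely because it is not based on Google’s actual script. The table layout and the function names `open_archive`, `archive_day` and `clicks_for` are hypothetical.

```python
import sqlite3
from datetime import date


def open_archive(path=":memory:"):
    """Open (or create) a local archive; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queries ("
        " day TEXT NOT NULL, query TEXT NOT NULL, clicks INTEGER NOT NULL,"
        " PRIMARY KEY (day, query))"
    )
    return conn


def archive_day(conn, day, rows):
    """Store one day's (query, clicks) rows; re-running a day overwrites it."""
    conn.executemany(
        "INSERT OR REPLACE INTO queries (day, query, clicks) VALUES (?, ?, ?)",
        [(day.isoformat(), query, clicks) for query, clicks in rows],
    )
    conn.commit()


def clicks_for(conn, query, day):
    """Return archived clicks for a query on a given day, or None if absent."""
    row = conn.execute(
        "SELECT clicks FROM queries WHERE day = ? AND query = ?",
        (day.isoformat(), query),
    ).fetchone()
    return row[0] if row else None


# Made-up figures, purely to show a year-over-year lookup.
conn = open_archive()
archive_day(conn, date(2012, 9, 1), [("iphone cases", 40)])
archive_day(conn, date(2013, 9, 1), [("iphone cases", 55)])
print(clicks_for(conn, "iphone cases", date(2012, 9, 1)))
print(clicks_for(conn, "iphone cases", date(2013, 9, 1)))
```

Even in this stripped-down form, a publisher still has to schedule it to run daily, handle failed downloads and keep the archive file safe, which is the burden described above.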
The other answer has repeatedly been the standard Google “we’ll consider it” type of response. Two years in, what else needs to be considered? Clearly, the inaction on expanding the time period shows it’s not a Google priority.
Now: Unlimited, Easy Archiving — For AdWords Accounts
Things changed dramatically at the end of last month. Quietly, Google announced a new “Paid & Organic” report for those with AdWords accounts.
Want to store those search terms Google’s been withholding and dropping out of Google Webmaster Central after 90 days? Just sign up for AdWords. Allow it to link to your Google Webmaster Central account. It’ll start pulling the search term data out of there constantly, with no Python script required.
Want to know your top terms, after doing this? Select “All Online Campaigns,” and make an empty campaign, if you don’t already have one. Then go to the “Dimensions” tab, change “View” to “Paid & organic,” and there are all your stats. You’ll see your top terms, sortable by clicks, queries and other ways.
The good news is that you don’t have to be a paying AdWords customer to do this. You just need an AdWords account. The bad news is that it feels wrong that Google is forcing publishers into its ad interface to get information about their “non-paid” listings. It also suggests an attempt to upsell people on the idea of buying AdWords, if they aren’t already.
Planned Or Not, It’s The Wrong Signal To Send
I don’t believe things were orchestrated this way, with terms being withheld to push AdWords. I really don’t.
I think the search team that wanted to encrypt and strip referrer information had the best intentions, that it really did believe sensitive information could be in referrer data (and it can) and sought to protect this.
I think AdWords continued to transmit it because ultimately, the search team couldn’t veto that department’s decision.
But regardless of the intentions, the end result looks bad. Google instituted a system pitched as protecting user privacy, yet one that had three major loopholes, including an explicit one for its own advertisers. Now, it has a system that further encourages people to use the AdWords system.
In the end, it makes it seem as if Google, which has a symbiotic relationship with publishers, doesn’t want to keep them fully apprised of how they are being found unless it has a better chance of earning ad revenue from them. That’s a terrible message to send, but that’s the one that’s going out.
There’s one bit of good news. I asked Google for any comment about all this. I was told:
“We plan to provide a year’s worth of data in the Top Search Queries report in Webmaster Tools.”
There’s no set timing on when this will happen, but I’d expect it to be relatively soon. That will be welcomed. Even better would be if the data could be archived for as long as people want, just as they can now do if they agree to flow it into AdWords.
- Google To Begin Encrypting Searches & Outbound Clicks By Default With SSL Search
- The Death Of Web Analytics? An Ode To The Threatened Referrer
- Google’s Results Get More Personal With “Search Plus Your World”
- Google Puts A Price On Privacy
- Dark Google: One Year Since Search Terms Went “Not Provided”
- Will [Not Provided] Ever Reach 100% In Web Analytics?
- Report: Google’s Not Provided Reached 49% & Much Higher In Technology Industry
- Google Launches AdWords Paid & Organic Report: See Organic And Paid Query Data Side-By-Side