A few days ago, I noticed some strange Google Analytics data: Googlebot appeared as a browser in the reports. This might not sound important from an SEO perspective, but it is a major change in the Web Analytics field. As Avinash Kaushik and I wrote in the SEMJ article Web Analytics 2.0: Empowering Customer Centricity, an important advantage of all JavaScript-based solutions (Google Analytics, Omniture, Yahoo Web Analytics…) is:
The JavaScript is not read by crawlers, which generate high amounts of traffic and are not representative of customers’ behavior. Crawlers can be excluded from the analysis; however, it is a time-consuming task, and many of them are not recognizable.
To check whether this bot is really from Google, and not some kind of user agent switcher, I drilled down on the data and here is what I found.
Googlebot appears in Google Analytics reports
First of all, as we can see below, the Googlebot is recognized as a browser (version 2.1):
Second, when we drill down to the network location report we find the following:
How does it affect the data?
If we look at the behavior of this bot, we see very low time on site, very low pages/visit, and a very high percentage of new visits. This might be due to the fact that the bot does not fetch cookies, which is essential to accurate analytics tracking. Below are some numbers:
Statistically speaking, this means that Googlebot is an outlier: a data point that lies outside the overall pattern of a distribution and can therefore distort the numbers. In the example above, just a few visits with very low time on site and a very high percentage of new visits can noticeably pull down the overall average time on site and push up the overall percentage of new visits, which misleads anyone trying to understand the behavior of visitors as a whole.
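To make the outlier effect concrete, here is a small Python calculation with made-up numbers: 100 human visits of 180 seconds each, 40% of them new, plus 10 bot visits of 0 seconds, all new. These figures are purely hypothetical and are not the data from the screenshots.

```python
# Hypothetical illustration of how a handful of bot visits shift site-wide
# averages. All numbers are invented for the example.

def average(values):
    """Arithmetic mean of a non-empty list (booleans count as 1 or 0)."""
    return sum(values) / len(values)

# 100 hypothetical human visits: 180 seconds on site each, 40% of them new.
human_time_on_site = [180.0] * 100
human_is_new = [True] * 40 + [False] * 60

# 10 hypothetical bot visits: 0 seconds on site, every one counted as new.
bot_time_on_site = [0.0] * 10
bot_is_new = [True] * 10

all_time = human_time_on_site + bot_time_on_site
all_new = human_is_new + bot_is_new

print(f"Avg time on site, humans only: {average(human_time_on_site):.1f}s")
print(f"Avg time on site, with bots:   {average(all_time):.1f}s")
print(f"% new visits, humans only: {100 * average(human_is_new):.1f}%")
print(f"% new visits, with bots:   {100 * average(all_new):.1f}%")
```

With these assumed numbers, the 10 bot visits pull the average time on site from 180.0s down to about 163.6s and push the share of new visits from 40.0% up to about 45.5%.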
How to exclude Googlebot from your Google Analytics data
Here is a filter that can be applied to Google Analytics profiles to keep this Googlebot from skewing your data.
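The exact settings from the filter screenshot are not reproduced here. As a rough sketch of the same exclude logic, assuming visit rows exported into Python as dicts with a `browser` key (a hypothetical format) and a case-insensitive pattern `googlebot` (also an assumption), the rows can be filtered like this:

```python
import re

# Assumed pattern: drop any visit whose browser field mentions Googlebot.
# IGNORECASE catches "Googlebot", "googlebot", "GOOGLEBOT", etc.
GOOGLEBOT_PATTERN = re.compile(r"googlebot", re.IGNORECASE)


def exclude_googlebot(rows):
    """Return only the rows whose 'browser' value does not match the pattern.

    Each row is assumed to be a dict with a 'browser' key, e.g. read from a
    CSV export; rows without that key are kept.
    """
    return [row for row in rows
            if not GOOGLEBOT_PATTERN.search(row.get("browser", ""))]


# Hypothetical exported rows.
visits = [
    {"browser": "Firefox", "time_on_site": 210},
    {"browser": "Googlebot", "time_on_site": 0},
    {"browser": "Internet Explorer", "time_on_site": 95},
]

clean = exclude_googlebot(visits)
print([row["browser"] for row in clean])  # ['Firefox', 'Internet Explorer']
```

Google Analytics profile filters only apply to data processed after they are created, so a post-hoc pass like this is one way to clean historical exports.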
What lies ahead?
Google has officially been scanning JavaScript since 2008, so maybe this has been a low-priority or little-used technique until now, used only in very specific cases. Recently, however, we have seen an increase in this practice, so the big question is whether this is a trend that will grow over time or just a few specific tests run by Google. Editor’s note: Google declined to comment when asked for more information.
For now, we can only hope that this kind of data is not being collected by analytics packages through the back door. If it has been, it may have skewed the data considerably, given Googlebot’s low time on site and high percentage of new visits.
Disclosure: The data in the screenshots above was extracted from the Web Analytics Association website. If you would like to look at this data, it is currently available to all members as part of the Web Analytics Championship.
Postscript: Google Analytics posted a response in the comments:
“The official Google bot does not execute Google Analytics JavaScript. We’re not sure what it is exactly, it could be anyone’s bot, some intern’s experiment, or other such traffic.”
I agree with this comment in that the official Googlebot reads JavaScript but does not execute it. Besides, it does not store and send cookies, which means that Pages/Visit would be exactly 1 and time on site exactly 0. Lastly, if the official Googlebot did execute JavaScript, we would have seen massive amounts of visits.
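The cookie argument can be illustrated with a simplified model in Python. It is not Google Analytics’ actual code: it assumes a visitor is identified only by an ID stored in a cookie, treats all hits with the same ID as one visit (ignoring session timeouts), and measures time on site as the gap between a visit’s first and last hit. The names `Hit`, `assign_visitor_id`, and `visit_metrics` are hypothetical.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    visitor_id: str   # visitor ID carried by the (hypothetical) tracking cookie
    timestamp: float  # time of the page view, in seconds


def assign_visitor_id(cookie_value):
    """Reuse the visitor ID from the cookie if one was sent back.

    A client that never stores cookies sends nothing back, so every page
    view gets a freshly minted ID and looks like a brand-new visitor.
    """
    return cookie_value if cookie_value is not None else uuid.uuid4().hex


def visit_metrics(hits):
    """Return (pages per visit, average time on site in seconds).

    Simplification: all hits sharing a visitor ID form one visit, and a
    visit's duration is the gap between its first and last hit, so a
    single-hit visit contributes 0 seconds.
    """
    visits = defaultdict(list)
    for hit in hits:
        visits[hit.visitor_id].append(hit.timestamp)
    pages = [len(times) for times in visits.values()]
    durations = [max(times) - min(times) for times in visits.values()]
    return sum(pages) / len(visits), sum(durations) / len(visits)


# A cookie-accepting visitor: the same ID comes back on every page view.
human_cookie = assign_visitor_id(None)
human_hits = [Hit(assign_visitor_id(human_cookie), t) for t in (0.0, 40.0, 95.0)]

# A cookie-less bot: no ID is sent back, so each page view starts a new visit.
bot_hits = [Hit(assign_visitor_id(None), t) for t in (0.0, 40.0, 95.0)]

print(visit_metrics(human_hits))  # (3.0, 95.0)
print(visit_metrics(bot_hits))    # (1.0, 0.0)
```

Under these assumptions, a client that never returns the cookie produces exactly one page per visit and zero seconds on site, so figures that are very low but not exactly 1 and 0 suggest the visitor was keeping cookies.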
It is also important to note that although we used Google Analytics as an example, this applies to all JavaScript-based solutions, including Omniture, Yahoo Web Analytics, WebTrends and others.
Please note that this issue requires additional investigation, both with regard to Google Analytics and to how Google Search uses Googlebot.
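One possible starting point for that investigation, for anyone with raw server logs (like the commenter who found hits from crawl-66-249-68-176.googlebot.com), is the reverse-then-forward DNS check that Google documents for verifying Googlebot. The Python sketch below assumes the accepted domains are googlebot.com and google.com (re-check against Google’s current documentation) and uses IPv4-only `socket` lookups; the function names are hypothetical.

```python
import socket

# Assumed list of domains for Google crawler hostnames; verify against
# Google's current documentation before relying on it.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def has_google_crawler_domain(hostname):
    """True if the hostname ends in one of the assumed Google crawler domains."""
    hostname = hostname.rstrip(".").lower()
    return hostname.endswith(GOOGLE_CRAWLER_DOMAINS)


def is_verified_googlebot(ip):
    """Reverse-then-forward DNS check for a visiting IPv4 address.

    1. Reverse lookup: IP -> hostname.
    2. The hostname must be in an expected Google domain.
    3. Forward lookup: hostname -> IPs, which must include the original IP,
       so a spoofed reverse (PTR) record alone is not enough.
    Requires network access; lookup failures are treated as "not verified".
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not has_google_crawler_domain(hostname):
        return False
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips


# Offline check on the hostname reported in the comments.
print(has_google_crawler_domain("crawl-66-249-68-176.googlebot.com"))  # True
```

`has_google_crawler_domain` runs offline, while `is_verified_googlebot("66.249.68.176")` performs live DNS lookups, so its result depends on the DNS records at the time it is run.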

“This might be due to the fact that the bot does not fetch cookies, which is essential to accurate analytics tracking”
If it doesn’t fetch cookies, the tracking isn’t fired at all. You need a cookie-enabled bot to trigger GA tracking.
Some sources at Google deny this is the real Googlebot, but why does the bot appear in a lot of different GA accounts? Strange.
I see what you mean: I got 3 visits from the ‘browser’ on July 12th, and only then. The site gets a lot of traffic; there is no sign of the Googlebot browser on my smaller sites, though.
I am also seeing the same behavior, albeit rarely.
I first searched for the network location “google inc.” and then listed browsers for that location. I also checked the raw log data, which showed the user agent “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” from crawl-66-249-68-176.googlebot.com triggering Google Analytics [I use _setLocalRemoteServerMode() to send GA data to a log file].
It would be nice if Google would enlighten us on what is happening here and what we should expect in the future.
Google has crawled JavaScript, CSS and Flash for years (their 2008 post didn’t announce this; it only confirmed what was already known [http://www.antezeta.com/blog/google-css-javascript]). I imagine the crawling allows discovery of readable text and links as well as the detection of “search engine spam”.
Crawling (and analyzing), however, is not the same as actually executing JavaScript, which is required to trigger JavaScript-based Web Analytics reporting. Google’s prior adventure with the much-maligned Google Web Accelerator [http://www.antezeta.com/blog/google-web-accelerator] leads me to think they’d be very cautious before executing JavaScript on any wide scale, given the damage it could do.
The official Google bot does not execute Google Analytics javascript. We’re not sure what it is exactly, it could be anyone’s bot, some intern’s experiment, or other such traffic.
I am glad I am not the only one who saw this recently…