Googlebot Makes An Appearance In Web Analytics Reports

A few days ago, I noticed some strange Google Analytics data: Googlebot appeared as a browser in the reports. Although this might sound like a not-so-important fact when it comes to SEO, it is a major change in the Web Analytics field. As Avinash Kaushik and I wrote in the SEMJ journal article Web Analytics 2.0: Empowering Customer Centricity, an important advantage of all JavaScript-based solutions (Google Analytics, Omniture, Yahoo Web Analytics…) is:

The JavaScript is not read by crawlers, which generate large amounts of traffic that is not representative of customers’ behavior. Crawlers can be excluded from the analysis; however, it is a time-consuming task, and many of them are not recognizable.

To check whether this bot is really from Google, and not some kind of user agent switcher, I drilled down on the data and here is what I found.

Googlebot appears in Google Analytics reports

First of all, as we can see below, the Googlebot is recognized as a browser (version 2.1):

[Screenshot: Googlebot Browser on Google Analytics]

Second, when we drill down into the network location report, we find the following:

[Screenshot: Googlebot Network properties]
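The drill-down above relies on what Google Analytics reports about each visit. If raw server logs are also available, a widely used way to check whether a client claiming to be Googlebot really comes from Google is a reverse DNS lookup on its IP address followed by a forward lookup of the returned hostname, which must resolve back to the same IP. The minimal sketch below assumes Python's standard `socket` module; the function name `is_verified_googlebot` and the accepted domains (`googlebot.com`, `google.com`) are illustrative assumptions, and the lookups are injectable so the logic can be tried without network access.

```python
import socket
from typing import Callable, List, Tuple

# Assumption: hostnames of genuine Google crawlers end with one of these domains.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    that forward-resolves back to the same IP address (hypothetical helper)."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: the visitor cannot be verified.
        return False

    # Step 1: the PTR hostname must belong to a Google crawler domain.
    if not hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS):
        return False

    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False

    # Step 2: the hostname must resolve back to the original IP; otherwise
    # anyone who controls reverse DNS for their own IP could fake the name.
    return ip in forward_ips
```

In a log-processing script you would call this once per distinct IP that presents a Googlebot user agent and cache the result, since each call with the default arguments performs live DNS lookups.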

How does it affect the data?

If we look at the behavior of this bot, we see a very low time on site, very few pages per visit, and a very high percentage of new visits. This might be due to the fact that the bot does not fetch cookies, which is essential to accurate analytics tracking. Below are some numbers:

[Screenshot: Googlebot Behavior]

Statistically speaking, this means that Googlebot is an outlier, that is, a data point that lies outside the overall pattern of a distribution, and outliers can distort the numbers. In the example above, just a few visits with a very low time on site and a very high percentage of new visits can significantly decrease the overall average time on site and inflate the percentage of new visits, which clearly misleads anyone looking at the overall behavior of visitors.
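To make the outlier effect concrete, here is a small worked example with hypothetical numbers (they are not taken from the reports above): 90 human visits averaging 200 seconds with 40% new visits, plus 10 bot visits that each last 0 seconds and are all counted as new.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Aggregate numbers for one group of visits (hypothetical values)."""
    visits: int
    total_time_on_site_s: float  # summed time on site, in seconds
    new_visits: int


def blended(segments: List[Segment]) -> Tuple[float, float]:
    """Return (average time on site in seconds, % new visits) across all segments."""
    visits = sum(s.visits for s in segments)
    time_on_site = sum(s.total_time_on_site_s for s in segments)
    new = sum(s.new_visits for s in segments)
    return time_on_site / visits, 100.0 * new / visits


# Hypothetical human traffic: 90 visits, 200 s average, 36 new (40%).
humans = Segment(visits=90, total_time_on_site_s=18_000, new_visits=36)
# Hypothetical bot traffic: 10 visits, 0 s each, every visit counted as new.
bot = Segment(visits=10, total_time_on_site_s=0, new_visits=10)

print(blended([humans]))       # (200.0, 40.0)
print(blended([humans, bot]))  # (180.0, 46.0)
```

Ten bot visits out of a hundred are enough to cut the average time on site by 10% and push the share of new visits up by six percentage points.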

How to exclude Googlebot from your Google Analytics data

Here is a filter that can be applied to Google Analytics profiles to keep this Googlebot from messing with your data.

[Screenshot: Exclude Googlebot Filter on Google Analytics]
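The exact settings from the filter screenshot are not reproduced here. As a hedged illustration of the same exclude logic, the sketch below drops exported visit rows whose browser name matches a regular expression; the row format, the `browser` field name, and the pattern `^googlebot` are assumptions made for the example, not the settings of the original filter.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical exclude pattern, matched case-insensitively against the browser name.
BOT_BROWSER_PATTERN = re.compile(r"^googlebot", re.IGNORECASE)


def exclude_bot_visits(rows: Iterable[Dict[str, str]],
                       field: str = "browser") -> List[Dict[str, str]]:
    """Keep only rows whose `field` value does NOT match the bot pattern,
    mimicking an 'Exclude' filter applied to a profile."""
    return [row for row in rows
            if not BOT_BROWSER_PATTERN.search(row.get(field, ""))]


visits = [
    {"browser": "Firefox", "browser_version": "5.0"},
    {"browser": "Googlebot", "browser_version": "2.1"},
    {"browser": "Internet Explorer", "browser_version": "8.0"},
]
kept = exclude_bot_visits(visits)
print([row["browser"] for row in kept])  # ['Firefox', 'Internet Explorer']
```

One difference worth keeping in mind: a profile filter in Google Analytics only affects data processed after it is created, whereas a script like this can be rerun on any historical export.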

What lies ahead?

Google has been officially scanning JavaScript since 2008. So maybe this has been a low-priority or rarely used technique until now, applied only in very specific cases. But recently we have seen an increase in this practice, so the big question is whether this trend will grow over time or whether it is just a few specific tests run by Google. Editor’s note: Google declined to comment when asked for more information.

For now, we can only hope that this kind of data is not being collected by analytics packages through the back door. If it has been, it might have skewed the data quite a bit, given Googlebot’s low time on site and high percentage of new visits.

Disclosure: The data used on the screenshots above was extracted from the Web Analytics Association website. If you would like to take a look at this data, it is currently available to all members as part of the Web Analytics Championship.

Postscript: Google Analytics posted a response in the comments:

“The official Google bot does not execute Google Analytics JavaScript. We’re not sure what it is exactly, it could be anyone’s bot, some intern’s experiment, or other such traffic.”

I agree with this comment: the official Googlebot reads JavaScript but does not execute it. Besides, it does not store and send cookies, which means that Pages/Visit would be exactly 1 and time on site exactly 0. Lastly, if the official Googlebot did execute JavaScript, we would have seen massive amounts of visits.
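The cookie argument can be sketched with a simplified model of how visits are stitched together. This is an assumption-laden illustration, not how Google Analytics actually processes hits: each hit optionally carries a visitor ID from a persistent cookie, hits from the same visitor within a 30-minute inactivity window form one visit, and a hit without a cookie cannot be linked to anything, so it becomes its own one-page, zero-second visit. The `Hit`, `Visit`, and `sessionize` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SESSION_TIMEOUT_S = 30 * 60  # assumed 30-minute inactivity timeout


@dataclass
class Hit:
    timestamp_s: int           # seconds since an arbitrary epoch
    visitor_id: Optional[str]  # persistent cookie value, or None if not stored


@dataclass
class Visit:
    pageviews: int
    time_on_site_s: int


def sessionize(hits: List[Hit]) -> List[Visit]:
    """Group hits into visits; cookieless hits each start a brand-new visit."""
    visits: List[Visit] = []
    # visitor_id -> (first timestamp, last timestamp, pageviews) of the open visit
    open_visits: Dict[str, Tuple[int, int, int]] = {}
    for hit in sorted(hits, key=lambda h: h.timestamp_s):
        if hit.visitor_id is None:
            # Nothing ties this hit to earlier ones: 1 page, 0 s on site.
            visits.append(Visit(pageviews=1, time_on_site_s=0))
            continue
        state = open_visits.get(hit.visitor_id)
        if state is not None and hit.timestamp_s - state[1] <= SESSION_TIMEOUT_S:
            # Same visitor, still within the timeout: extend the open visit.
            open_visits[hit.visitor_id] = (state[0], hit.timestamp_s, state[2] + 1)
        else:
            if state is not None:
                # Timeout exceeded: close the previous visit.
                visits.append(Visit(state[2], state[1] - state[0]))
            open_visits[hit.visitor_id] = (hit.timestamp_s, hit.timestamp_s, 1)
    # Close any visits still open at the end of the data.
    for first, last, pages in open_visits.values():
        visits.append(Visit(pages, last - first))
    return visits
```

Three pageviews one minute apart from a cookie-enabled browser, for example `sessionize([Hit(0, "a"), Hit(60, "a"), Hit(120, "a")])`, become a single three-page visit lasting 120 seconds, while the same three hits without a cookie become three separate new visits of one page and zero seconds each.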

It is also important to note that although we used Google Analytics as an example, we mean all JavaScript-based solutions, including Omniture, Yahoo Web Analytics, WebTrends and others.

Please note that this issue requires additional investigation, both with regard to Google Analytics and to how Google Search uses Googlebot.

About The Author: is the Founder of Conversion Journey, a Google Analytics Certified Partner. He is also the founder of Online Behavior, a Marketing Measurement & Optimization website. You can follow him on Google+ or Twitter.

  • http://andrescholten.nl André Scholten

    “This might be due to the fact that the bot does not fetch cookies, which is essential to accurate analytics tracking”

    If it doesn’t fetch cookies the tracking isn’t fired at all. You need to have a cookies enabled bot to trigger the GA tracking.

    Some sources at Google deny this is the real Googlebot, but why does the bot appear in a lot of different GA accounts? Strange.

  • Dudibob

    I see what you mean, got 3 visits from the ‘browser’ on the 12th July and only then… site gets a lot of traffic, no sign of the Googlebot browser on lesser sites though.

  • http://www.antezeta.com/blog/ sean

    I am also seeing the same behavior, albeit rarely.

    I first searched for the network location “google inc.” and then listed browsers for that location. I also checked the raw log data which showed useragent “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” from crawl-66-249-68-176.googlebot.com triggering Google Analytics [I use _setLocalRemoteServerMode() to send GA data to a log file].

    It would be nice if Google would enlighten us on what is happening here and what we should expect in the future.

    Google has crawled JavaScript, CSS and Flash for years (their 2008 post didn’t announce this, it only confirmed what was already known [http://www.antezeta.com/blog/google-css-javascript]). I imagine the crawling allows discovery of readable text and links as well as the detection of “search engine spam”.

    Crawling (and analyzing), however, is not the same as the actual execution of JavaScript, required to trigger JavaScript based Web Analytics reporting. Google’s prior adventure with the much maligned Google Web Accelerator [http://www.antezeta.com/blog/google-web-accelerator] leads me to think they’d be very cautious before actually executing JavaScript on any sort of wide basis due to the damage which could be done.

  • google.analytics

    The official Google bot does not execute Google Analytics javascript. We’re not sure what it is exactly, it could be anyone’s bot, some intern’s experiment, or other such traffic.

  • http://www.brickmarketing.com nickstamoulis

    I am glad I am not the only one who saw this recently…
