Is Twitter Sending You 500% To 1600% More Traffic Than You Might Think?

Earlier I posted that Google Analytics and other JavaScript-based tracking tools might be undercounting visits from Twitter. I’ve done some more digging, which supports the case. In my test, Twitter seems to have sent 500% to 1600% more traffic than log files or hosted stats packages like Google Analytics might show.

My earlier article, How Twitter Might Send Far More Traffic Than You Think, explains how I’d often seen big gaps between the number of people who apparently clicked on a tweeted link, as measured by Bit.ly, and the number of page views that Google Analytics was showing.

To test this further, I tweeted a particular page on my personal blog along with tracking codes designed to help ensure the visits showed up in Google Analytics. I’m going to toss out a bunch of numbers as part of this analysis. If they get confusing, skip to the end for the conclusion.

The Numbers Bit

For July 7, Bit.ly reported that the page had registered 58 clicks. Were there 58 corresponding page views? No. Google Analytics only reported 17 page views from 11 unique users. That meant a gap of 41 views.

Was the gap due to clicks from non-human robots that don’t process Google Analytics JavaScript tracking code? Visits from people using mobile browsers that didn’t get tracked, because they might not process the code? To explore further, I went to the raw log files, the records that the server itself keeps. These show any request made for the page, regardless of any JavaScript issues.

I found that there had been 57 total requests — practically the same as Bit.ly reported. However, 14 of these were for the page without the tracking codes I’d used when tweeting the page through Bit.ly.

In other words, this is the URL I put out through Bit.ly, which was reported to receive 58 visits:

http:///mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389?utm_source=twitter&utm_medium=micro-blog&utm_campaign=twitter

See the part after the question mark? Those are tracking codes or parameters. From the log files, I found that the URL above (with the codes) had 43 visits (not 58), and the same page without tracking codes, like this, received 14 further visits:

http:///mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389
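For readers who want to see the difference between the two URLs in code, here is a minimal Python sketch of adding and stripping the tracking parameters. The host `example.com` is a stand-in (the URLs above omit the domain), and the helper names `add_tracking` and `strip_tracking` are made up for illustration. They are not part of Google Analytics or Bit.ly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical host: example.com stands in for the blog's real domain.
PAGE = "http://example.com/mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389"


def add_tracking(url, source, medium, campaign):
    """Return url with utm_* campaign parameters appended to its query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_tracking(url):
    """Return url with any utm_* parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Same parameter values as the tweeted link in the article.
tagged = add_tracking(PAGE, "twitter", "micro-blog", "twitter")
untagged = strip_tracking(tagged)
```

Because the tagged address did not exist anywhere until it was tweeted, any request for it in the logs can reasonably be traced back to that tweet.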

Those 14 visits without tracking codes almost all came from robots (Google: 5; Yanga: 4; Microsoft: 3). Two other visits seemed to be from humans. These robotic visits all likely had nothing to do with my tweet. The requests were from spiders doing their regular crawls of the web, it seems. The few human visits to the page without tracking codes were probably people who came to my article for reasons unconnected with the tweet.

What about those other 43 visits to the page that did have the tracking code? Well, 11 visits were from what appeared to be robots (OneRiot: 1; PycURL: 2; Ginxbot: 2; WebShot: 1; Google: 2; Tweetmeme: 1; Python-urllib: 1; LongURL API: 1).
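The sorting described above can be roughly sketched in Python. This assumes the server writes the common Apache "combined" log format, and it spots robots by looking for name fragments in the user agent. The fragments are based on the robot names listed above, but the matching rule itself is an assumption, not how I actually went through my logs. The sample lines are invented, not taken from my log files.

```python
import re
from collections import Counter

# Assumed user-agent fragments for spotting robots; an illustration, not a complete list.
BOT_MARKERS = ("googlebot", "oneriot", "pycurl", "ginxbot", "webshot",
               "tweetmeme", "python-urllib", "longurl")

# Combined log format: host ident user [time] "request" status bytes "referrer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(line):
    """Return (tagged/untagged, robot/human?) for one log line, or None if unparseable."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    tagged = "utm_source=twitter" in match.group("path")
    agent = match.group("agent").lower()
    robot = any(marker in agent for marker in BOT_MARKERS)
    return ("tagged" if tagged else "untagged", "robot" if robot else "human?")


def tally(lines):
    """Count log lines per (tagging, visitor type) bucket."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# Invented sample lines using documentation-only IP addresses and a shortened path.
SAMPLE = [
    '192.0.2.1 - - [07/Jul/2009:20:41:31 -0400] "GET /post-389?utm_source=twitter&utm_medium=micro-blog&utm_campaign=twitter HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.5"',
    '192.0.2.2 - - [07/Jul/2009:20:42:10 -0400] "GET /post-389?utm_source=twitter&utm_medium=micro-blog&utm_campaign=twitter HTTP/1.1" 200 5120 "-" "Python-urllib/2.5"',
    '192.0.2.3 - - [07/Jul/2009:21:05:43 -0400] "GET /post-389 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
counts = tally(SAMPLE)
```

The "human?" label is deliberately hedged: a request that does not match a known robot name only appears to come from a person.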

That left 32 visits that appear to be from humans. That’s roughly midway between the 58 views Bit.ly reported and the 17 page views Google Analytics reflected. Why still such a gap with GA?

One leading argument has been that some Twitter applications on mobile devices load pages within the application, rather than using an external browser, and so aren’t getting registered by Google Analytics. Also, some mobile browsers might not process JavaScript. I could see at least four iPhone-based requests like this. But there were plenty of other requests that appear to be from full-fledged desktop-based browsers. Why weren’t they showing up?

One clue is that of the 32 requests, only 5 of them contained “referrer” data, information that some browsers pass on that indicates how they found the page in the first place. For Google Analytics (or ANY analytics program) to properly indicate how much traffic a particular site is driving, it needs as much referrer data as it can get.

Of those referrers, only 2 of them were from the twitter.com domain (1 more was from my own blog’s domain, 1 from iconfactory.com/twitterific, probably indicating a Twitterrific user, and 1 from powertwitter.me, probably indicating a Twitter visit via a Firefox plug-in).

In short, based on referrer traffic alone, ANY analysis program would have reported that at best, Twitter sent my page only 2 visits. Yet, both Google Analytics and Bit.ly reported that it received far more than that.
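To show what referrer-only attribution looks like, here is a small Python sketch that groups requests by the host in the referrer field. The referrer values are invented to mirror the tallies above (5 referrers among the 32 human-looking requests), `example.com` stands in for my own blog's domain, and `referrer_host` is a hypothetical helper, not part of any analytics package.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_host(referrer):
    """Return the host of a referrer URL, or None when the field is empty or '-'."""
    if not referrer or referrer == "-":
        return None
    host = urlsplit(referrer).hostname
    if host and host.startswith("www."):
        host = host[4:]
    return host


# Invented values shaped like the article's counts; the exact URLs are assumptions.
referrers = (
    ["http://twitter.com/"] * 2
    + ["http://example.com/"]                 # stand-in for the author's own blog
    + ["http://iconfactory.com/twitterific"]
    + ["http://powertwitter.me/"]
    + ["-"] * 27                              # requests that sent no referrer at all
)

by_host = Counter(referrer_host(value) for value in referrers)
twitter_visits = by_host["twitter.com"]
no_referrer = by_host[None]
```

A tool that relies on this field alone can only credit Twitter with the requests whose referrer host is twitter.com. Everything in the `None` bucket looks like direct traffic.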

Remember, Google Analytics said the page had received 17 views in all, 11 from unique users. How many of those 11 unique users came to the page via Twitter? Google Analytics said 9. One more came directly, it said; another person did a search to find it (mother’s cookie site:daggle.com was the search, which was me locating the article. Oddly, this request does NOT appear in the raw log files).

The Big Conclusion

All those earlier numbers hurt your head? Here are the most meaningful ones. Thanks for hanging in there!

Based only on referrers, at best, Google or any analytics program would have said Twitter sent 2 visits. But because I used tracking codes, I was able to overcome the lack of referring data and see that Twitter (itself or via applications or web sites using Twitter data) sent 9 visits. That means analytics packages might be undercounting Twitter visits by nearly 500%.

Meanwhile, Bit.ly was showing those 58 clicks to the page. Let’s say it wasn’t filtering out some of the robots. I can still see that there are 32 visits that the log files recorded, all with the tracking codes that never existed until I tweeted the link with them. So those are all Twitter-derived visits. That means an undercount by a standard analytics tool depending on referrer data by 1600%.
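The two percentages are simple ratios against the 2 referrer-based visits. A quick Python check, using only the counts reported above (the function name `ratio_percent` is made up for this sketch):

```python
def ratio_percent(observed, baseline):
    """Express observed as a percentage of baseline (200 means twice as many)."""
    return 100 * observed / baseline


referrer_only = 2        # visits carrying a twitter.com referrer
ga_campaign = 9          # visitors Google Analytics credited to the tagged link
log_tagged_humans = 32   # human-looking requests for the tagged URL in the raw logs

ga_vs_referrer = ratio_percent(ga_campaign, referrer_only)         # 450.0
log_vs_referrer = ratio_percent(log_tagged_humans, referrer_only)  # 1600.0
```

So the "nearly 500%" figure is 9 divided by 2 (450%), and the 1600% figure is 32 divided by 2. Both express the larger count as a percentage of the referrer-only count rather than as a percentage increase over it.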

And The Analytics Companies Say?

I sent my logs to both Bit.ly and Google, along with a draft of this article, for any reaction.

Google said they’re aware that activity on mobile devices can cause issues with tracking and that they’re looking for ways to improve their product.

Bit.ly said they filter out robotic clicks such as Ginxbot, Google, and Python-urllib, through PycURL. When I asked further about the gap, they emailed back:

It looks like three types of events make up the delta.

First, browser plug-ins and automated url-lengthener applications, which make requests to the bit.ly URL, but don’t follow the redirect to the destination site.

One example is the “eventBox” at http://thecosmicmachine.com. Here’s how it appears in the logs:

(eventBox): - - [07/Jul/2009:20:41:31 -0400] "GET /cHXSP HTTP/1.1" 301 410 "-" "EventBox567 CFNetwork/438.12 Darwin/9.7.0 (i386) (iMac9%2C1)" 301

Second, small bots that make their way through our screening system:

(slicehost): - - [07/Jul/2009:21:05:43 -0400] "GET /cHXSP HTTP/1.1" 301 410 "-" "-" 301

Third, browsers which don’t support JavaScript, as well as browsers with JavaScript settings turned off and browsers running JavaScript-blocking extensions like noscript.

And Some Related Reading

Last week, Fred Wilson posted Does This Blog Get More Traffic From Google or Twitter?, finding that for his personal blog, Twitter traffic has risen past Google search traffic. Fred suspected that the Twitter traffic was even more than being shown, due to undercounting. I think he’s right. While I think Google search traffic still remains a major traffic driver for many sites, those who have lots of Twitter followers or have a story go “hot” through retweets certainly may discover Twitter is a new major traffic source, and one that’s likely undercounted.

Over at the Zebu Blog, Link Tracking – (lies, damn lies &) Statistics? also looks at the issue, questioning whether Bit.ly is overcounting. In a follow up comment, Mayank Sharma did his own small scale experiment and found:

We created a bit.ly url for this post, and posted it on Twitter. The next instant we saw, that bit.ly’s count was already 4. This only means that some twitter crawler/indexer received the tweet and de-referenced the url mentioned in it. After that I hovered my mouse over the link shown in Twitterfox. Sure enough bit.ly’s count increased by one. We did this repeatedly from multiple desktop’s of several friends and the count just kept on increasing. Not one of these folks during this time had actually clicked on the link.

I agree: Bit.ly seems to be overestimating views. But Google Analytics seems to be underestimating them, perhaps severely, based on my small-scale log analysis. Using tracking codes occasionally is one way to get a reality check.

Finally, if you want to add tracking parameters for URLs you tweet, consider the Snip-n-Tag add-on for Firefox. I’ve been using it, and it makes adding these to URLs super easy.

About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and microblogs on Twitter as @dannysullivan.

  • http://www.urbanbacon.com Phil Novara

    This is great content…once again uncovering the hidden power of Twitter. Google should have analytics updated to accommodate this “hidden traffic” in months to come. I mean, they are Google…

  • http://www.savio.no Eivind Savio

    Great article Danny.
    One other possible reason that may lead to underestimating in Google Analytics as well is the quality of the traffic from Twitter. If the visitor bounces right back without giving the page enough time to load properly, Google Analytics (and other script-based analytics programs) may not have enough time to record the visitor. The Google Analytics script is normally placed just before the closing </body> tag, and this means that the page needs to load before the tracking script is activated.

    I have written a bookmark solution earlier that does the same as Snip-n-Tag, but it works in most browsers:
    http://www.savio.no/blogg/a/82/google-analytics-campaign-url-builder-with-short-url-and-twitter-posting
    (I hope it is OK posting the URL to my own solution)
    It currently only supports Cligs URL shortening, but I may add more shortening services if the demand is there.

  • mayank

    Great analysis, Danny. That provides good data for what all of us have known all along but had no convincing proof of.

    There however seems to be a small mistake at the beginning of the Numbers Bit, primarily because of the closeness of the numbers 58 (bit.ly’s report) and 57 (raw log files). Since you registered the url with the tracking code at bit.ly, the number 58 from bit.ly is most certainly for the url with tracking code. But your log showed only 43 with the tracking code. So there must have been at least 15 other visits to bit.ly for name lookup that did not get converted to a click on your site. These would be from twitter clients and browser plugins that do a url-lookup at bit.ly and show (in the browser/twitter client) the actual url (instead of bit.ly’s). This is most definitely another proof of how overstated bit.ly’s reported data is (almost 35%).

    Why can’t we have a protocol for these lookup clients similar to what robots.txt is for search engines? We could have an optional HTTP header which identifies the request as one coming from a bot/twitter client/browser plugin that is not expected to be counted. Another option is that since requests to shorteners are only for the link lookup, let them be HEAD requests instead of GET requests. A click will definitely be a GET request, but twitter clients can use the HEAD request to only get the expanded URL. I myself use HEAD requests for link lookup as it saves some bandwidth by not downloading the entire payload. But there are some URL shorteners that reject the HEAD request (don’t know why), and hence I have to use GET as a fall-back.

    It cannot become a mandatory rule as it will break existing browsers. But robots.txt used by crawlers is also an agreement between crawler and the site. Crawlers are not bound to honour it but they do.

  • http://www.brickmarketing.com nickstamoulis

    Wow, this is great info to learn…I am going to start digging a bit more through our log files to try to uncover some additional data. Also, thanks for pointing out the Snip-n-Tag plugin for Firefox as well…

  • http://andrescholten.nl André Scholten

    It looks like mainly non-JavaScript browsers are not measured. Have you tried adding an extra Google Analytics tracking code that is generated by PHP, which will also measure the non-JavaScript visits?

    Dutch explanation about that technique is here: http://andrescholten.nl/google-analytics-zonder-javascript/

  • http://www.ericward.com Eric Ward

    I’d been noticing twitter URLs showing up in my client citation (backlink) analysis data. Since by sheer laziness I have years and years of past client link analysis data, I dumped it all into a separate directory and searched through literally millions of links looking to see if I could spot any useful backlink analysis trends for tweeted URLs. So far, it’s interesting that twitter permalink URLs (aka individual status update pages) have yet to show up among crawled backlink data, but the main Twitter username URL does show up. In one case from this past May, Google showed 69 unique twitter accounts as having tweeted a particular URL, but none of those 69 were shown to me as twitter permalinks. All were shown as twitter.com/username, instead of twitter.com/username/status/194816….. Since I can see Twitter does in fact index tweets, and I can also see every URL I ever tweeted, this makes me think tweeted URLs, as they age off the page and become less likely to be clicked, remain silently valuable as a quality signal over time.

  • seotoys

    It does look interesting revealing the truth about twitters.. Thanks for uncovering this important fact.

  • http://www.cpcsearch.com Terry Whalen

    Eric,

    When you say tweeted URLs remain valuable as quality signal over time, are you saying that there is an SEO benefit to twitter even though the data shows no twitter permalinks? Would love any additional clarity on this. Thanks!

    Danny, thanks for the post.

  • tigertech

    >(mother’s cookie site:daggle.com was the search, which was me locating
    >the article. Oddly, this request does NOT appear in the raw log files).

    That’s most likely ’cause your computer just served you the page out of its browser cache without contacting the server… but then it ran the JavaScript on the page and thereby logged another hit over at Google.

    – Rob

  • http://borasky-research.net/smart-at-znmeb znmeb

    Well …

    * First of all, no two tools are going to count exactly the same events unless they are designed with the exact same algorithm. In short, you need to approach every vendor / tool maker in the entire analytics space and get their algorithm definition if you want to make sense of the metrics they’re reporting.

    * At the highest level, you probably want to be measuring return (in revenue dollars) per unit of effort: hours spent tweaking the search properties of the web site, time spent in Twitter conversations, etc. The intermediate stuff in the funnel probably matters somewhat, but I think what you really want to know is “should I spend the time optimizing for search, or should I spend the time tweeting, networking on LinkedIn or Facebook, etc.”

    * If the analytics can give you correct *proportions* — what fraction of your incoming unique visitors came from social media and what proportion came from search — that’s probably a lot more valuable than the absolute numbers. And if they can give you paths — do Twitter visitors hit different pages than search visitors — that’s also valuable.

    * I think rather than experiment with the things external to your site, like which tools count what, time is better spent by picking one tool, sticking with it, and doing the experimentation *on* the site. What works best in achieving the overall goals for all visitors, no matter where they came from.

  • http://blog.zebugroup.com Mayank Sharma

    Danny, I think I have come up with a simple solution to remove the counting of bots/spiders/twitter clients/browser plugins from bit.ly’s reported statistics. details are at http://bit.ly/L1ys4 . I am running that experiment on the blog right now and should know the results in a week’s time. Meanwhile would you like to run a similar experiment on your next post?

  • http://www.deskgod.com danhristov

    Danny, I measured many visitors from different social media networks, not only Twitter. I used the web measurement tool RoiWatcher from http://www.Deskgod.com, and with this tool I clearly see that none of these visitors reached my goals.

  • http://twitter.com/zumbaba zumbaba

    Hi Danny, thanks for this very interesting article though I am still scratching my head to figure out why I have a 7000% discrepancy between Bit.ly and Google Analytics…Does this happen with other URL shorteners like Tiny?
