• Ian Williams

    Interesting study, but 24 people is a bit low to be making sweeping statements. Just one person’s behaviour is worth over 4%!

  • Matt McGee

    Agreed, Ian. I don’t know how many people are typically involved in eye-tracking studies, but 24 does seem low. On the other hand, they’re measuring searches more than people here — and with each person doing 8 searches on two search engines, you do end up with a fair amount of data to consider.

  • http://www.usercentric.com/ Aga Bojko

    Ian and Matt,

    Hypothesis testing (i.e., trying to determine if there is a difference) is frequently confused with precision testing (i.e., trying to generalize an exact “score” to the population). This confusion leads to a lot of criticism regarding sample sizes used in research studies.

    We certainly do not claim that the exact numbers (% of participants who looked and gaze time) that we obtained in the study can be generalized to the population. To do that we would have to run hundreds of participants.

    Our results indicate, however, three statistically significant differences between Google and Bing at an alpha level of 0.1. In other words, if there were truly no difference between the engines, results this extreme would occur less than 1 time in 10. These differences are marked with asterisks on the heatmaps above.

    Being able to detect a significant difference indicates that the sample size used in the study was sufficient. An insufficient sample size usually results in failing to detect a difference that really exists, not in detecting a difference where there is none. If a statistically significant difference is found even with a small sample, the difference does exist (up to the alpha error rate).
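    A quick simulation can illustrate this point (my own sketch with made-up numbers, not data from the study): if we repeatedly run experiments in which Google and Bing truly do not differ, a paired t-test on 24 participants flags a "difference" only about as often as alpha allows. The false-positive rate is controlled by alpha, not by the sample size.

```python
import math
import random

# Illustrative only: simulate many experiments where the two engines truly
# do NOT differ, and count how often a paired t-test at alpha = 0.10 on
# 24 participants reports a "significant" difference anyway.

T_CRIT = 1.714  # two-sided critical value of Student's t, df = 23, alpha = 0.10

def paired_t(differences):
    """t statistic for the mean of paired differences against zero."""
    n = len(differences)
    mean = sum(differences) / n
    var = sum((d - mean) ** 2 for d in differences) / (n - 1)
    return mean / math.sqrt(var / n)

random.seed(42)
n_participants, n_sims = 24, 2000
false_positives = 0
for _ in range(n_sims):
    # gaze-time differences drawn from a null (zero-mean) distribution
    diffs = [random.gauss(0.0, 1.0) for _ in range(n_participants)]
    if abs(paired_t(diffs)) > T_CRIT:
        false_positives += 1

rate = false_positives / n_sims
print(f"false-positive rate: {rate:.3f}")  # hovers around 0.10, i.e. alpha
```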

    I hope this helps!

  • http://uk.linkedin.com/in/jordanseo Jordan Russell

    Not a fault of the test, but I wonder if the results would be different if the searchers had ACTUAL intent to purchase. Sure we can give them a scenario but different emotions are involved if they’re genuinely looking with intent to buy.

    I’d love to see one of these tests with people in that mindset. Perhaps sit outside the Apple store and grab the next 10 people about to go in and buy an iPod :)

  • http://www.metricsmarketing.com MetricsMarketingRCX

    According to Kara Pernice and Jakob Nielsen, at least 30 users are needed for an eye-tracking study to be statistically significant. In fact, the standard recommendation for quantitative findings like these is even higher.

    But the more interesting question to explore is the recruiting strategy itself. According to the User Centric study description, “Twenty-four Internet users between the ages of 18 and 54 participated in the study. Participants conducted an average of 48 online searches per week using both Bing and Google, with at least five searches per engine.”

    However, looking specifically for users who (1) understand what a search engine is, (2) can recognize different search engines, and (3) actively switch between search engines assumes that the target users for this study are advanced computer users. In the studies I have conducted, it is rare that even individuals who self-identify as “Extremely Comfortable Online” are advanced enough to change their search engine, or to recognize when they are on a different one. In June of 2009, Google found that users don’t understand what a browser is, or know the difference between a browser and a search engine (http://googlesystem.blogspot.com/2009/06/browser-is-search-engine.html).

    When discussing the results of this study, it is important to keep in mind that this applies to advanced computer users, and doesn’t necessarily extend to the general public.

  • glew

    The Pernice and Nielsen reference about 30 participants is incorrectly stated above.

    The reference can be downloaded (http://www.useit.com/eyetracking/) and it has absolutely nothing to do with statistical power. It says that “If you want to draw conclusions using heatmaps or if heatmaps are the main deliverable for the study, you need 30 users per heatmap.”

    This study analyzed actual data points, not a heatmap. Conclusions were drawn from a statistical analysis of the actual data, not from a visualization of the data as Pernice and Nielsen describe. Clearly, this rule of 30 does not apply.

    Those inquiring about sample size need to understand statistical power. As one of the posters commented, this is not about generalizing the data to the population; that would require a large sample size, quite a bit larger than 30 ;-)

    The statistical test run was about difference scores. It is the difference score that is generalizable: it can be predicted to differ with error rate alpha (probably the .05 level).

    This is very common in experimental studies. Moreover, the fact that statistical differences were found makes sample size irrelevant. Really. The point of a larger sample is to have sufficient power to detect a difference, if one does indeed exist. This is Stats 101. If a difference was found, then the finding is real (with error of alpha).
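    To make the power point concrete, here is an illustrative simulation (hypothetical effect size and invented data, not from the study): when a real difference exists, a sample of 100 detects it more often than a sample of 24. What a small sample risks is missing the effect, not inventing one.

```python
import math
import random

# Illustrative sketch of statistical power: with a real difference present
# (effect size 0.5 here, chosen arbitrarily), estimate how often a paired
# t-test at alpha = 0.10 detects it at two sample sizes.

T_CRIT_24 = 1.714   # two-sided t critical value, df = 23, alpha = 0.10
T_CRIT_100 = 1.660  # two-sided t critical value, df = 99, alpha = 0.10

def detection_rate(n, t_crit, effect=0.5, sims=1000):
    """Fraction of simulated experiments whose paired t-test is significant."""
    hits = 0
    for _ in range(sims):
        diffs = [random.gauss(effect, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / math.sqrt(var / n)
        if abs(t) > t_crit:
            hits += 1
    return hits / sims

random.seed(7)
power_small = detection_rate(24, T_CRIT_24)
power_large = detection_rate(100, T_CRIT_100)
print(power_small, power_large)  # detection rate rises with sample size
```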

    Let’s talk about the results, not about the stats. The stats are clear to graduate students in stats classes….

  • http://www.usercentric.com/ Aga Bojko

    The sample size of 30 is actually a common misconception. It has been described in more detail in the article titled ‘More Than Just Eye Candy: Top Ten Misconceptions about Eye Tracking in the User Experience Field.’ The article came out in User Experience Magazine in 2010: http://bit.ly/gHAad9

  • http://www.diepbizniz.nl RobertJan van Diepen

    The discussion about sample size is important. In cooperation with Utrecht University, we researched how many participants you need to achieve a statistically significant heatmap, using advanced statistical techniques. We found that statistically significant heatmaps can be achieved from 17 participants. When participants performed a task, we found that in some cases 12 participants were enough. In general, free-exploration studies need more participants than task-based ones.
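    One way to sketch this kind of stability check (purely illustrative; not the actual Utrecht method): build fixation-count heatmaps from two disjoint groups of simulated participants and see how strongly the maps correlate as group size grows. Once the correlation stops improving, adding participants no longer changes the map.

```python
import math
import random

# Illustrative sketch only. Participants' fixations are simulated around a
# shared "hotspot"; we bin them into grid heatmaps and correlate the heatmap
# of one group with that of a disjoint second group. A reproducible heatmap
# correlates highly across independent groups.

GRID = 10  # 10x10 heatmap cells

def heatmap(participants, fixations_per_person=10):
    cells = [0] * (GRID * GRID)
    for _ in range(participants):
        for _ in range(fixations_per_person):
            # fixations cluster near a hotspot at cell (2, 2)
            x = min(max(int(random.gauss(2, 1.5)), 0), GRID - 1)
            y = min(max(int(random.gauss(2, 1.5)), 0), GRID - 1)
            cells[y * GRID + x] += 1
    return cells

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

random.seed(1)
corr_small = pearson(heatmap(3), heatmap(3))    # two disjoint 3-person groups
corr_large = pearson(heatmap(50), heatmap(50))  # two disjoint 50-person groups
print(corr_small, corr_large)
```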

  • http://sadgrove sadgrove

    70% – 80% ignored the ads?

    My last 4 weeks’ stats show 29,000 visits from Google CPC, and 18,000 from Google organic search. And that’s for a site that ranks well in search, and I don’t pay for top ad positions.

    My data suggests the ads work well, for my site, anyway.

  • http://www.usercentric.com/ Aga Bojko

    RobertJan van Diepen: What do you mean by a statistically significant heatmap? Are you talking about gaze patterns that can be generalized to the population, or do you have hypothesis testing in mind? These are two very different concepts and require different sample sizes.

  • http://www.diepbizniz.nl RobertJan van Diepen

    When can you draw statistically significant conclusions from the eye-movement behaviour on a webpage? That’s the main question. Yes, these are gaze patterns. Many eye-tracking heatmaps are shown and discussed on the internet, but none of us can tell whether they can be generalized to the population. In our research we focused on that topic: how many participants do we need, and can we predict that? Say a heatmap is created from 20 people. Is the eye-movement behaviour statistically significant, or do we need more participants to draw statistically significant conclusions? In other words, will this behaviour differ when we add more participants? If not, 20 participants is clearly enough, so why add more to the study? I hope this is clearer.

  • http://decisionstats.com Ajay Ohri

    For numbers as low as 24, t-tests suffice. The data breadth is 24 participants times the searches given to each, and eye tracking can generate huge amounts of data.
    An interesting follow-up would be to revalidate with another sample after a period of time, given the dynamic nature of search.
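    For readers unfamiliar with the test being discussed, here is a worked paired t-test on hypothetical gaze times (the numbers below are invented for illustration, not data from the study):

```python
import math

# Hypothetical per-participant gaze times (seconds) on two engines.
google = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0, 5.4, 4.5]
bing   = [3.3, 3.8, 3.3, 3.7, 4.1, 3.7, 3.9, 3.4, 4.1, 3.6]

# Paired t-test: work with per-participant differences, not the raw groups.
diffs = [g - b for g, b in zip(google, bing)]
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
t = mean / math.sqrt(var / n)

T_CRIT = 2.262  # two-sided critical value, df = 9, alpha = 0.05
print(f"t = {t:.2f}, significant: {abs(t) > T_CRIT}")
```

    In practice this is one call to `scipy.stats.ttest_rel(google, bing)`; the manual version just makes the difference-score logic explicit.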