Microsoft Paper: Improving Search Results By Mining Web Surfing Activity

A new research paper from Microsoft looks at how surfing behavior — as logged by a search toolbar — can be used to improve search results.

The study used data from the Windows Live Toolbar:

For each user, interaction logs were grouped based on browser identifier information. Within each browser instance, user navigation was summarized as a path known as a browser trail, from the first to the last Web page visited in that browser. Located within some of these trails are search trails that originated with a query submission to a commercial search engine; it is these search trails that we use to train the algorithms described in the following sections.

After originating with a query submission to a search engine, search trails proceed until a point of termination where it is assumed that the user has completed their information-seeking activity or has addressed a particular aspect of their information need. Trails must contain pages that are either search result pages, or pages connected to a search result page via a sequence of clicked hyperlinks.
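To make the trail idea concrete, here is a minimal sketch of how search trails might be pulled out of toolbar logs. It is not the paper's implementation. The `Event` record, its field names, the 30-minute inactivity timeout, and the rule that every new query starts a new trail are all assumptions made for illustration; the quoted passage only says that trails start at a query and continue through pages reached by clicked links until some point of termination.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One logged page view (hypothetical record layout)."""
    browser_id: str
    timestamp: float            # seconds since some epoch
    url: str
    referrer: Optional[str]     # page whose link was clicked, None if typed in
    query: Optional[str] = None  # set only when the page is a search result page


# Assumed termination rule: a long pause ends the trail.
INACTIVITY_TIMEOUT = 30 * 60


def extract_search_trails(events):
    """Return a list of (query, [urls]) trails, one per query submission."""
    # Group interaction logs by browser instance.
    by_browser = defaultdict(list)
    for event in events:
        by_browser[event.browser_id].append(event)

    trails = []
    for browser_events in by_browser.values():
        browser_events.sort(key=lambda e: e.timestamp)
        current = None      # the open trail, if any
        last_time = None    # timestamp of the previous event in this browser
        for event in browser_events:
            if event.query is not None:
                # Simplification: each query submission opens a fresh trail.
                if current is not None:
                    trails.append(current)
                current = (event.query, [event.url])
            elif current is not None:
                timed_out = event.timestamp - last_time > INACTIVITY_TIMEOUT
                # Keep only pages reached by clicking a link on a trail page.
                linked = event.referrer in current[1]
                if timed_out or not linked:
                    trails.append(current)
                    current = None
                else:
                    current[1].append(event.url)
            last_time = event.timestamp
        if current is not None:
            trails.append(current)
    return trails
```

Pages visited outside any trail, such as a bookmark opened before any search, are simply ignored in this sketch.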

There’s a lot of math in the paper that goes way beyond me. But the concluding point is that by taking all the surfing data, you can generate “query-specific” authority for documents on the web — you can understand that a particular page for a particular search term is well regarded by many surfers. You can also better determine pages that are spam:

For example, it has been shown that user-validated authority may be useful for identification of web spam. Because users are unlikely to visit non-informative resources often, and will leave them almost immediately, using activity logs may provide valuable evidence to web spam detection algorithms, leaving an interesting avenue for future work.
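The two ideas above, query-specific authority and quick exits as a spam hint, can be illustrated with a rough sketch. This is a simple counting heuristic, not the model from the paper, whose math is not reproduced here. The trail format (a query plus a list of `(url, dwell_seconds)` pairs), the function names, and the thresholds are all assumptions.

```python
from collections import Counter, defaultdict
from statistics import median


def query_authority(trails):
    """Count, per query, how many trails visited each page."""
    counts = defaultdict(Counter)
    for query, visits in trails:
        # Use a set so a page revisited within one trail counts once.
        for url in {url for url, _ in visits}:
            counts[query][url] += 1
    return counts


def quick_exit_pages(trails, max_dwell=5.0, min_visits=3):
    """Return pages that surfers typically leave almost immediately.

    max_dwell and min_visits are illustrative thresholds, not values
    taken from the paper.
    """
    dwell = defaultdict(list)
    for _, visits in trails:
        for url, seconds in visits:
            dwell[url].append(seconds)
    return {
        url
        for url, times in dwell.items()
        if len(times) >= min_visits and median(times) <= max_dwell
    }
```

A page that many trails for the same query pass through scores high for that query, while a page that enough visitors abandon within a few seconds becomes a candidate for closer spam review.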

The paper also has a nice section that summarizes off-the-page ranking factors that search engines can and have considered:

Reciprocal hyperlinks between Web pages allow authors to link their pages, sites, and repositories to other relevant sources. Link-analysis algorithms leverage this democratic feature of Web page authorship for the implicit endorsement of Web pages. Link-analysis algorithms are generally either: query independent, e.g., PageRank, where relative importance of Web pages and Web domains is computed offline prior to query submission, or query-dependent, e.g., HITS, whereby scores are assigned to documents at retrieval time given their algorithmic matching to the user’s query.

The key feature of link-analysis algorithms is that they compute the authority value based on the links created by page authors and assume that users traverse this graph in a random or pseudo-intelligent way. However, given the rapid growth in Web usage, it is now possible to leverage the collective browsing behavior of many users as an improvement over random or directed traversals of the Web graph. In this paper we describe the use of collective post-search browsing behavior of many users for this purpose.

Implicit relevance feedback methods use observable aspects of users’ search interactions (e.g., query logs, search result clicks, page display times, page scrolling activity) to support more effective search. Given that the users’ expression of their true interests and intentions is very noisy, some studies have addressed the reliability of implicit feedback. Kelly and Belkin report that reading time is not indicative of document relevance, and that it varies significantly between subjects and tasks, making it difficult to interpret.

In contrast, Fox et al. show in their study that the overall time a user interacts with a search engine, as well as the number of clicks, are indicative of user satisfaction with the search engine. Joachims et al. found that result clickthrough is influenced by the relevance of the results, and that users are biased by the trust they have in the retrieval function, and by the overall quality of the result set. They propose strategies for generating relative feedback signals from clicks.

The Microsoft paper looks at using toolbar data in theory, but to date I don’t believe this is being put into practice to refine results at Live Search. In contrast, Google has been using toolbar-driven surfing history to refine personalized results for over a year. See Google Search History Expands, Becomes Web History for more background on that.



About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and microblogs on Twitter as @dannysullivan.
