Microsoft Paper: Improving Search Results By Mining Web Surfing Activity


A new research paper from Microsoft looks at how surfing behavior — as logged by a search toolbar — can be used to improve search results.

The study used data from the Windows Live Toolbar:

For each user, interaction logs were grouped based on browser identifier information. Within each browser instance, user navigation was summarized as a path known as a browser trail, from the first to the last Web page visited in that browser. Located within some of these trails are search trails that originated with a query submission to a commercial search engine; it is these search trails that we use to train the algorithms described in the following sections.

After originating with a query submission to a search engine, search trails proceed until a point of termination where it is assumed that the user has completed their information-seeking activity or has addressed a particular aspect of their information need. Trails must contain pages that are either search result pages, or pages connected to a search result page via a sequence of clicked hyperlinks.
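To make the trail idea concrete, here is a minimal sketch of how search trails might be cut out of a single browser trail. It is not the paper's code. The event format `(timestamp, url, referrer)`, the `SEARCH_HOSTS` set, the `q` query parameter and the 30-minute idle cutoff are all assumptions made for illustration, and the sketch simplifies by starting a fresh trail on every query.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of hosts treated as search engines (an assumption).
SEARCH_HOSTS = {"search.live.com", "www.google.com"}


def search_query(url):
    """Return the query text if url looks like a search result page, else None."""
    parts = urlparse(url)
    if parts.netloc in SEARCH_HOSTS:
        values = parse_qs(parts.query).get("q")  # "q" parameter is an assumption
        if values:
            return values[0]
    return None


def extract_search_trails(events, idle_timeout=1800):
    """Split one browser trail into search trails.

    events: list of (timestamp_seconds, url, referrer) in visit order.
    A trail starts at a query submission and continues while each new page
    was reached by a click from a page already in the trail. It ends when
    the user jumps elsewhere or is idle longer than idle_timeout (assumed).
    """
    trails = []
    current = None
    last_ts = None
    for ts, url, referrer in events:
        query = search_query(url)
        idle = last_ts is not None and ts - last_ts > idle_timeout
        if query is not None:
            # Simplification: every query submission opens a new trail.
            if current is not None:
                trails.append(current)
            current = {"query": query, "pages": [url]}
        elif current is not None:
            if idle or referrer not in current["pages"]:
                # Not connected by a clicked link, or too long a pause.
                trails.append(current)
                current = None
            else:
                current["pages"].append(url)
        last_ts = ts
    if current is not None:
        trails.append(current)
    return trails
```

Pages visited before any query, or typed in directly after one, simply fall outside every search trail, which matches the requirement that trail pages be connected to a result page by clicked links.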

There’s a lot of math in the paper that goes way beyond me. But the concluding point is that by aggregating all the surfing data, you can generate “query-specific” authority for documents on the web. That is, you can tell that a particular page is well regarded by many surfers for a particular search term. You can also better identify pages that are spam:

For example, it has been shown that user-validated authority may be useful for identification of web spam. Because users are unlikely to visit non-informative resources often, and will leave them almost immediately, using activity logs may provide valuable evidence to web spam detection algorithms, leaving an interesting avenue for future work.
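As a rough illustration of those two ideas, query-specific authority and quick abandonment as a spam hint, here is a simple counting sketch. It stands in for the paper's actual math, which this article does not reproduce. The visit tuple layout, the function names and the five-second threshold are all hypothetical.

```python
from collections import defaultdict


def query_authority(visits):
    """Count distinct users who reached each page after each query.

    visits: iterable of (user_id, query, url, dwell_seconds) tuples
    (a hypothetical log format, not the toolbar's real schema).
    Returns {(query, url): number_of_distinct_users}.
    """
    users = defaultdict(set)
    for user_id, query, url, _dwell in visits:
        users[(query, url)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


def quick_exit_rate(visits, threshold=5.0):
    """Fraction of visits to each url that lasted under threshold seconds.

    A high rate is only a hint that a page may be non-informative;
    the threshold is an assumed value, not one from the paper.
    """
    total = defaultdict(int)
    quick = defaultdict(int)
    for _user_id, _query, url, dwell in visits:
        total[url] += 1
        if dwell < threshold:
            quick[url] += 1
    return {url: quick[url] / total[url] for url in total}
```

Counting distinct users rather than raw visits keeps one enthusiastic surfer from inflating a page's score, though a real system would need far more than this to resist manipulation.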

The paper also has a nice section that summarizes off-the-page ranking factors that search engines can and have considered:

Reciprocal hyperlinks between Web pages allow authors to link their pages, sites, and repositories to other relevant sources. Link-analysis algorithms leverage this democratic feature of Web page authorship for the implicit endorsement of Web pages. Link-analysis algorithms are generally either: query independent, e.g., PageRank, where relative importance of Web pages and Web domains is computed offline prior to query submission, or query-dependent, e.g., HITS, whereby scores are assigned to documents at retrieval time given their algorithmic matching to the user’s query.

The key feature of link-analysis algorithms is that they compute the authority value based on the links created by page authors and assume that users traverse this graph in a random or pseudo-intelligent way. However, given the rapid growth in Web usage, it is now possible to leverage the collective browsing behavior of many users as an improvement over random or directed traversals of the Web graph. In this paper we describe the use of collective post-search browsing behavior of many users for this purpose.

Implicit relevance feedback methods use observable aspects of users’ search interactions (e.g., query logs, search result clicks, page display times, page scrolling activity) to support more effective search. Given that the users’ expression of their true interests and intentions is very noisy, some studies have addressed the reliability of implicit feedback. Kelly and Belkin report that reading time is not indicative of document relevance, and that it varies significantly between subjects and tasks, making it difficult to interpret.

In contrast, Fox et al. show in their study that the overall time a user interacts with a search engine, as well as the number of clicks, are indicative of user satisfaction with the search engine. Joachims et al. found that result clickthrough is influenced by the relevance of the results, and that users are biased by the trust they have in the retrieval function, and by the overall quality of the result set. They propose strategies for generating relative feedback signals from clicks.
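For readers who want to see what the “random” traversal assumption behind query-independent link analysis looks like, here is a minimal PageRank-style power iteration. It is a textbook sketch, not anything taken from the Microsoft paper or from a search engine's implementation. The tiny link graph in the usage note, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to a list of pages it links to.
    Models a surfer who follows a random outgoing link with probability
    `damping` and otherwise jumps to a random page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Calling `pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})` gives page `a` a higher score than page `d`, since nothing links to `d`. The paper's pitch is that observed browsing could replace the imagined random surfer in a calculation like this.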

While the Microsoft paper looks at using toolbar data in theory, I don’t believe this has been put into practice to refine results at Live Search to date. In contrast, Google has been using toolbar-driven surfing history to refine personalized results for over a year. See Google Search History Expands, Becomes Web History for more background on that.



Danny Sullivan is editor-in-chief of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also oversees Search Engine Land’s SMX: Search Marketing Expo conference series, maintains a personal blog called Daggle and can be followed on Twitter.


