Improved Information Retrieval – Looking at Context with Susan Dumais
Desktop and file search can be very different from web search, and the user’s context plays an important role in what is valuable when creating a search algorithm. But understanding context may be helpful to web search, too.
Microsoft’s Susan Dumais has done extensive research on how users interact with search applications for the desktop and for Microsoft’s Vista. She recently visited Yahoo as part of their Big Thinker Series. The presentation was at their Yahoo! Mission College location on Tuesday, December 12, 2006 (via Gary Price). A Microsoft patent application from this morning expands upon the presentation.
A video of that presentation is now available through Yahoo Video. It discusses ideas about improving search based upon user context, and covers rich metadata, tagging, memory landmarks, refinding things, and keeping found things found.
As Susan Dumais notes in the presentation, information retrieval isn’t done for its own sake. It needs to be thought about in the context of the individuals and groups that it is created for. While we can think about queries in terms of informational, transactional, and navigational uses, people are also researching, learning, and being entertained when searching.
A lot of the presentation focuses upon research that has been documented in a few papers and articles:
- Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use (pdf)
- Susan Dumais: Changing the Way People Search for Information, Through Algorithms and User Interfaces
- Searching for Your Information? Go PHLAT Out
We’re told that the research on refinding information has influenced the Microsoft Live interface, but that it also looks at different information silos, such as the web, email, files, applications, photos, contacts, and calendaring, which require some different ways of thinking about search.
The future of search is going to involve more than just the web. It will also cover intranets and a searcher’s own computer. Because that involves a searcher’s own content, they believe that they can provide a richer user experience, including things like end user tagging, while still providing a single unified point of access for finding information within the context of performing other tasks.
As part of the research that Microsoft did while looking at search in different contexts, they found some interesting information about desktop search:
- Queries tend to be very short – shorter than on the web
- Query syntax allows for a more advanced search interface
- Three most popular advanced operators:
- new query
- People opened email often, in an enterprise environment
- Different search characteristics were exhibited for home workers
- About half the things opened were things that people received in the last month.
- Different kinds of content had different half-lives. For websites, half of the items opened were things looked at in the last couple of weeks.
- Date is by far the most common sort order – time is really important in retrieving your own information.
- Very few “best match” searches – people already know what they are looking for.
- Metadata is very useful, but the quality is variable. Some applications, such as email, enforce better metadata collection than others.
- Useful data is dependent upon applications – for instance, in calendars, the most important date isn’t when you received a notification, but rather the date of the meeting.
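To make a couple of these findings concrete, here is a minimal Python sketch of what date-first retrieval over personal content might look like. It is not taken from the presentation or from any Microsoft product: the `Item` class, its fields, and the functions `sort_date`, `search`, and `share_opened_within` are all hypothetical names invented for illustration. It assumes a simple substring match on titles, sorts results newest first instead of by "best match," and uses the meeting date rather than the received date for calendar items.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Item:
    """A hypothetical piece of personal content (not a real product schema)."""
    title: str
    kind: str                               # e.g. "email", "file", "calendar"
    received: datetime                      # when the item arrived or was saved
    event_date: Optional[datetime] = None   # meeting date, for calendar items


def sort_date(item: Item) -> datetime:
    """Pick the date that matters for this kind of item.

    For calendar entries the meeting date is more useful than the date
    the notification was received; everything else uses the received date.
    """
    if item.kind == "calendar" and item.event_date is not None:
        return item.event_date
    return item.received


def search(items: List[Item], query: str) -> List[Item]:
    """Case-insensitive title match, returned newest first (date sort)."""
    q = query.lower()
    hits = [item for item in items if q in item.title.lower()]
    return sorted(hits, key=sort_date, reverse=True)


def share_opened_within(opened: List[Item], now: datetime, days: int) -> float:
    """Fraction of opened items whose relevant date falls within `days` of `now`."""
    if not opened:
        return 0.0
    cutoff = now - timedelta(days=days)
    recent = sum(1 for item in opened if sort_date(item) >= cutoff)
    return recent / len(opened)
```

With a log of opened items, calling `share_opened_within(opened, now, 30)` would give the kind of "about half were from the last month" figure mentioned above, and comparing that share across content types is one rough way to see the different half-lives.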
There’s more in the presentation about personalized search, memory landmarks and timelines, and the benefits of user tagging. It also includes a very brief comparison of the different desktop search methods from Google, Yahoo, and Microsoft.
Coincidentally, I noticed a new patent application published this morning from Microsoft, with Susan Dumais listed as one of the inventors, that covers a fair amount of the information discussed in the presentation.
Analysis of topic dynamics of web search
Invented by Susan T. Dumais, Eric J. Horvitz, Xuehua Shen
Assigned to Microsoft
US Patent Application 20070005646
Published January 4, 2007
Filed: June 30, 2005
Here’s a snippet from the description of the document that starts to discuss some of what it includes:
 The Web provides opportunities for gathering and analyzing large data sets that reflect users’ interactions with web-based services. Analysis and synthesis of the rich data provided by these logs promises to lead to insights about user goals, the development of techniques that provide higher-quality search results based on enhanced content selection and ranking algorithms, and new forms of search personalization. The ability to model and predict users search and browsing behaviors has been explored by developers in several areas. The analysis of URL access patterns has been used to improve Web cache performance and to guide pre-fetching. In general, models developed for caching and pre-fetching average over large numbers of users, and exploit the consistency in access patterns for individual URLs or sites, but do not consider topical consistency. Another line of investigation has explored the paths that users take in browsing and searching web sites. This includes clustering techniques to group users with similar access patterns, with the goal of identifying common user needs. This technology involves detailed analysis of individual web sites. There has been some recent work exploring how page importance computations can be specialized to different users and topics.
If you want to dive into the patent filing, I’d recommend watching the presentation first. Because I viewed the presentation before tackling the filing, I found myself anticipating things that might be included within it instead of struggling to understand what it was attempting to get at.