An Insider’s View Of Google Universal Search
Google’s Universal Search was definitely a buzz generator at Search Engine Strategies in San Jose. Universal was tagged as probably the most significant development in search this year, and its effects are just starting to be felt throughout the industry, by both searchers and search marketers alike. It’s particularly timely that this week’s guest columnist is Google’s David Bailey, who heads the team working on Universal Search. Today, David shares the reasoning behind the team’s work on Universal Search, along with some examples of how it works.—Gord Hotchkiss
In May 2007 Google launched several new search features collectively dubbed “Universal Search.” A central component of that launch was to more frequently and more deeply intersperse web results with results from news, local business listings, videos, and scanned books. In fact those launches were just one milestone (albeit a very important one) in an ongoing process of improving the search experience by searching across all content types and ranking them in an integrated way. In this article I offer a perspective on that broader mission, share some background on the supporting technology, and discuss the user interface challenges that will partly determine where it all leads.
Rather than a radical departure, Universal Search is in fact a natural step toward our company’s mission: “Organize the world’s information and make it universally accessible and useful”.
Universal Search supports this mission in a couple of ways. Let’s start with “accessible.” Over the years, the major search companies have introduced ever more “vertical” search engines for specialized content, such as news, scanned books, local business listings, or patents. Each vertical occasionally has its day in the sun when it has the perfect answer for your query. But the Achilles’ heel is that many potential users never discover these verticals, or can’t remember where to find them, or can’t be bothered to go checking them for relevant results all the time.
The bottom line is that people live busy lives, and they want to have one search box that they can rely on to access everything, including specialized, vertical content when appropriate. Hence, our primary goal is to make everything fully searchable from the one search box on www.google.com. Google wants to give you the best answer possible regardless of whether that answer comes from a video, an image, or a web page.
Incidentally, for the etymologically curious, this is the true motivation behind the name of Google’s long-standing “Onebox” system, in which top results from our news, images, or other verticals are shown compactly at the top of the page for queries like michael kenna photos.
Most people assume the name refers to the display “box” where the results appear, when actually it refers to the idea of a single search box to find anything.
Next let’s talk about “organize.” We have broadened our result types to incorporate different types of content. The next challenge is how to display summaries of the content such that people can find what they are looking for in the most efficient way possible. Most verticals provide rich structured data about each document. Business listings have mappable street addresses, videos have ratings and thumbnails, books and news articles have publication dates, news articles can be clustered by the underlying stories, and so forth.
Another goal of Universal Search is to bring forward those extra elements onto the main search results page to help users understand and choose amongst the results. In some cases, like publication dates, this is as straightforward as adding a line of text to the snippet. In other cases, like maps, it requires you to reorganize the page so that all the mappable results are contiguous and near the map. Much of the user interface challenge is in accommodating such organizational changes without complicating the interface or compromising ranking.
Our conception of Universal Search extends beyond just incorporating content that our verticals receive via structured feeds or uploads. Often such structured data applies to regular web results too. We aim to provide unbiased searching across the entire web, so in such cases it’s our goal to extract that data and thus be able to seamlessly apply Universal Search’s ranking and UI elements to a much broader portion of the web—for example, see the Metacafe-hosted video about midway down the results page for origami crane.
Of course, extraction is a heuristic process, which makes it a bit harder, but expect to see developments on that front, such as our experimental service for viewing web results on a timeline or map.
Under the covers
If users are going to rely on the main search box for their vertical-search needs, it had better do a decent job of finding those results. With our infrastructure, we’ve taken a big leap forward in this department, and in turn this has facilitated some interesting user interface changes. So it’s worth taking a technical detour to understand the essential change.
The yellow diagram illustrates the key point. Our previous system is shown on the left, and Universal Search is on the right. With the previous system, when a query like david beckham came in from a user, the system first classified it to determine which verticals (besides web search) were highly likely to be useful and then only searched those indexes.
In the illustration, web and images were deemed likely so only the corresponding indexes (marked “W” and “I”) were searched, and finally we rendered a page with images plus web results. News, video and books (“N”, “V”, and “B”) were untouched. The system strived to capture only the most clear-cut cases (a.k.a., “high precision, low recall”) because historically it was tied to top-of-page placement where relevance is paramount. Consequently the query usually just searched the web index and maybe one or two others.
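As a rough illustration of the classify-first flow described above, here is a minimal Python sketch. It is not Google's implementation: the toy indexes, the `KNOWN_IMAGE_QUERIES` rule set, and the function names `classify` and `search_classify_first` are all hypothetical, chosen only to show how a high-precision classifier limits which indexes get searched.

```python
# Toy indexes, one per vertical. All contents are invented for illustration.
INDEXES = {
    "web": {
        "david beckham": ["beckham-official-site", "beckham-biography"],
        "bigtable": ["bigtable-paper"],
    },
    "images": {"david beckham": ["beckham-photo"]},
    "news": {"david beckham": ["beckham-match-report"]},
    "video": {"bigtable": ["bigtable-lecture"]},
    "books": {},
}

# Hypothetical high-precision rules: a vertical is chosen only on a clear cue.
KNOWN_IMAGE_QUERIES = {"david beckham"}


def classify(query):
    """Guess, from the query text alone, which verticals to search."""
    verticals = {"web"}  # web search always runs
    if query in KNOWN_IMAGE_QUERIES or "photos" in query.split():
        verticals.add("images")
    return verticals


def search_classify_first(query):
    """Search only the indexes the classifier selected."""
    return {v: INDEXES[v].get(query, []) for v in sorted(classify(query))}


beckham = search_classify_first("david beckham")
bigtable = search_classify_first("bigtable")
print(beckham)   # web and images only; the news index is never consulted
print(bigtable)  # web only; the video index is never consulted
```

Because the decision is made before any index is touched, a vertical that the classifier does not pick can never contribute a result, however good its content is.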
The problem with this design is that it missed numerous circumstances where a vertical was only moderately useful and/or we couldn’t tell just from looking at the query. A favorite example of mine is the obscure query bigtable:
Bigtable is a Google database technology that computer scientists have been curious about since we published it. A query classifier is unlikely to be able to label this query as “video oriented.” It doesn’t explicitly ask for videos and it’s just too far down the long tail to learn much from user behavior. You’ve simply got to go and check whether there are any good videos. That’s what we now do, and what turns up in our video index is a highly rated lecture by its designer, Jeff Dean, delivered at the University of Washington—a superb if surprising addition to academic articles already on the page.
So to capture this win, we need the system shown on the right of the yellow diagram. The query comes in and is sent, speculatively, to most or all of the search indexes. Only after they’ve each had their chance to find relevant results, which gives us many more clues about which verticals are relevant, do we regroup and solve the ranking problem: deciding which results to highlight and how to arrange them on the page. In this example, both news and video turn up worthwhile additions to the page which the old system would have missed, while the books results are discarded. This universal ranking system borrows heavily from our core expertise in web ranking, but also has new elements to leverage special signals pertinent to some of the verticals and to manage the page layout when results ought to be grouped.
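The search-everything-then-rank flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual ranking system: the toy indexes, the shared 0-to-1 relevance scores, the `MIN_SCORE` cutoff, and the names `search_all` and `universal_rank` are all hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    vertical: str
    doc: str
    score: float  # hypothetical relevance score on a shared 0..1 scale


# Toy indexes with invented documents and scores.
INDEXES = {
    "web": {"david beckham": [("beckham-official-site", 0.95),
                              ("beckham-biography", 0.80)]},
    "images": {"david beckham": [("beckham-photo", 0.85)]},
    "news": {"david beckham": [("beckham-match-report", 0.70)]},
    "video": {"david beckham": [("beckham-free-kick-clip", 0.65)]},
    "books": {"david beckham": [("football-history-book", 0.20)]},
}

MIN_SCORE = 0.5  # assumed cutoff below which a result is dropped


def search_all(query):
    """Send the query speculatively to every index and collect candidates."""
    candidates = []
    for vertical, index in INDEXES.items():
        for doc, score in index.get(query, []):
            candidates.append(Result(vertical, doc, score))
    return candidates


def universal_rank(candidates, min_score=MIN_SCORE):
    """Drop weak candidates, then order the rest in one relevance-sorted list."""
    kept = [r for r in candidates if r.score >= min_score]
    return sorted(kept, key=lambda r: r.score, reverse=True)


page = universal_rank(search_all("david beckham"))
for r in page:
    print(f"{r.score:.2f}  {r.vertical:7s} {r.doc}")
```

In this toy run the news and video results earn a place on the page because their own indexes were actually consulted, while the low-scoring book result is discarded at ranking time rather than being ruled out in advance.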
Currently, this architectural transition is far enough along to be a proven success but it is by no means complete, and more importantly we’ve only begun to tap the potential improvements in relevance and page layout that it enables. The upshot for users is that you should expect a lot more changes and more aggressive presentation of more verticals in the months ahead.
Keeping things simple
And finally, that brings us to the challenge of scaling up the user interface. Currently we place UI-rich results like local and images in clear groups in a few fixed positions on the page. For example, when local business listings are useful but there are even better web results, we offer the business listings with map in the center of the page, such as for sushi san francisco.
This gives users some consistency they can get accustomed to, while permitting more flexible ranking options than we had before. Meanwhile other verticals like news, videos and books are made to look quite compatible with web results (except for their thumbnail images) and so are currently blended in somewhat more freely.
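To make the distinction between grouped and blended results concrete, here is a small Python sketch. It is a guess at the general shape of such a layout step, not a description of the real one: the `GROUPED_VERTICALS` set, the two fixed slots (top and middle), and the `layout_page` function are all hypothetical.

```python
# Verticals shown as a visual group in a fixed slot (assumed for illustration).
GROUPED_VERTICALS = {"local", "images"}


def layout_page(ranked):
    """Arrange (vertical, doc) pairs, given best-first, into page rows.

    Results from grouped verticals are pulled together into one row placed
    at the top or the middle of the page; everything else stays blended in
    relevance order.
    """
    blended = [(v, d) for v, d in ranked if v not in GROUPED_VERTICALS]
    groups = {}      # vertical -> its docs, in relevance order
    best_rank = {}   # vertical -> rank of its best result
    for rank, (v, d) in enumerate(ranked):
        if v in GROUPED_VERTICALS:
            groups.setdefault(v, []).append(d)
            best_rank.setdefault(v, rank)

    page = list(blended)
    middle = len(blended) // 2
    # Insert lower-ranked groups first so a top-slot group does not shift them.
    for v in sorted(groups, key=lambda v: best_rank[v], reverse=True):
        slot = 0 if best_rank[v] == 0 else middle
        page.insert(slot, (v + "-group", groups[v]))
    return page


ranked = [
    ("web", "sushi-guide"),
    ("web", "sushi-reviews"),
    ("local", "sushi-bar-a"),
    ("news", "sushi-article"),
    ("local", "sushi-bar-b"),
    ("web", "sushi-blog"),
]
page = layout_page(ranked)
for row in page:
    print(row)
```

Here the two local listings are outranked by the best web results, so they appear together as one group in the middle of the page, while the news result stays blended among the web results.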
There’s no doubt we’ve started conservatively. That’s because, while we are searching much more broadly than before, so far we have only modestly increased the overall blending rate of the verticals onto the rendered page, and thus it’s not very common to see multiple verticals blending at once. The low overlap has permitted this simple design. What’s appealing about this is that we preserve one of the key elements of Google’s simplicity: the single, relevance-ordered list format. The simple scanning strategy of starting at the top and looking straight down until you find a result you like remains intact.
But there are challenges that remain. We’ll continue to perform user studies and live experiments to learn ever more about eye scanning patterns so we can optimize our designs. We’ll pay close attention to issues like the effects of thumbnail images, and the role of grouped results on the page. We’ll optimize the tradeoffs between grouped displays that help organize the page and encourage browsing and refinement, versus blended results that facilitate keeping results in relevance order. I expect that both Google and the other major search engines will continue to experiment aggressively in this area for quite a while.
Relevance is still key
So that’s where we stand today. We have more engineers than ever working on search and it continues to be at the heart of what we do. There are many open questions in Universal Search, and it will be quite interesting to watch Google and the other major search engines explore the possibilities. But along the way, there are a few things you can count on from Google: our search engine will remain fast, the user interface will remain simple, and above all we will continue to deliver the most comprehensive and relevant results to the best of our ability.
Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land.