Google’s OneBox Patent Application

One of my favorite search-related articles is one that Danny wrote a few years back, titled Searching With Invisible Tabs. It stands out because it describes one of the major difficulties in how search engines work – keeping the user interface as simple as possible while still providing information that can meet the wide range of intentions behind a search.

Danny also introduced us to the “one box results” name used internally at Google for what he described as “invisible tab promotion of some of its specialty content.” Are these inserted vertical search results the way to serve invisible tab results to searchers?

OneBox results have been the topic of sessions at Search Engine Strategies conferences under the name Vertical Creep Into Regular Search Results, which gave attendees a chance to discuss these more narrowly defined types of results appearing above the organic listings in Google Web searches. During one such session that I attended, someone asked during the Q & A, “how does Google determine whether or not to show OneBox results?” That may have been the only question left unanswered.

Earlier this month, Google published a patent application that may provide a little insight into how and why different OneBox results are shown.

What Google has Told Us About OneBox Results

Before describing the patent application, I want to briefly explore some of what we’ve learned about these additional results directly from Google.

The Google Help Center Search Results Page, describes OneBox results:

Google’s search technology finds many sources of specialized information. Those that are most relevant to your search are included at the top of your search results. Typical onebox results include news, stock quotes, weather and local websites related to your search.

A tour of OneBox features for both Web Search and Enterprise search appears on the Google OneBox for Enterprise page (see the link labeled “Tour of OneBox features”).

Brian Smith recently interviewed Google Product Marketing Director Debbie Jaffe about these listings in A Closer Look at Google OneBox Results.

The OneBox Patent Application

Many patent filings include a “Description of Related Art” section where they often define a reason for the creation of their invention. This one tells us that:

Some search engine systems can provide various types of information as the search results. For example, a search engine system might be capable of providing search results relating to web pages, news articles, images, merchant products, usenet pages, yellow page entries, scanned books, and/or other types of information. Typically, a search engine system provides separate interfaces to these different types of information.

When a user provides a search query to a standard search engine system, the user is typically provided with links to web pages. If the user desires another type of information (e.g., images or news articles), the user typically needs to access a separate interface provided by the search engine system.

While Google shows tabs that searchers can select to view results for other kinds of information repositories, it’s not unusual for people to ignore those, or as Danny writes in his article on invisible tabs, to suffer from “tab blindness.” The OneBox is a solution to that problem. But how does Google know when to show which types of results?

Determination of a desired repository
Invented by Michael Angelo, David Braginsky, Jeremy Ginsberg, and Simon Tong
US Patent Application 20070005568
Published January 4, 2007
Filed: June 29, 2005

Abstract

A system receives a search query from a user and searches a group of repositories, based on the search query, to identify, for each of the repositories, a set of search results. The system also identifies one of the repositories based on a likelihood that the user desires information from the identified repository and presents the set of search results associated with the identified repository.

A Mix of Possible OneBox Determination Methods

The patent lists at least seven different variations that it might follow to possibly determine whether OneBox results appear for a search, and which type of results appear within the OneBox, but they are mostly subtle variations of each other. All of them involve looking closely at the query used, a likelihood that the searcher is looking for information from a number of different data repositories, somehow scoring results from those repositories, and serving results from one or more of them.

One variation describes a process in which log data is collected about searchers and their searches of repositories. The log data is represented as triples (u, q, r), where u is information about the searcher, q is information about the query, and r is information about the repository from which search results were provided. A label is created for each triple (u, q, r), indicating whether user u desired information from repository r when providing search query q. Instructions are created to train a model on the triples and their associated labels, to predict whether a particular user desires information from certain repositories when providing a particular search query.
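As a rough sketch of how such labeled training instances might be assembled — all of the names here (Instance, label_instance, the sample log) are hypothetical illustrations, not details from the patent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    user: str        # u: information about the searcher (here, just an ID)
    query: str       # q: the search query text
    repository: str  # r: the repository that served results ("news", "images", ...)


def label_instance(inst: Instance, clicked_repos: set) -> int:
    """Label is 1 if the user appeared to want results from this
    repository (e.g. clicked on them), 0 otherwise."""
    return 1 if inst.repository in clicked_repos else 0


# Hypothetical log of (u, q, r) triples, each paired with the set of
# repositories whose results that user actually clicked.
log = [
    (Instance("u1", "lions", "images"), {"images"}),
    (Instance("u2", "lions", "news"), set()),
]

training_data = [(inst, label_instance(inst, clicks)) for inst, clicks in log]
```

A real system would, per the filing, accumulate millions of these instances before training a model on them.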

This log data, with its triples of information, is referred to as “instances,” and the system that uses them may include millions of instances.

Hundreds of thousands of distinct features may be included for any given (u, q, r), for example:

  • The country in which user u is located,
  • The language of the country in which user u is located,
  • A cookie identifier associated with user u,
  • The language of query q,
  • Each term in query q,
  • The time of day user u provided query q,
  • The documents from repository r that were presented to user u,
  • Each of the terms in the documents from repository r that were presented to user u,
  • Each of the terms in the titles of the documents from repository r that were presented to user u,
  • The fraction of queries that were provided to the interface for repository r,
  • The fraction of queries that were provided to the interface for repository r versus the interfaces for other repositories,
  • The fraction of queries that contain a term in query q that were provided to the interface for repository r versus the interfaces for other repositories,
  • The overall click rate for queries provided to the interface for repository r,
  • The click rate for queries provided to the interface for repository r for user u,
  • The click rate for queries provided to the interface of repository r for users in the same country as user u,
  • The click rate for query q provided to the interface of repository r,
  • The click rate of query q provided to the interface of repository r for user u, and
  • The fraction of queries q that were provided to the interface of repository r for user u.
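To make the feature list above concrete, here is a minimal sketch of turning one (u, q, r) triple into a sparse feature representation. The function name, the shape of the `user` record, and the `stats` lookup of precomputed aggregates are all my own assumptions for illustration:

```python
def extract_features(user, query, repository, stats):
    """Turn one (u, q, r) triple into a sparse feature dict.
    `user` carries country/language/cookie information; `stats` holds
    hypothetical precomputed aggregates such as click rates."""
    features = {
        f"user_country={user['country']}": 1,
        f"query_lang={user['language']}": 1,
    }
    # One indicator feature per term in query q.
    for term in query.lower().split():
        features[f"query_term={term}"] = 1
    # An aggregate behavioral feature, e.g. the overall click rate for
    # queries sent to this repository's own interface.
    features["repo_click_rate"] = stats.get(("click_rate", repository), 0.0)
    return features


user = {"country": "US", "language": "en", "cookie": "abc123"}
stats = {("click_rate", "news"): 0.31}
f = extract_features(user, "mountain lion", "news", stats)
# f now holds one indicator per query term plus the click-rate value.
```

A production system would cover hundreds of thousands of such features per instance, as the patent notes, rather than the handful shown here.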

A model might be created based on this data, which could then be used to predict, given a new (u, q, r), whether a searcher wants information from a specific repository when they provide a certain query. That model might then inform the decision of whether to search a specific repository and present results from it on a search results page.
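The patent doesn’t specify the model’s form, but a minimal sketch of how any linear model might score a new (u, q, r) through its features could look like this — the hand-picked weights and the logistic output are illustrative assumptions, not details from the filing:

```python
import math


def predict_desire(weights, features, bias=0.0):
    """Score a new (u, q, r) through its features; the logistic output is
    an estimated probability that the user wants this repository."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy, hand-picked weights -- a real model would learn these from the
# millions of labeled (u, q, r) instances described above.
weights = {"query_term=lions": 1.2, "repo=images": 0.8}
features = {"query_term=lions": 1, "repo=images": 1}
p = predict_desire(weights, features)  # roughly 0.88 with these toy weights
```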

The patent filing lists a number of different types of repositories of documents, such as:

  • A web page repository,
  • A news repository,
  • An image repository,
  • A products repository,
  • A usenet repository,
  • A yellow pages repository,
  • A scanned books repository, and/or
  • Other types of repositories.

A High Level Overview

1. A query is received from a searcher.

2. Information about the searcher may be collected, such as an IP address, cookie information, language preferences, and/or geographical information.

3. A search might be performed on each of the repositories based on the query, and sets of search results could be obtained for each.

4. Decisions would then be made as to which results would be presented to that searcher. This would be based upon information about the searcher, the search query used, and input from each of the repositories. There are at least three alternative approaches to returning results from more than one repository:

a) The results from the two highest scoring repositories would be presented.

b) Results from one repository may always be presented, and one or more of the highest scoring of the others would be shown.

c) Only results with scores above a certain threshold would be shown, and if there are none above that threshold, then the highest scoring result would be returned.
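The three alternatives above could be sketched roughly as follows — the function and strategy names are mine, not the patent’s:

```python
def select_repositories(scores, strategy="threshold", threshold=0.5,
                        always_show="web"):
    """Pick which repositories' results to show, given per-repository
    scores. The three strategies mirror alternatives (a)-(c) above."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if strategy == "top_two":            # (a) the two highest scorers
        return ranked[:2]
    if strategy == "always_plus_best":   # (b) one fixed repository + best other
        others = [r for r in ranked if r != always_show]
        return [always_show] + others[:1]
    # (c) everything above the threshold; if nothing clears it,
    # fall back to the single highest-scoring repository.
    above = [r for r in ranked if scores[r] > threshold]
    return above if above else ranked[:1]


scores = {"web": 0.9, "news": 0.6, "images": 0.2}
```

With these toy scores, all three strategies happen to return web and news results; they diverge when, say, no repository clears the threshold.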

The scores, and whether or not they are above a certain threshold, may determine the order or manner in which results are presented to a searcher. So, results from a repository that is shown but does not score above the threshold may appear at the bottom of the results, or may display only as a link to more results of that type instead of appearing on the initial results page.

The model may also contain an “exploration” policy that lets it gather information on different repositories. So, it might provide search results from a lower scoring repository (e.g., presenting news documents rather than images) to a small fraction of users at random, or show documents from a repository in proportion to the score (e.g., if the score for images is twice the score for news articles, then images may be presented twice as often as news articles).
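Both exploration variants described above could be sketched like this — again a hypothetical illustration, with names of my own choosing:

```python
import random


def choose_with_exploration(scores, epsilon=0.05, proportional=False,
                            rng=random):
    """Usually exploit the best-scoring repository, but occasionally
    explore others so the model keeps gathering data about them.
    With proportional=True, sample repositories in proportion to score:
    a repository scoring twice as high is chosen twice as often."""
    repos = list(scores)
    if proportional:
        total = sum(scores.values())
        return rng.choices(repos, weights=[scores[r] / total for r in repos])[0]
    if rng.random() < epsilon:           # a small random fraction of users...
        return rng.choice(repos)         # ...see a randomly chosen repository
    return max(scores, key=scores.get)   # everyone else sees the top scorer
```

The exploration rate (`epsilon` here) trades off result quality for fresh behavioral data; the patent doesn’t say what fraction Google might use.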

Conclusion

If I read this patent filing correctly, user data about queries in the different vertical searches may influence which documents or objects appear in OneBox results. So, if a lot of people go to Google Image Search and look for pictures of “lions”, then OneBox results may show images of lions. If, suddenly, a lot of people are looking for “lions” in Google News searches, then we might also see news results in the OneBox area, instead of the images or in addition to them.

If that’s correct, then a OneBox approach to invisible tabs means that we will still see tabs for some types of searches because individual searches in the different repositories influence which results are returned in the OneBox.

As a patent application, the methods described may or may not reflect accurately how OneBox results are chosen, but the document provides some insights from Google on considerations that may be taken into account in the decisions to provide those results. It is interesting to see how large a role user behavior could have in those decisions.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: Bill Slawski is the Director of Search Marketing for Go Fish Digital and the editor of SEO by the Sea. He has been doing SEO and web promotion since the mid-90s, and was a legal and technical administrator in the highest level trial court in Delaware.

  • http://www.solaswebdesign.net Miriam

    Good Evening Bill,

    Can I say this back to you, and you tell me if I’m at least understanding this patent in a cursory manner?

    Basically, what the patent suggests is that the information in the Google Onebox may be largely influenced by document popularity. Do I have this right?

    If I am right so far, then would it also be right to suggest that geo stats might also be part of this determination of which results appear in the Onebox? For example, if a bunch of people from Idaho searched for a photo of lions, and they consistently chose a certain result, would Google perhaps be geotargeting which picture of lions would appeal most to other Idaho searchers? Or, is that taking this too far?

    I’d appreciate your reply!
    Miriam

  • http://www.seobythesea.com Bill Slawski

    Hi Miriam,

    The patent focuses upon which type of OneBox result to show more than which results to show within the OneBox itself.

    So, if mountain lions started showing up in cities in Idaho, and a lot of people in Idaho started searching for mountain lion in Google News with the query “mountain lion,” then a search in the regular Google Web search for “mountain lion” might start showing people in that area some news results in the OneBox about mountain lions.

    The patent didn’t discuss if they would use geotargeting information on that fine a level – an individual state – but maybe it would.

    Normally, a search for “mountain lion” (without the quotes) might show pictures of mountain lions. It does for me. But, a lot of recent queries on that topic in Google News might either cause Google to show news results instead of, or in addition to the pictures. It’s possible, under what the patent application describes, to do either.

  • http://www.solaswebdesign.net Miriam

    Thank you, Bill.
    I think I understand now. Thank you again for all of your wonderful writing on things like this. It really helps!
    Miriam

  • Bill Slawski

    You’re welcome, Miriam.

    Thank you.
