<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>searchengineland.com &#187; Google: Patents</title>
	<atom:link href="http://searchengineland.com/library/google/google-patents/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: Must Read News About Search Marketing &#38; Search Engines</description>
	<lastBuildDate>Mon, 23 Nov 2009 12:00:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google Patent On Anchor Tags And Web Crawling</title>
		<link>http://searchengineland.com/google-patent-on-anchor-tags-and-web-crawling-12895</link>
		<comments>http://searchengineland.com/google-patent-on-anchor-tags-and-web-crawling-12895#comments</comments>
		<pubDate>Tue, 11 Dec 2007 19:30:13 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Web Search]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-patent-on-anchor-tags-and-web-crawling-12895.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>One of the key elements of how the Google search engine works involves the use of the words, or anchor text, that appear in a link on a source page, to describe a page targeted by the link.</p>
<p>We know this from statements about anchor text made in documents like the Lawrence Page and Sergey Brin-scribed <a href="http://infolab.stanford.edu/~backrub/google.html">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>, and the early PageRank patents authored by Lawrence Page &#8211;  <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PTXT&#038;S1=6,285,999.PN.&#038;OS=pn/6,285,999&#038;RS=PN/6,285,999">Method for node ranking in a linked database</a> and <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PTXT&#038;S1=6,799,176.PN.&#038;OS=pn/6,799,176&#038;RS=PN/6,799,176">Method for scoring documents in a linked database</a>.</p>
<p>A newly granted patent from Google, <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PTXT&#038;S1=7,308,643.PN.&#038;OS=pn/7,308,643&#038;RS=PN/7,308,643">Anchor tag indexing in a web crawler system</a>, may provide a more detailed look at the mechanics of using anchor text as a relevancy signal for a page being linked to by the search engine. It also describes some other processes about using links to rank pages and about crawling websites. I&#8217;ve written a detailed breakdown of the patent at SEO by the Sea in <a href="http://www.seobythesea.com/?p=929">Google Patent on Anchor Text and Different Crawling Rates</a>.</p>
<p>Danny asked me if I might hit on some of the highlights of the document here.</p>
<p><span id="more-12895"></span>
<b>Link Discovery and Crawling Layers</b></p>
<p>Links are at the heart of the patented process, and the discovery of links is done in at least three different ways &#8211; direct submissions of URLs, crawling of URLs, and submissions of content containing links through syndication methods like RSS.</p>
<p>The crawling of URLs may be done in three separate layers, based upon factors that could involve how frequently the content at those URLs may be updated, and what PageRank or page ranking they may have:</p>
<ul>
<li>A base layer, in which most known URLs are sectioned into segments, and those segments are crawled during a specific period such as a day, in a round robin manner until all are visited by robot programs.</li>
<li>A daily layer, in which a smaller group of URLs that have a higher crawl score, crawl frequency, or both, may be visited over the same period of time that segments are crawled in the base layer.</li>
<li>A real time layer, in which an even smaller group of URLs which have even higher crawl scores, crawl frequencies or both, may be visited in much shorter intervals such as minutes or hours.</li>
</ul>
<p>The patent provides some simple formulas which define crawl scores and crawl frequencies, and also a directed approach that may favor URLs in specific categories, such as news sites and pages in specific languages or in certain file formats.</p>
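<p>A minimal sketch of how URLs might be sorted into the three layers described above. The cutoff values, the <code>UrlRecord</code> type, and the function names are hypothetical and are used only for illustration; the formulas from the patent are not reproduced here.</p>

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the patent's actual formulas are not reproduced here.
DAILY_MIN_SCORE = 0.5
REAL_TIME_MIN_SCORE = 0.9


@dataclass
class UrlRecord:
    url: str
    crawl_score: float  # assumed to be normalized to the range 0..1


def assign_layer(record):
    """Return the name of the crawl layer for one URL record."""
    if record.crawl_score >= REAL_TIME_MIN_SCORE:
        return "real_time"
    if record.crawl_score >= DAILY_MIN_SCORE:
        return "daily"
    return "base"


def base_layer_segments(urls, segment_count):
    """Split base-layer URLs into segments that are crawled round robin."""
    segments = [[] for _ in range(segment_count)]
    for index, url in enumerate(urls):
        segments[index % segment_count].append(url)
    return segments
```

<p>In this sketch a single score decides the layer. A fuller version would probably also weigh crawl frequency and category preferences such as news sites, languages, or file formats.</p>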
<p><b>Link Logs, Anchor Maps, Duplicates, and Annotations</b></p>
<p>When a crawling program visits a URL, it may collect lists of links and content from pages in a link log which can be sent back to other programs that look at page content, at duplicate content on pages, at duplicate file structures at hosts, and at text both from anchors of links and from a distance surrounding the links.</p>
<p>URLs that contain duplicate content may be reviewed, and one URL may be chosen as a canonical, or best, version with the possibility that the other duplicate or duplicates are then ignored.</p>
<p>Identifying duplicate file/linking structures at different hosts may also result in one version being identified as a version to continue being indexed, and the other or others as versions to be ignored in the future.</p>
<p>The patent tells us that it is possible that anchor text in links pointing to duplicate URLs may be considered as anchor text pointing to the canonical version of those URLs.</p>
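<p>To make the idea concrete, here is a small sketch that groups duplicate URLs and credits their anchor text to one canonical URL. The content fingerprints, the rank values, and the rule of picking the highest-ranked duplicate are assumptions for the example, not details taken from the patent.</p>

```python
from collections import defaultdict


def merge_duplicate_anchors(pages, anchors):
    """Group URLs by content fingerprint and pool their anchor text.

    pages:   dict of url -> (fingerprint, rank); both values are assumed inputs
    anchors: dict of url -> list of anchor text strings pointing at that url
    Returns a dict of canonical url -> combined anchor text list.
    """
    groups = defaultdict(list)
    for url, (fingerprint, _rank) in pages.items():
        groups[fingerprint].append(url)

    merged = {}
    for urls in groups.values():
        # Assumption: the duplicate with the highest rank becomes canonical.
        canonical = max(urls, key=lambda u: pages[u][1])
        combined = []
        for url in urls:
            combined.extend(anchors.get(url, []))
        merged[canonical] = combined
    return merged
```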
<p>Information about changes to pages is determined at this stage, and link maps and anchor maps are made from the link logs.</p>
<p>The change information may impact the frequency with which specific URLs are crawled, and together with something like PageRank, may determine which of the three layers a URL may be placed within.</p>
<p>The link maps may be used to determine a page ranking, such as PageRank, for documents at the different URLs.</p>
<p>The anchor maps may be used to associate anchor text and additional &#8220;annotation&#8221; information with the URLs that they point at, and that text and annotation information may be used in conjunction with other information to determine relevancy of a page to different words and phrases.</p>
<p>Here&#8217;s an example from the patent that I paraphrased in my post:</p>
<blockquote><p>For example, a link pointing to a picture of Mount Everest might read “to see a picture of Mount Everest click here.” The anchor text might be the “click here” but the additional text “to see a picture of Mount Everest” could be included in the link record.</p></blockquote>
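<p>The Mount Everest example can be sketched with the HTML parser from the Python standard library. The <code>LinkRecorder</code> class, the record fields, and the 40 character window are hypothetical choices, and this version only keeps text that appears before a link.</p>

```python
from html.parser import HTMLParser


class LinkRecorder(HTMLParser):
    """Collect anchor text plus nearby text for each link on a page."""

    def __init__(self, window=40):
        super().__init__()
        self.window = window      # hypothetical annotation size, in characters
        self.text_before = ""     # page text seen so far outside of links
        self.current = None       # link record currently being built
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            annotation = self.text_before[-self.window:].strip()
            self.current = {"url": href, "anchor": "", "annotation": annotation}

    def handle_data(self, data):
        if self.current is not None:
            self.current["anchor"] += data
        else:
            self.text_before += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.records.append(self.current)
            self.current = None
```

<p>Feeding a page to a <code>LinkRecorder</code> fills its <code>records</code> list with one entry per link, each holding the target URL, the anchor text, and the text that preceded the link.</p>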
<p><b>Robots and Temporary and Permanent Redirects</b></p>
<p>A robot crawling through links found at URLs might come across redirected links, and the patent tells us that temporary (302) and permanent (301) redirects are treated differently.</p>
<p>Temporary redirects are identified and recorded, but will be followed by a robot.</p>
<p>Permanent redirects are also identified and recorded, but instead of being followed by a robot, information about them is sent back to a scheduling program that may crawl the URL being redirected to at another time.</p>
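<p>A rough sketch of that difference in handling, assuming a robot that already knows the HTTP status code and the redirect target. The function name, the log format, and the use of a plain list as the scheduler queue are all hypothetical.</p>

```python
def handle_fetch(url, status, location, redirect_log, schedule_queue):
    """Decide what a robot does next after fetching one URL.

    Returns the URL to fetch immediately, or None when there is nothing
    further to follow in this crawl pass.
    """
    if status == 302:
        # Temporary redirect: record it, then follow it right away.
        redirect_log.append((url, location, "temporary"))
        return location
    if status == 301:
        # Permanent redirect: record it and leave the target to the scheduler.
        redirect_log.append((url, location, "permanent"))
        schedule_queue.append(location)
        return None
    return None
```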
<p><b>Conclusion</b></p>
<p>It&#8217;s important to note that this is a patent written to protect Google&#8217;s intellectual property in the processes described. It may not describe the processes that Google has actually implemented, or it may describe only some of the processes being used.  The patent is also 4 years old at this point, and there&#8217;s a possibility that Google may be doing some things very differently now.</p>
<p>But the processes that are described do seem to correspond well with many observations about things such as the behavior of Google&#8217;s crawling processes and the use of anchor text as a relevancy signal, helping to determine the relevance of pages being pointed towards by those links for certain queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-patent-on-anchor-tags-and-web-crawling-12895/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weird Google Patents &amp; Patent Applications</title>
		<link>http://searchengineland.com/weird-google-patents-patent-applications-12080</link>
		<comments>http://searchengineland.com/weird-google-patents-patent-applications-12080#comments</comments>
		<pubDate>Tue, 04 Sep 2007 12:55:16 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/weird-google-patents-patent-applications-12080.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.seobythesea.com/?p=804">Google’s 10 Oddest Patents</a> by Bill Slawski shares with us some of Google&#8217;s weirdest patent applications and patents.  Here is a short rundown. Bill goes into more detail in his write-up.</p>
<p><span id="more-12080"></span>
(1) A medical instrument patent
(2) A method of depositing metal alloy barrier layers
(3) Super charging CDMA technology
(4) Baseband direct sequence spread spectrum transceiver, also related to CDMA technology
(5) A method of communicating quality of service of network in real time</p>
<p>There are five more interesting and weird patents and patent applications at Bill&#8217;s blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/weird-google-patents-patent-applications-12080/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Game Ads Patent Sets Off Privacy Debate</title>
		<link>http://searchengineland.com/google-game-ads-patent-sets-off-privacy-debate-11206</link>
		<comments>http://searchengineland.com/google-game-ads-patent-sets-off-privacy-debate-11206#comments</comments>
		<pubDate>Mon, 14 May 2007 13:48:07 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: Other Ads]]></category>
		<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Legal: Privacy]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-game-ads-patent-sets-off-privacy-debate-11206.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><a href="http://technology.guardian.co.uk/news/story/0,,2078061,00.html">Google may use games to analyse net users</a> from The Guardian reports that Google is considering the idea of using gaming behavior to display targeted ads to users.   Privacy advocates are already voicing their distaste for the idea of gleaning information on users based on their gaming behaviors.  To me, this is just like the Gmail ad debate, which has died down since those ads first launched.</p>
<p><span id="more-11206"></span>
Google, which is known to be interested in <a href="http://searchengineland.com/070122-090005.php">in game ads</a>, has filed a <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070072676%22.PGNR.&#038;OS=DN/20070072676&#038;RS=DN/20070072676">patent application</a> that Bill Slawski <a href="http://www.seobythesea.com/?p=556">explains</a> as:</p>
<blockquote><p>Google looks at ways of determining user information for use in targeting ads, and determining and serving relevant ads in video games. They take into account a person’s interests and gaming behavior by monitoring and making inferences from their online gaming activities.</p></blockquote>
<p>The Open Rights Group said, &#8220;I can understand why they are interested in this, but I would be deeply disturbed by a company holding a psychological profile.&#8221;  The Guardian explains that Google can learn a lot about a user within one of these online role playing games such as Second Life.  Google would love to show ads that are relevant to the user.</p>
<p>The Guardian spoke with Google, which said &#8220;it did not have any plans to roll out the technology in the near future.&#8221; When and if Google launches these ads, I am sure there will be a spark of controversy at the onset which will die down over time.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-game-ads-patent-sets-off-privacy-debate-11206/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Revisits Historical Data Ranking Factors</title>
		<link>http://searchengineland.com/google-revisits-historical-data-ranking-factors-11072</link>
		<comments>http://searchengineland.com/google-revisits-historical-data-ranking-factors-11072#comments</comments>
		<pubDate>Thu, 26 Apr 2007 15:15:17 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Google: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-revisits-historical-data-ranking-factors-11072.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>One of the biggest stirs of 2005 in the search marketing field was caused by the release of a patent application from Google titled <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;p=1&#038;u=/netahtml/PTO/search-bool.html&#038;r=1&#038;f=G&#038;l=50&#038;co1=AND&#038;d=PG01&#038;s1=20050071741&#038;OS=20050071741&#038;RS=20050071741">Information retrieval based on historical data</a>.</p>
<p>It introduced time as a dimension of ranking pages, with changes in content and linking and advertising and topics as factors to be considered, as well as rates of change.  It discussed signals that might send warnings to search engines that some sites might be engaged in spamming the search engine.  It covered seasonality and burstiness of topics, and domain name ownership and a myriad of other subjects, introducing rates and frequencies of changes to web pages, and investigating how freshness and staleness might play a role in determining relevancy.</p>
<p><span id="more-11072"></span>
Two years later, the <em>Historic Data</em> patent application seems to have re-emerged, and cloned itself under a number of names, with expanded claim sections that detail different aspects of the processes described in the original.  Our friend Miguel Cuesta from google.dirson.com wrote a post today on <a href="http://google.dirson.com/post/3348-algoritmo-peso-enlaces-antiguedad/">two of the Google Applications</a> which were published this morning. I covered <a href="http://www.seobythesea.com/?p=586">two others</a> from last week at SEO by the Sea.</p>
<p>There doesn&#8217;t really seem to be much that is new in these documents when held up to the original from 2005.  But, if you hadn&#8217;t paid much attention to the different parts of that document, it might be worth revisiting.</p>
<ul>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;f=G&#038;l=50&#038;d=PG01&#038;p=1&#038;S1=20070088692.PGNR.&#038;OS=dn/20070088692&#038;RS=DN/20070088692">Document Scoring Based on Query Analysis</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;f=G&#038;l=50&#038;d=PG01&#038;p=1&#038;S1=20070088693.PGNR.&#038;OS=dn/20070088693&#038;RS=DN/20070088693">Document Scoring Based on Traffic Associated with a Document</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20070094255.PGNR.&#038;OS=dn/20070094255&#038;RS=DN/20070094255">Document Scoring Based on Link-Based Criteria</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20070094254.PGNR.&#038;OS=dn/20070094254&#038;RS=DN/20070094254">Document Scoring Based on Document Inception Date</a></li>
</ul>
<p>A look through the transaction database at the USPTO provides a little information, and shows that the original document was given a non-final rejection, which means that there may have been some issues in the original that needed to be addressed.  It&#8217;s difficult to pinpoint what those issues were based upon the information that the USPTO provides.</p>
<p>Another patent application published by Google that is considered to be a &#8220;child&#8221; application of the original Historical Data document is also worth a look if you missed it the first time around &#8211;  <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220050144193%22.PGNR.&#038;OS=DN/20050144193&#038;RS=DN/20050144193">Systems and methods for determining document freshness</a></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-revisits-historical-data-ranking-factors-11072/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Google Mobile Phone Search Patent Applications</title>
		<link>http://searchengineland.com/new-google-mobile-phone-search-patent-applications-10786</link>
		<comments>http://searchengineland.com/new-google-mobile-phone-search-patent-applications-10786#comments</comments>
		<pubDate>Thu, 22 Mar 2007 05:27:42 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Mobile]]></category>
		<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Legal: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/new-google-mobile-phone-search-patent-applications-10786.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Is there a Google Phone waiting to be released, or just <a href="http://searchengineland.com/070321-111826.php">mobile software</a> that makes it easier for people to use Google to search with?  How serious is Google about mobile search?  How would such a system work?</p>
<p>I ran into a patent application on the World Intellectual Property Organization (WIPO) pages from Google that describes a phone system that makes searching on a mobile phone much faster than it is now, but which would require that data be carried over more than one session connecting to the search engine during a single search.  I haven&#8217;t seen this published at the US Patent and Trademark Office web site yet, so the link below is the WIPO version.</p>
<p>Added: another patent application from Google published this morning, focuses upon a nonbrowser software application that people can use on their phones to search with Google and read emails&#8230;.</p>
<p><span id="more-10786"></span>
<a href="http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENG&#038;DBSELECT=PCT&#038;SERVER_TYPE=19&#038;SORT=1197533-KEY&#038;TYPE_FIELD=256&#038;IDB=0&#038;IDOC=1229392&#038;C=1&#038;ELEMENT_SET=IA,WO,TTL-EN&#038;RESULT=11&#038;TOTAL=218&#038;START=1&#038;DISP=25&#038;FORM=SEP-0/HITNUM,B-ENG,DP,MC,PA,ABSUM-ENG&#038;SEARCH_IA=US2006028142&#038;QUERY=PA%2fgoogle+">Overloaded Communication Session</a> <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20070067329.PGNR.&#038;OS=dn/20070067329&#038;RS=DN/20070067329">USPTO Version</a>
Publication Number:    WO/2007/013958    International Application No.:    PCT/US2006/028142
Publication Date: 01.02.2007 International Filing Date: 21.07.2006</p>
<p>Int. Class.: G06F 7/00 (2006.01)
Applicants: Google
Invented by Maryam Kamvar, Shumeet Baluja, and Elad Gil</p>
<p>Abstract:</p>
<blockquote><p>A method of providing information responsive to a request from a wireless communication device involves receiving an information request from a mobile device and generating responsive information for the information request, transmitting a first portion of the responsive information to the mobile device in a first communication session, and transmitting a second portion of the responsive information to the mobile device in a second, overloaded communication session</p></blockquote>
<p>This search system may use more than one connection to a search engine to speed up the reception of information from a search, parceling out results in multiple sessions. (For instance, instead of showing ten results for a search, it may show the first five from an initial connection to the search engine and then, while a searcher is looking at those, return the next five results.)</p>
<p>It could use regular cellular networks or voice over IP (VoIP), and could be used on PDAs and laptops as well as phones.</p>
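<p>A simplified sketch of that two step delivery. The function names and the <code>send</code> callback are hypothetical stand-ins for whatever transport a real system would use, and the split of five results per portion simply follows the example above.</p>

```python
def split_results(results, first_portion_size=5):
    """Divide ranked results into an initial portion and a follow-up portion."""
    first = results[:first_portion_size]
    second = results[first_portion_size:]
    return first, second


def deliver(results, send):
    """Send results in two steps using a caller-supplied send function.

    send(session_name, items) is a hypothetical transport callback; a real
    system would open separate network sessions here.
    """
    first, second = split_results(results)
    send("first_session", first)
    if second:
        # Sent while the searcher is still reading the first portion.
        send("overloaded_session", second)
```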
<p>Search results might be displayed as text upon a screen, or as audio, and could also include video.</p>
<p>Images of a possible User Interface for a Google Mobile Search system:</p>
<p><a href="http://www.flickr.com/photos/bragadocchio/430073358/" title="Photo Sharing"><img src="http://farm1.static.flickr.com/162/430073358_d16e74f9c5_m.jpg" width="240" height="147" alt="Google Phone Search User Interface" /></a></p>
<p>This patent application doesn&#8217;t tell us whether or not Google will build and release a phone, or just software, and its publication doesn&#8217;t mean that there is or isn&#8217;t some more news from Google on mobile search upcoming soon.</p>
<p><strong>Added (March 22, @ 4:00pm EST):</strong></p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070066364%22.PGNR.&#038;OS=DN/20070066364&#038;RS=DN/20070066364">Customized data retrieval applications for mobile devices providing interpretation of markup language data</a>
Invented by Elad Gil, Shumeet Baluja, Maryam Kamvar, and Cedric Beust
US Patent Application 20070066364
Published March 22, 2007
Filed: September 19, 2005</p>
<p>If Google were to release a phone software application that could be used on many different types of phones, it might be very much like the software described within this patent application.</p>
<p>From the patent images and patent description, it appears that one could use it to search the Web, Maps, Froogle, and other Google databases.  Local searches can show maps, phone numbers to call, possibly offer text messaging and emails to a listed business, and directions.</p>
<p>Web pages followed in search results would be displayed in a format that may be appropriate for display upon the phone instead of using the formatting indicated in the pages&#8217; HTML (though the application does understand HTML, and would translate a page for display).  This application would not be a browser, and according to the patent filing, would not have an address bar that people could use to type in web addresses and surf the Web.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/new-google-mobile-phone-search-patent-applications-10786/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Customized Search Engines to Harness The Wisdom of Experts?</title>
		<link>http://searchengineland.com/google-customized-search-engines-to-harness-the-wisdom-of-experts-10542</link>
		<comments>http://searchengineland.com/google-customized-search-engines-to-harness-the-wisdom-of-experts-10542#comments</comments>
		<pubDate>Fri, 16 Feb 2007 20:32:36 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Custom Search Engine]]></category>
		<category><![CDATA[Google: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-customized-search-engines-to-harness-the-wisdom-of-experts-10542.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Back in October, 2006, Google <a href="http://googleblog.blogspot.com/2006/10/eureka-your-own-search-engine-has.html">announced</a> on the Official Google Blog that they were enabling people to create their own <a href="http://google.com/coop/cse/">custom search engines</a>.</p>
<p>If you asked yourself why they were doing this, and how it might provide benefits to individual site owners, searchers as a whole, and Google itself, there are some answers that came out yesterday at the US Patent Office&#8230;</p>
<p><span id="more-10542"></span>
Google has published a series of five new patent applications on &#8220;programmable search engines,&#8221; with Ramanathan V. Guha listed as the inventor on the patents (his name was also on the announcement linked to above on the Google Blog).  From reading through the patent filings, I&#8217;m thinking that it&#8217;s safe to assume that the &#8220;programmable search engines&#8221; described are Google&#8217;s custom search engines, though the applications may describe aspects that may differ somewhat or may not have been fully developed yet.</p>
<p>Ramanathan Guha is listed as the sole inventor on these documents, and he has an interesting history.  He joined Google in May of 2005, and had been a principal scientist for Apple Computer and for Netscape, a co-founder and the CTO of Epinions, one of the developers of the <a href="http://cgi.netscape.com/columns/techvision/innovators_rg.html">RDF Site Summary</a> (RSS) 0.9 standards, and has a <a href="http://www.guha.com/cv.html">rich resume</a> of other accomplishments.</p>
<p>These are the patent filings covering the programmable search engines published this week:</p>
<ul>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070038616%22.PGNR.&#038;OS=DN/20070038616&#038;RS=DN/20070038616">Programmable search engine</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070038601%22.PGNR.&#038;OS=DN/20070038601&#038;RS=DN/20070038601">Aggregating context data for programmable search engines</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070038603%22.PGNR.&#038;OS=DN/20070038603&#038;RS=DN/20070038603">Sharing context data across programmable search engines</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070038614%22.PGNR.&#038;OS=DN/20070038614&#038;RS=DN/20070038614">Generating and presenting advertisements based on context data for programmable search engines</a></li>
<li><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070038600%22.PGNR.&#038;OS=DN/20070038600&#038;RS=DN/20070038600">Detecting spam related and biased contexts for programmable search engines</a></li>
</ul>
<p>The easiest way to learn about the features of Google&#8217;s custom search engines is to create one or two, so I&#8217;m not going to go into depth describing what the patent filings say about those.  The sections involving the background of the invention are pretty interesting, though.  I&#8217;m going to summarize parts of those to see if they can provide us with some insight into why these were developed and offered by Google.</p>
<p><strong>Search as an unchangeable black box</strong></p>
<p>We&#8217;re told that work on information retrieval systems is mainly focused upon improving search result quality, which is typically measured in terms of how precise those results are (precision) and how many of the relevant results are returned (recall).  While there may be other quantifiable ways to measure performance, those are the two main measures.</p>
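<p>For readers who want the two measures spelled out, here is a small sketch using the standard definitions of precision and recall. The function name and the document ids are made up for the example.</p>

```python
def precision_and_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: iterable of document ids the engine returned
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set.intersection(relevant_set))
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

<p>If an engine returns four documents and two of them are among the three relevant ones, precision is 2/4 and recall is 2/3.</p>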
<p>Techniques used by Web search engines involve designs which encompass basic indexing algorithms and representation of documents, query analysis and modification, relevance ranking and results presentation, and many other methods.  However they function, the processes search engines use are controlled internally, and can&#8217;t be changed by outside entities.</p>
<p>In other words, search engines operate as black boxes, receiving and processing queries using complex and preprogrammed algorithms and models which rank relevance to provide and order search results. Even if parts of the process are known, the search engine will only operate according to those algorithms and models.</p>
<p><strong>Difficulties with User Intent</strong></p>
<p>The relevance of search results depends upon a user&#8217;s search intent: why are they searching, and why do they need the information? Two different people using the same query may be looking for completely different answers.</p>
<p>Attempts to solve this problem are often based upon relatively weak indicators, such as static user preferences, or predefined ways of refining queries,  often amounting to educated guesses of user interest based on the query terms.  These approaches can fail because of the highly variable nature of intent and situational facts that query terms may not clearly indicate.</p>
<p><strong>Context and Informational Needs</strong></p>
<p>The patent filing presents an example of a search using the query &#8220;Canon Digital Rebel.&#8221;</p>
<p>Does a searcher looking for that term want to buy the camera, or do they own it and want technical support, are they comparing it with other cameras, or may they be interested in learning how to use it?</p>
<p>Those situational facts, and a searcher&#8217;s information need, cannot be reliably determined by either analysis of query terms, or by looking at previously stored preference data about the user.</p>
<p><strong>The Failure of Inferring Intent by Tracking</strong></p>
<p>Intent might also be inferred by tracking and analyzing prior user queries so that a model of a user&#8217;s interests might be created.  Search queries from individual users might be collected, so that interests may be determined based on a frequency of key words appearing in search queries, as well as by looking at which search results the user accesses. See, for instance, <a href="http://www2006.org/programme/files/pdf/3055.pdf">Retroactive Answering of Search Queries</a> (pdf).</p>
<p>The assumption that queries can accurately reflect a user&#8217;s short term or long term interests may be a problem.</p>
<p>Another potential problem is the assumption that there may be a direct and identifiable relationship between a given information need, such as shopping for a digital camera, and the query terms being used to meet that need.  We&#8217;re told that assumption is incorrect because the same query terms can be used by the same user (or by different users) with quite different information needs.</p>
<p><strong>Turning to Specialized Web Sites</strong></p>
<p>Because people can&#8217;t consistently rely on search engines to locate information to satisfy their informational needs, they often visit sites offering highly specialized information about particular topics, built by individuals, groups, or organizations with an expertise in those subjects.</p>
<p>These sites, vertical content sites, often include specifically created content providing in-depth information on a topic, as well as organized collections of links to related sources of information.</p>
<p>So, a site about digital cameras may include:</p>
<ul>
<li>Product reviews,</li>
<li>Guidance on how to purchase a digital camera,</li>
<li>Links to camera manufacturers&#8217; sites,</li>
<li>Price comparison engines,</li>
<li>Other sources of expert opinion, and</li>
<li>Other helpful information.</li>
</ul>
<p>People running these sites, subject domain experts, often have considerable knowledge about the value of other sites on the Web.  Using their expertise, these content developers can also best structure their site&#8217;s content to address the variety of different information needs of users.</p>
<p><strong>A Need to Share Search with Subject Matter Experts</strong></p>
<p>Someone visits one of these vertical content sites, where they find a good amount of useful information related to their needs. They may then return to a general search engine to find more relevant information.  But when they do, the expertise they found at the vertical content site is no longer available to them from the search engine.</p>
<p>It&#8217;s not unusual for vertical content sites to provide search fields letting people access a general search engine. But those just pass search queries back to the general search engine.</p>
<p>Can the expertise of the owner of the vertical content site become available to a search engine during a searcher&#8217;s query, to provide more meaningful search results? If the search engine were a custom one, with some aspects of it programmed by the vertical site owner, it might allow their expertise to be shared with the searcher, with other similar sites using custom search engines, and with the search engine.</p>
<p>Aggregated context information might also be collected from a number of these programmable search engines, and become available to searchers even when they are entering a search at the general search engine instead of at a vertical search site.</p>
<p><strong>Other Aspects of Using Programmable Search Engines</strong></p>
<p>In short, custom search engines at vertical sites allow people to search using content sources decided upon and possibly annotated by the site owners.  Information collected from the source choices and the labeling and annotation of those sources, and from the use of those custom searches may help inform results at other custom search engines involving related searches, and in query suggestions offered by Google on search results pages from regular Web searches.</p>
<p>A couple of other important topics are each discussed in individual patent applications &#8211; advertising and spam or bias.</p>
<p>Of course, Google would want to show advertisements with search results.  Can the context (or user intent) taken from such searchers be used to inform the content of advertisements shown to searchers, or associated with the content shown on one of these vertical search pages?</p>
<p>There is a potential that people will try to abuse a system like this.  The patent application focusing primarily upon &#8220;spam related and biased content&#8221; describes filtering processes that may be used to avoid abuse.</p>
<p><strong>Conclusion</strong></p>
<p>If you haven&#8217;t tried out Google&#8217;s custom search engines, they are very easy to set up, and to use.  If you own a site that focuses upon a particular subject, and consider yourself an expert on that subject, your expertise in setting up a custom search engine may influence results on other custom search engines from Google, and in suggestions on Google&#8217;s results pages in response to certain queries.</p>
<p>The only issue that I have with these patent applications is that they appear to assume that people setting up custom search engines on specific topics are  experts on those subjects.  Yet, if you visit a site on a topic, and find value and expertise upon the site, you may find value and expertise in a custom search set up on that site, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-customized-search-engines-to-harness-the-wisdom-of-experts-10542/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Agent Rank Patent Application</title>
		<link>http://searchengineland.com/googles-agent-rank-patent-application-10487</link>
		<comments>http://searchengineland.com/googles-agent-rank-patent-application-10487#comments</comments>
		<pubDate>Fri, 09 Feb 2007 20:45:12 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/googles-agent-rank-patent-application-10487.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fgoogles-agent-rank-patent-application-10487"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fgoogles-agent-rank-patent-application-10487" height="61" width="51" /></a></div><p>Google returns results based upon content appearing upon individual pages, or at specific URLs.  But that content could come from different authors, who have different levels of control over it.  For example, a blog page may have posts written by more than one author, comments penned by others, and advertisements showing ads that even the owner of the site has no direct control over.  A forum might have many different authors responding to an initial post, and may also display advertisements.</p>
<p>Imagine a system that instead of ranking content on a page level, breaks those pages down and looks at smaller content items on those pages, which it associates with digital signatures.  Content creators could be given reputation scores, which could influence the rankings of pages where their content appears, or which they own, edit, or endorse.</p>
<p>That&#8217;s a broad overview of a new patent application from Google&#8230;</p>
<p><span id="more-10487"></span>
<a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070033168%22.PGNR.&#038;OS=DN/20070033168&#038;RS=DN/20070033168">Agent rank</a></p>
<p>Invented by David Minogue and Paul A. Tucker
US Patent Application 20070033168
Published February 8, 2007
Filed:  August 8, 2005</p>
<p>Abstract</p>
<blockquote><p>The present invention provides methods and apparatus, including computer program products, implementing techniques for searching and ranking linked information sources. The techniques include receiving multiple content items from a corpus of content items; receiving digital signatures each made by one of multiple agents, each digital signature associating one of the agents with one or more of the content items; and assigning a score to a first agent of the multiple agents, wherein the score is based upon the content items associated with the first agent by the digital signatures.</p></blockquote>
<p><strong>Agents and Authority</strong></p>
<p>When we perform a search at Google, we receive responses to queries based upon how relevant those results might be to our search terms.  The order of those results is based upon  rankings influenced by both query-dependent and query-independent criteria.</p>
<p>Query-dependent criteria are signals that try to identify how semantically related a document is to a query, such as word frequency distributions.</p>
<p>Query-independent criteria are signals that attempt to identify how authoritative, or intelligible, or trustworthy a document might be, such as PageRank.  PageRank tries not only to look at the number of references to a document, but also the quality of those references.</p>
<p>Can authority or trustworthiness be measured in a different way, based upon understanding who the author of content on pages might be, through the use of digital signatures associated with an author? Could query-independent signals be tied to that author, so that a score for content created or controlled or edited or reviewed by the author could be used to rank pages?</p>
<p>This patent application describes a system where that might be a possibility.</p>
<p><strong>Agent Control of a Resource</strong></p>
<p>The document begins by looking at how much control agents might have over specific resources.</p>
<p>When all content from a resource is under the control of a single agent, the reputation of the agent can be directly related to the content of that resource. But it&#8217;s possible that a page has more than one agent involved, each controlling different parts of the page.  In that case, if the different partitions of information can be identified, reputation for each agent might be calculated at that partition level.</p>
<p>Difficulties with this approach include the fact that an agent may contribute content to many different resources, a single resource may be created or controlled by multiple agents, and the ownership and control of a resource may change over time.</p>
<p><strong>Benefits of the Approach</strong></p>
<p>The patent filing describes a number of features and approaches, and they are worth looking over, but I want to focus upon the benefits that they say this will bring to us:</p>
<ol>
<li> Identifying individual agents responsible for content can be used to influence search ratings.</li>
<li>The identity of agents can be reliably associated with content.</li>
<li>The granularity of association can be smaller than an entire web page, so agents can disassociate themselves from information appearing near the information for which the agent is responsible.</li>
<li>An agent can disclaim association with portions of content, such as advertising, that appear on the agent&#8217;s web site.</li>
<li>The same agent identity can be attached to content at multiple locations.</li>
<li>Multiple agents can make contributions to a single web page where each agent is only associated with the content that they provided.</li>
</ol>
<p><strong>Digital Signatures for Content</strong></p>
<p>Different content pieces on a page can be signed with a digital signature, either directly by the agent or indirectly on behalf of the agent. These signatures identify who actually created each content piece on a page. One example for a method of creating and validating digital signatures is the World Wide Web Consortium&#8217;s <a href="http://www.w3.org/TR/xmldsig-core/">XML-Signature Syntax and Processing</a>.</p>
<p>Content pieces can have multiple signatures based upon roles an agent may take involving the content, such as author, publisher, editor, or reviewer.</p>
<p>An agent would have exclusive access to the private key they use to sign the content piece, and the digital signature could also include metadata such as creation date, review score, or recommended keywords for search.</p>
<p>Agents could sign only a portion of a page, and exclude content over which they don&#8217;t claim any responsibility, such as ads served alongside the document.</p>
<p>That content can range from individual hyperlinks to entire documents, and can include text, images, audio, or video. The signature can also allow people to verify that the signed content hasn&#8217;t been materially altered since the signature was generated.</p>
<p>If you want to allow your content and signature to be portable, such as for a syndicated article, you could state that in the metadata associated with the content.</p>
<p><strong>Ranking and Reputation Scores</strong></p>
<p>Tying a page to an author can influence the ranking of that page. If the author has a high reputation, content created by him or her may be considered to be more authoritative than similar content on other pages. If the agent reviewed or edited content instead of authoring it, the score for the content might be ranked differently.</p>
<p>An agent may have a high reputation score for certain kinds of content, and not for others &#8211; so someone working on a site involving celebrity news might have a strong reputation score for that kind of content, but not such a high score for content involving professional medical advice.</p>
<p>Reputation systems are often measured in terms of effectiveness by how difficult they might be to attack and manipulate.  Here, there are at least two factors that may help keep manipulation from happening:</p>
<ol>
<li>Reputational scores may be set so that they are relatively difficult to increase and relatively easy to decrease, so that an agent may not want to place his or her reputation at risk by endorsing content inappropriately.</li>
<li>Since signatures of reputable agents can promote ranking of signed content in search results, agents are provided a powerful incentive to establish and maintain good reputational scores.</li>
</ol>
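<p>The first factor above, scores that are hard to raise and easy to lower, can be sketched in a few lines of Python. This is only an illustration: the function name, the two rate constants, and the update rule itself are assumptions of mine, and none of them come from the patent filing.</p>

```python
# Hypothetical sketch: the names and constants below are assumptions,
# not values or formulas taken from the patent filing.
GAIN_RATE = 0.05   # assumed: a good endorsement raises a score slowly
LOSS_RATE = 0.50   # assumed: a bad endorsement lowers a score quickly


def update_reputation(score, endorsement_was_good):
    """Return a new reputation score in the range 0.0 to 1.0."""
    if endorsement_was_good:
        # Move a small fraction of the remaining distance toward 1.0.
        return score + GAIN_RATE * (1.0 - score)
    # Give up a large fraction of the current score.
    return score * (1.0 - LOSS_RATE)


score = 0.5
for _ in range(10):
    score = update_reputation(score, True)
after_ten_good = score
after_one_bad = update_reputation(after_ten_good, False)
print(round(after_ten_good, 3), round(after_one_bad, 3))
```

<p>With these made-up rates, ten good endorsements in a row raise the score only modestly, while a single bad endorsement drops it below where it started, which is the kind of asymmetry that would make an agent think twice before signing questionable content.</p>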
<p>The method of ranking based upon reputation scores is described in an analogy based upon PageRank.  There&#8217;s also some discussion of an alternative possibility of using a seed group of trusted agents to endorse other content. Agents whose content receives consistently strong endorsements might gain reputation under that method. In either implementation, the agent&#8217;s reputation ultimately depends on the quality of the content which they sign.</p>
<p>The use of digital signatures enables the reputation system to link reputations with individual agents, and adjust the relative rankings based on all of the content each agent chooses to associate himself or herself with, no matter where the content may be located. That could even include content that isn&#8217;t on the internet.</p>
<p><strong>Conclusion</strong></p>
<p>This is a very different way of providing rankings for pages, based upon the reputations of agents who may have interacted with, and digitally signed, content on those pages.</p>
<p>Ted Nelson, one of the early pioneers of hypertext, spoke at Google a couple of weeks ago (<a href="http://video.google.com/videoplay?docid=-8329031368429444452&#038;q=type%3Agoogle+engEDU">Transclusion: Fixing Electronic Literature</a> &#8211; link to video).  He described a very different kind of hypertext than what we are familiar with, which involved a system for connecting electronic documents with content from multiple sources appearing on the same pages together. The last question in the Q&#038;A part of the presentation asked how his electronic documents might be connected so that they can be found easily. His answer, &#8220;I guess Google will do that.&#8221; This isn&#8217;t the system that Ted Nelson envisioned, but it shares some similarities.</p>
<p>I could see blogging systems building tools that allow for digital signatures like the ones described here, such as the <a href="http://www.sixapart.com/typekey/">Typekey</a> feature in Typepad to authenticate the identity of commenters on multiple blogs.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/googles-agent-rank-patent-application-10487/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s OneBox Patent Application</title>
		<link>http://searchengineland.com/googles-onebox-patent-application-10325</link>
		<comments>http://searchengineland.com/googles-onebox-patent-application-10325#comments</comments>
		<pubDate>Mon, 22 Jan 2007 18:46:30 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/googles-onebox-patent-application-10325.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fgoogles-onebox-patent-application-10325"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fgoogles-onebox-patent-application-10325" height="61" width="51" /></a></div><p>One of my favorite search related articles is one that Danny wrote a few years back titled <a href="http://searchenginewatch.com/showPage.html?page=3115131">Searching With Invisible Tabs</a>. It stands out because it describes one of the major difficulties involving how search engines work &#8211; making a user interface as simple as possible, while still somehow providing information that can meet a wide range of the intentions behind a search.</p>
<p>Danny also introduced us to the <a href="http://searchenginewatch.com/showPage.html?page=3332511">one box results</a> name used internally by Google for what he described as &#8220;invisible tab promotion of some of its specialty content.&#8221;  Are these inserted vertical search results the way to serve invisible tab results to searchers?</p>
<p>OneBox results have been the topic of sessions during Search Engine Strategies conferences under the name <a href="http://blog.searchenginestrategies.com/050726-170239.html">Vertical Creep Into Regular Search Results</a>, which provided a chance for conference attendees to talk about these more narrowly defined types of searches appearing above organic results in Web searches at Google.  During one of these sessions which I attended, a question during the Q &#038; A part of that session was &#8220;how does Google determine whether or not to show OneBox results?&#8221;  That may have been the only question unanswered.</p>
<p>Earlier this month, Google published a patent application that may provide a little insight into how and why different OneBox results are shown.</p>
<p><span id="more-10325"></span>
<strong>What Google has Told Us About OneBox Results</strong></p>
<p>Before describing the patent application, I want to briefly explore some of what we&#8217;ve learned about these additional results directly from Google.</p>
<p>The Google Help Center <a href="http://www.google.com/help/interpret.html">Search Results Page</a> describes OneBox results:</p>
<blockquote><p>Google&#8217;s search technology finds many sources of specialized information. Those that are most relevant to your search are included at the top of your search results. Typical onebox results include news, stock quotes, weather and local websites related to your search.</p></blockquote>
<p>A tour of OneBox features for both Web Search and Enterprise search appears on the <a href="http://www.google.com/enterprise/gsa/onebox.html">Google OneBox for Enterprise</a> page (see the link labeled &#8220;Tour of OneBox features&#8221;).</p>
<p>Brian Smith recently interviewed Google Product Marketing Director Debbie Jaffe about these listings in <a href="http://searchenginewatch.com/showPage.html?page=3623898">A Closer Look at Google OneBox Results</a>.</p>
<p><strong>The OneBox Patent Application</strong></p>
<p>Many patent filings include a &#8220;Description of Related Art&#8221; section where they often define a reason for the creation of their invention.  This one tells us that:</p>
<blockquote><p>Some search engine systems can provide various types of information as the search results. For example, a search engine system might be capable of providing search results relating to web pages, news articles, images, merchant products, usenet pages, yellow page entries, scanned books, and/or other types of information. Typically, a search engine system provides separate interfaces to these different types of information.</p>
<p>When a user provides a search query to a standard search engine system, the user is typically provided with links to web pages. If the user desires another type of information (e.g., images or news articles), the user typically needs to access a separate interface provided by the search engine system.</p></blockquote>
<p>While Google shows tabs that searchers can select to view results for other kinds of information repositories, it&#8217;s not unusual for people to ignore those, or as Danny writes in his article on invisible tabs, to suffer from &#8220;tab blindness.&#8221;  The OneBox is a solution to that problem.  But how does Google know when to show which types of results?</p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070005568%22.PGNR.&#038;OS=DN/20070005568&#038;RS=DN/20070005568">Determination of a desired repository</a>
Invented by Michael Angelo, David Braginsky, Jeremy Ginsberg, and Simon Tong
US Patent Application 20070005568
Published January 4, 2007
Filed: 	June 29, 2005</p>
<p>Abstract</p>
<blockquote><p>A system receives a search query from a user and searches a group of repositories, based on the search query, to identify, for each of the repositories, a set of search results. The system also identifies one of the repositories based on a likelihood that the user desires information from the identified repository and presents the set of search results associated with the identified repository.</p></blockquote>
<p><strong>A Mix of Possible OneBox Determination Methods</strong></p>
<p>The patent lists at least seven different variations that it might follow to possibly determine whether OneBox results appear for a search, and which type of results appear within the OneBox, but they are mostly subtle variations of each other.  All of them involve looking closely at the query used, a likelihood that the searcher is looking for information from a number of different data repositories, somehow scoring results from those repositories, and serving results from one or more of them.</p>
<p>One variation describes a process in which log data is collected about searchers and searches of repositories. The log data is represented as triples of data (u, q, r), with u being information about the searcher, q being information about the query, and r being information about the repository from which search results were provided. Labels for each of the triples of data (u, q, r) are created, where the label includes information about whether the user u desired information from the repository r when the user provided the search query q. Instructions are created to train a model based on the triples of data (u, q, r) and their associated labels, to predict whether a particular user desires information from certain repositories when providing a particular search query.</p>
<p>These triples of log data are referred to as &#8220;instances,&#8221; and the system that uses them may include millions of instances.</p>
<p>Hundreds of thousands of distinct features may be included for any given (u, q, r), for example:</p>
<ul>
<li>The country in which user u is located,</li>
<li>The language of the country in which user u is located,</li>
<li>A cookie identifier associated with user u,</li>
<li>The language of query q,</li>
<li>Each term in query q,</li>
<li>The time of day user u provided query q,</li>
<li>The documents from repository r that were presented to user u,</li>
<li>Each of the terms in the documents from repository r that were presented to user u,</li>
<li>Each of the terms in the titles of the documents from repository r that were presented to user u,</li>
<li>The fraction of queries that were provided to the interface for repository r,</li>
<li>The fraction of queries that were provided to the interface for repository r versus the interfaces for other repositories,</li>
<li>The fraction of queries that contain a term in query q that were provided to the interface for repository r versus the interfaces for other repositories,</li>
<li>The overall click rate for queries provided to the interface for repository r,</li>
<li>The click rate for queries provided to the interface for repository r for user u,</li>
<li>The click rate for queries provided to the interface of repository r for users in the same country as user u,</li>
<li>The click rate for query q provided to the interface of repository r,</li>
<li>The click rate of query q provided to the interface of repository r for user u, and</li>
<li>The fraction of queries q that were provided to the interface of repository r for user u.</li>
</ul>
<p>This data might be used to create a model, which could possibly be used to predict, given a new (u, q, r), whether a searcher wants information from a specific repository if they provided a certain query.  That model might then be used to make a decision as to whether or not to search a specific repository and present results from it on a search results page.</p>
<p>The patent filing lists a number of different types of repositories of documents, such as:</p>
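<p>To make the idea of labeled (u, q, r) instances more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the sample data, the choice of query terms as the only features, and the simple smoothed counting that stands in for whatever model the patent filing actually has in mind. User information is carried in each instance but ignored here for brevity.</p>

```python
from collections import defaultdict

# Hypothetical labeled instances: ((user_country, query, repository), desired).
# The data, field names, and scoring rule are illustrative assumptions only.
INSTANCES = [
    (("US", "lions pictures", "images"), True),
    (("US", "lions pictures", "news"), False),
    (("US", "lions score", "news"), True),
    (("UK", "lions score", "images"), False),
    (("UK", "lions", "images"), True),
]


def train(instances):
    """Count, per (term, repository) feature, how often the label was True."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [desired, total]
    for (_country, query, repo), desired in instances:
        for term in query.split():
            counts[(term, repo)][0] += int(desired)
            counts[(term, repo)][1] += 1
    return counts


def predict(counts, query, repo):
    """Average the smoothed desired-rate of each query term for one repository."""
    rates = []
    for term in query.split():
        desired, total = counts.get((term, repo), (0, 0))
        rates.append((desired + 1) / (total + 2))  # add-one smoothing
    return sum(rates) / len(rates)


model = train(INSTANCES)
image_score = predict(model, "lions pictures", "images")
news_score = predict(model, "lions pictures", "news")
print(image_score, news_score)
```

<p>In this made-up log, the query &#8220;lions pictures&#8221; scores higher for the image repository than for the news repository. A real system with hundreds of thousands of features would need a far more capable model than counting, but the inputs and the prediction have the same shape.</p>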
<ul>
<li>A web page repository,</li>
<li>A news repository,</li>
<li>An image repository,</li>
<li>A products repository,</li>
<li>A usenet repository,</li>
<li>A yellow pages repository,</li>
<li>A scanned books repository, and/or;</li>
<li>Other types of repositories.</li>
</ul>
<p><strong>A High Level Overview</strong></p>
<p>1. A query is received from a searcher.</p>
<p>2. Information about the searcher may be collected, such as an IP address, cookie information, language preferences, and/or geographical information.</p>
<p>3. A search might be performed on each of the repositories based on the query, and sets of search results could be obtained for each.</p>
<p>4. Decisions would then be made as to which results would be presented to that searcher.  This would be based upon information about the searcher, the search query used, and input from each of the repositories.  There are at least three alternative approaches to returning results from more than one repository:</p>
<p>a) The results from the two highest scoring repositories would be presented.</p>
<p>b) Results from one repository may always be presented, and one or more of the highest scoring of the others would be shown.</p>
<p>c) Only results with scores above a certain threshold would be shown, and if there are none above that threshold, then the highest scoring result would be returned.</p>
<p>The scores, and whether or not they are above a certain threshold, may determine the order or manner in which results are presented to a searcher.  So, results from a repository that is shown but does not score above the threshold may appear at the bottom of the results, or may be displayed only as a link to more results of that type instead of appearing as results on the initial results page.</p>
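<p>The three alternatives in step 4 can be sketched as small Python functions. The repository names, the scores, and the function names are all hypothetical, and treating &#8220;web&#8221; as the repository that is always shown is my assumption rather than something the filing states.</p>

```python
# Hypothetical scores per repository for one query; the values are made up.
SCORES = {"web": 0.9, "news": 0.7, "images": 0.4, "products": 0.2}


def top_two(scores):
    """(a) Present results from the two highest scoring repositories."""
    return sorted(scores, key=scores.get, reverse=True)[:2]


def always_plus_best(scores, always="web", extra=1):
    """(b) Always present one repository, plus the best of the others."""
    others = [repo for repo in scores if repo != always]
    best = sorted(others, key=scores.get, reverse=True)[:extra]
    return [always] + best


def above_threshold(scores, threshold):
    """(c) Present repositories above a threshold, else the single best one."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = [repo for repo in ranked if scores[repo] > threshold]
    return chosen or ranked[:1]


print(top_two(SCORES))
print(always_plus_best(SCORES))
print(above_threshold(SCORES, 0.5))
print(above_threshold(SCORES, 0.95))
```

<p>The last call shows the fallback in alternative (c): when no repository clears the threshold, the highest scoring one is still returned.</p>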
<p>The model may also contain an &#8220;exploration&#8221; policy that lets it gather information on different repositories. So, it might provide search results from a lower scoring repository (e.g., presenting news documents rather than images) to a small fraction of users at random, or show documents from a repository in proportion to the score (e.g., if the score for images is twice the score for news articles, then images may be presented twice as often as news articles).</p>
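<p>Both flavors of the exploration policy described above can be sketched with Python&#8217;s standard <code>random</code> module. The function names, the 5% exploration rate, and the example scores are assumptions for illustration, not details from the filing.</p>

```python
import random


def explore_at_random(scores, rng, explore_rate=0.05):
    """Mostly pick the top repository; for a small random fraction, pick another."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) > 1 and explore_rate > rng.random():
        return rng.choice(ranked[1:])
    return ranked[0]


def pick_in_proportion(scores, rng):
    """Pick a repository with probability proportional to its score."""
    repos = list(scores)
    weights = [scores[repo] for repo in repos]
    return rng.choices(repos, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
scores = {"images": 2.0, "news": 1.0}  # images scores twice as high as news
picks = [pick_in_proportion(scores, rng) for _ in range(3000)]
image_share = picks.count("images") / len(picks)
print(round(image_share, 2))
```

<p>With images scoring twice as high as news, the proportional policy should show images roughly two thirds of the time, matching the &#8220;twice as often&#8221; example above.</p>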
<p><strong>Conclusion</strong></p>
<p>If I read this patent filing correctly, user data about queries in the different vertical searches may influence which documents or objects appear in OneBox results.  So, if a lot of people go to Google Image Search and look for pictures of &#8220;lions&#8221;, then OneBox results may show images of lions.  If suddenly, a lot of people are looking for &#8220;lions&#8221; on Google news searches, then we might also see news results in the OneBox area, instead of the images or in addition to them.</p>
<p>If that&#8217;s correct, then a OneBox approach to invisible tabs means that we will still see tabs for some types of searches because individual searches in the different repositories influence which results are returned in the OneBox.</p>
<p>As a patent application, the methods described may or may not reflect accurately how OneBox results are chosen, but the document provides some insights from Google on considerations that may be taken into account in the decisions to provide those results.  It is interesting to see how large a role user behavior could have in those decisions.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/googles-onebox-patent-application-10325/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Billboard &amp; Google Kiosk Coming?</title>
		<link>http://searchengineland.com/google-billboard-google-kiosk-coming-10289</link>
		<comments>http://searchengineland.com/google-billboard-google-kiosk-coming-10289#comments</comments>
		<pubDate>Wed, 17 Jan 2007 19:55:35 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Other Ads]]></category>
		<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Legal: Patents]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-billboard-google-kiosk-coming-10289.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fgoogle-billboard-google-kiosk-coming-10289"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fgoogle-billboard-google-kiosk-coming-10289" height="61" width="51" /></a></div><p>Clickz columnist Ryan Naraine wrote up some of his thoughts about a Google patent application (<a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20060287913.PGNR.&#038;OS=dn/20060287913&#038;RS=DN/20060287913">Allocating advertising space in a network of displays</a>)  that would enable advertising upon electronic displays and billboards in shopping centers and other places, in his article <a href="http://www.clickz.com/showPage.html?page=3624571">Google Patent Filing Hints at Digital Billboard Ad Network</a>.</p>
<p>While <a href="http://www.newscientisttech.com/article/mg19325854.900-street-advertising-gets-localstocksavvy.html">New Scientist</a> wrote about the patent filing last week, and I had a <a href="http://www.seobythesea.com/?p=405">writeup</a> on it the day it was published, Ryan does a nice job of providing some context to how this electronic display network might work&#8230;</p>
<p><span id="more-10289"></span>
Ryan tells us that:</p>
<blockquote><p>If the filing is a sign of things to come from Google, kiosk-type billboards, ATM machines and other digital displays in malls and hotel lobbies could start hawking products directly from a nearby retailer’s inventory.</p></blockquote>
<p>I think he may be right, and another patent filing from Google shows some signs of that.  How far away might we be from Google Kiosk?  The following images are from that earlier Google application, published this past summer.</p>
<p><a href="http://www.flickr.com/photos/bragadocchio/360857308/" title="Photo Sharing"><img src="http://farm1.static.flickr.com/165/360857308_985cdb27a9.jpg" width="500" height="433" alt="Google Kiosk Showing Products" style="border:0px" /></a></p>
<p>Above is one of a number of screenshots of kiosk software that might be displayed in shopping malls, resort areas, and other commercial districts, according to <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20060143080.PGNR.&#038;OS=dn/20060143080&#038;RS=DN/20060143080">Generating and/or serving dynamic promotional offers such as coupons and advertisements</a>.  Another screen might show walking directions within a mall, as in the picture below.</p>
<p><a href="http://www.flickr.com/photos/bragadocchio/360857312/" title="Photo Sharing"><img src="http://farm1.static.flickr.com/147/360857312_e49a001ec7.jpg" width="500" height="433" alt="Google Kiosk Showing Mall Layout" style="border:0px" /></a></p>
<p>Imagine getting directions, learning about items for sale, printing out coupons, and seeing how long the wait is at mall restaurants, all from a kiosk like this.  I could see mall &#8220;you are here&#8221; signs being replaced with kiosks providing this kind of information.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-billboard-google-kiosk-coming-10289/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search Patent Documents for 1-12-07 &#8211; Limited Access Documents in Search Results</title>
		<link>http://searchengineland.com/search-patent-documents-for-1-12-07-limited-access-documents-in-search-results-10252</link>
		<comments>http://searchengineland.com/search-patent-documents-for-1-12-07-limited-access-documents-in-search-results-10252#comments</comments>
		<pubDate>Fri, 12 Jan 2007 17:34:10 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: Patents]]></category>
		<category><![CDATA[Legal: Patents]]></category>
		<category><![CDATA[Search Features: Tagging]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/search-patent-documents-for-1-12-07-limited-access-documents-in-search-results-10252.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>A wide range of newly published patent applications and granted patents, covering such ground as subscribed content in search results from Google; tag searching, detecting similar audio files, and simpler support vector machines from Yahoo; geographic based searching from MetaCarta; and data center architecture and smarter results to queries from Microsoft, amongst others&#8230;</p>
<p><span id="more-10252"></span>
<strong>Patent Applications</strong></p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070005595%22.PGNR.&#038;OS=DN/20070005595&#038;RS=DN/20070005595">Document access control</a>
<em>Google (20070005595) </em></p>
<p>Focuses primarily on setting access rights to documents for individual users.  Where it starts to become interesting is when it discusses checking access rights before performing a search on an intranet or the internet.  Might this lead to subscribed content showing up in search results alongside web results?  Hard to tell. <em>Added: the document is talking about subscription-based sites, or collections of images or calendar applications, that aren&#8217;t normally accessible in search results, rather than the type of pages that show an abstract and then charge you to read the rest of the document.  It would be nice to be able to filter those out of search results.</em></p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070011161%22.PGNR.&#038;OS=DN/20070011161&#038;RS=DN/20070011161">User interface for navigating a keyword space</a>
<em>Yahoo (20070011161)</em></p>
<p><a href="http://www.flickr.com/photos/bragadocchio/355033776/" title="Photo Sharing"><img src="http://farm1.static.flickr.com/126/355033776_6073de1711_m.jpg" width="240" height="164" alt="A Closer Look at Yahoo's Tagging Search User Interface" style="float: right; margin:0 0 10px 10px; border:1px solid #666" /></a></p>
<p>Related to the patent filing I discussed in <a href="http://searchengineland.com/070110-124043.php">The Social Side of Trustrank</a>, this patent application discusses searching through tags that you or your friends may have used on web pages.</p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070011110%22.PGNR.&#038;OS=DN/20070011110&#038;RS=DN/20070011110">Building support vector machines with reduced classifier complexity</a>
<em>Yahoo (20070011110)</em></p>
<p>Describes a way to simplify support vector machines (SVMs) so that they can be used to rerank search results based upon some relevancy factors.</p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070011154%22.PGNR.&#038;OS=DN/20070011154&#038;RS=DN/20070011154">System and method for searching for a query</a>
<em>TextDigger, Inc. (20070011154) </em></p>
<p>It&#8217;s possible that some synonyms for queries used in a search may be considered to be equivalent for that search (such as car and automobile), and could be used to provide relevant results for the query.  This patent application explores the idea.</p>
<p><a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PG01&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&#038;r=1&#038;f=G&#038;l=50&#038;s1=%2220070011150%22.PGNR.&#038;OS=DN/20070011150&#038;RS=DN/20070011150">User Interface For Geographic Search</a>
<em>MetaCarta, Inc. (20070011150)</em></p>
<p>Interesting to see an advanced geographic-based search engine from a company that isn&#8217;t one of the major search engines.</p>
<p><strong>Granted Patents</strong></p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PALL&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&#038;r=1&#038;f=G&#038;l=50&#038;s1=7,162,473.PN.&#038;OS=PN/7,162,473&#038;RS=PN/7,162,473">Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users</a>
<em>Microsoft (7,162,473)</em></p>
<p>I wrote a little about how some of the lessons learned from working on Microsoft&#8217;s desktop search and Vista were influencing search at live.com in <a href="http://searchengineland.com/070104-152118.php">Improved Information Retrieval &#8211; Looking at Context with Susan Dumais</a>.  This patent takes more ideas from that research, and discusses how it could be used to do things such as provide information from standing queries to users.</p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PALL&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&#038;r=1&#038;f=G&#038;l=50&#038;s1=7,162,509.PN.&#038;OS=PN/7,162,509&#038;RS=PN/7,162,509">Architecture for distributed computing system and automated design, deployment, and management of distributed applications</a>
<em>Microsoft (7,162,509)</em></p>
<p>There have been a lot of recent stories in the news about new locations for search engine data centers.  If you&#8217;re interested in learning a little about how a data center works, the detailed description in this granted patent provides a fair amount of explanation.</p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PALL&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&#038;r=1&#038;f=G&#038;l=50&#038;s1=7,162,691.PN.&#038;OS=PN/7,162,691&#038;RS=PN/7,162,691">Methods and apparatus for indexing and searching of multi-media web pages</a>
<em>Oracle (7,162,691) </em></p>
<p>Audio, video, and other multimedia content pose some indexing challenges for search engines.  This patent describes a way to automatically create text annotations, or metadata, to be used in indexing those pages and to improve searches for the content they contain.</p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PALL&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&#038;r=1&#038;f=G&#038;l=50&#038;s1=7,162,482.PN.&#038;OS=PN/7,162,482&#038;RS=PN/7,162,482">Information retrieval engine</a>
<em>Yahoo, through MusicMatch (7,162,482)</em></p>
<p>If you search through YouTube, you might find the same video appearing multiple times, under different names and with different tags and information attached to it.  The same is true with sites that search for audio.  If there were a way to recognize that those were the same video or audio files, it might have implications for the way those are indexed.  This patent focuses upon indexing audio files by comparing information derived from looking at the contents of the files themselves.  The patent notes that the process is one that could be used for more than just audio.</p>
<p>Disclaimer: Patents are filed to protect ideas and methods developed as part of the intellectual property of a company, and may be used to exclude others from using the same, or similar processes, but the granting of a patent or publication of a patent application doesn&#8217;t necessarily mean that the processes involved have been fully developed, or will be in the future. Yet, the documents can provide some insight into the ideas that an organization is working upon, and may act as a starting point for more research.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/search-patent-documents-for-1-12-07-limited-access-documents-in-search-results-10252/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
