<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine Land &#187; Search Resources</title>
	<atom:link href="http://searchengineland.com/library/search-resources/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: News On Search Engines, Search Engine Optimization (SEO) &#38; Search Engine Marketing (SEM)</description>
	<lastBuildDate>Fri, 10 Feb 2012 01:45:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>It&#8217;s Back-To-School Time; Check Out the Training Videos at IMTcourses.com</title>
		<link>http://searchengineland.com/its-back-to-school-time-check-out-the-training-videos-at-imtcourses-com-89704</link>
		<comments>http://searchengineland.com/its-back-to-school-time-check-out-the-training-videos-at-imtcourses-com-89704#comments</comments>
		<pubDate>Thu, 18 Aug 2011 13:00:14 +0000</pubDate>
		<dc:creator>Search Engine Land</dc:creator>
				<category><![CDATA[Search Resources]]></category>
		<category><![CDATA[SMX & SMN Alerts]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=89704</guid>
		<description><![CDATA[It&#8217;s Back-To-School Time; Check Out the Training Videos at IMTcourses.com Search Engine Land released its first search marketing training videos today. They are now available on Internet Marketing Training Courses (http://imtcourses.com), a new online store for internet marketing training materials launched by our parent company Third Door Media. The series, Fundamentals of Search Engine Marketing, [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s Back-To-School Time; Check Out the Training Videos at IMTcourses.com</p>
<p>Search Engine Land released its first search marketing training videos today. They are now available on Internet Marketing Training Courses (<a href="http://imtcourses.com/">http://imtcourses.com</a>), a new online store for internet marketing training materials launched by our parent company Third Door Media.</p>
<p>The series, <em><a href="http://imtcourses.com/catalog/fundamentals-of-search-engine-marketing-p-6.html">Fundamentals of Search Engine Marketing</a></em>, has four videos featuring expert instructors:</p>
<ul>
<li><em><a href="http://imtcourses.com/catalog/keyword-research-and-copywriting-p-5.html">Keyword Research &#038; Copywriting</a></em> with Christine Churchill, president of KeyRelevance</li>
<li><em><a href="http://imtcourses.com/catalog/link-building-fundamentals-p-2.html">Link Building Fundamentals</a></em> with Debra Mastaler, president of Alliance-Link</li>
<li><em><a href="http://imtcourses.com/catalog/paid-search-fundamentals-p-3.html">Paid Search Fundamentals</a></em> with Matt Van Wagner, president of Find Me Faster</li>
<li><em><a href="http://imtcourses.com/catalog/search-engine-friendly-web-design-p-4.html">Search Engine Friendly Web Design</a></em> with Shari Thurow, founder and SEO director at Omni Marketing Interactive</li>
</ul>
<p>All four presentations feature full motion video and integrated slide presentations. They can be purchased as a set or individually. Video samples and more information are available at <a href="http://imtcourses.com/">IMTcourses.com</a>.</p>
<p>IMTcourses.com is opening with these four videos. The mission of the site is to host internet marketing instructional materials in many formats, covering a wide variety of digital marketing topics from many different providers. Those wishing to sell instructional materials through IMTcourses should contact product manager <a href="mailto:cmcdonagh@thirddoormedia.com">Chris McDonagh</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/its-back-to-school-time-check-out-the-training-videos-at-imtcourses-com-89704/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>7 Search Tools You May Not Know &#8230; But Should</title>
		<link>http://searchengineland.com/7-search-tools-you-may-not-know-but-should-15198</link>
		<comments>http://searchengineland.com/7-search-tools-you-may-not-know-but-should-15198#comments</comments>
		<pubDate>Tue, 21 Oct 2008 12:45:52 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Search Engines: Meta Search Engines]]></category>
		<category><![CDATA[Search Engines: Other Search Engines]]></category>
		<category><![CDATA[Search Engines: Photo & Image Search]]></category>
		<category><![CDATA[Search Resources]]></category>
		<category><![CDATA[SEM Tools]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=15198</guid>
		<description><![CDATA[Google UK recently shared a list of 52 Things to Do on a variety of Google properties (found via Phil Bradley). It&#8217;s a collection of tools and tips about using Google products and services for some everyday functions. If you&#8217;re a search power user, you probably know most of them already. But Google&#8217;s message seems [...]]]></description>
			<content:encoded><![CDATA[<p>Google UK recently shared a list of <a href="http://www.google.co.uk/landing/thingstodo/">52 Things to Do</a> on a variety of Google properties (found via <a href="http://philbradley.typepad.com/phil_bradleys_weblog/2008/10/google---things-to-do.html">Phil Bradley</a>). It&#8217;s a collection of tools and tips about using Google products and services for some everyday functions. If you&#8217;re a search power user, you probably know most of them already. But Google&#8217;s message seems to be, &#8220;Did you know you could do all this stuff on Google?&#8221;</p>
<p>It got us thinking about non-Google search tools that might have slipped notice altogether, or just fallen off your radar. With that in mind, here&#8217;s a list of seven search tools you may not know about &#8230; but should.</p>
<p>Read on to discover how to see search suggestions from all major search engines on one page; a “cover flow” interface to see face images from Google Images; a new way to get recommendations about music, movies and more; new tools to search multiple search engines from one place; a tool for finding hot event tickets; and an assist for hunting through Flickr’s many photos.</p>
<p><span id="more-15198"></span><strong>Soovle</strong></p>
<p><a href="http://soovle.com/">Soovle</a> offers a unique search interface that puts a variety of search sites on a single page. But what makes it unique is that, as you type in the search box, Soovle shows you the auto-completion phrases that each search site recommends. In addition to being original, that function could serve to help with a keyword research project. It looks like this:</p>
<p><a title="Soovle by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2960221705/"><img src="http://farm4.static.flickr.com/3271/2960221705_32415de636.jpg" alt="Soovle" width="500" height="298" /></a></p>
<p>Google is the default search site when you arrive, but you can use the right-arrow on your keyboard to quickly select a different site to perform your search. And there&#8217;s also a daily update on the <a href="http://www.soovle.com/top/">top auto-complete terms</a>. Each day, Soovle queries the search sites to find out what they show as the top results for each letter of the alphabet. Pretty cool stuff.</p>
<p><strong>facesaerch</strong></p>
<p>If you like the &#8220;cover flow&#8221; feature that Apple iTunes offers, you&#8217;ll like this new image search engine. <a href="http://www.facesaerch.com/">facesaerch</a> (yes, &#8220;a&#8221; before &#8220;e&#8221;) takes a Google image search, eliminates everything but faces, and gives the results a more modern interface. It looks like this:</p>
<p><a title="facesaerch Image Search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2960221871/"><img src="http://farm4.static.flickr.com/3150/2960221871_a995965ff7.jpg" alt="facesaerch Image Search" width="500" height="225" /></a></p>
<p>It&#8217;s nothing groundbreaking overall, but one nice addition is a customizable widget that you can embed on your blog or web page, complete with cool thumbnail scrolling and all. (For your Oprah Winfrey fan page, of course.)</p>
<p><strong>TasteKid</strong></p>
<p><a href="http://www.tastekid.com/">TasteKid</a> is more of a recommendation engine than a search engine. It covers movies, music, and books, offering suggestions for things you might like based on what you search for. The interface is gorgeous (albeit a bit dark/goth), and the recommendations are generally good. Search for U2, for example, and TasteKid suggests you try out INXS, R.E.M., Sting, Bruce Springsteen, Coldplay, and several other artists &#8212; most of which fit what a typical U2 fan might enjoy.</p>
<p><a title="TasteKid Entertainment Search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2961063362/"><img src="http://farm4.static.flickr.com/3174/2961063362_0ce9f33a87.jpg" alt="TasteKid Entertainment Search" width="500" height="371" /></a></p>
<p>There are question marks next to each recommendation. When you mouse over a question mark, TasteKid displays additional information from Wikipedia, YouTube, and Amazon about that artist (or book, movie, actor, etc.). It uses Google Gadgets to offer a widget that can be embedded into your web page or blog.</p>
<p><strong>fasteagle</strong></p>
<p><a href="http://www.fasteagle.com/">Fasteagle</a> is a combination search tool and web directory rolled into one interface, with a little touch of feed reader built in, too. The home page gives you quick access to search a dozen different sites, from Google to Delicious to eBay to FriendFeed.</p>
<p><a title="fasteagle search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2961063416/"><img src="http://farm4.static.flickr.com/3251/2961063416_cc29d0fb13.jpg" alt="fasteagle search" width="500" height="289" /></a></p>
<p>It would be nice to be able to customize those 12 options, or add more to the original 12 to make your own personal search portal. But I don&#8217;t see that option anywhere on fasteagle, which is still in beta. Meanwhile, clicking on the categories in the top menu (Tools, News, Business, etc.) leads to new sets of sub-categories in the left-side menu. Under the Tech category, for example, the left menu changes to show sub-categories such as Web World, Tech Vloggers, IT News, Computing, Apple, Google, Mobile Computing, and Web Marketing. That last sub-category includes sites like <a href="http://searchengineland.com/">Search Engine Land</a>, <a href="http://www.marketingpilgrim.com/">Marketing Pilgrim</a>, <a href="http://searchenginewatch.com/">Search Engine Watch</a>, and several others. Click on any link, and the site shows up in the main fasteagle window, with the top and side menus still showing &#8212; making fasteagle almost like a feed reader that gives you quick access to hundreds of web sites in rapid succession.</p>
<p><strong>FanSnap</strong></p>
<p>Have you searched for event tickets lately? It&#8217;s not fun, and it&#8217;s not easy. <a href="http://www.fansnap.com/">FanSnap</a> hopes to change that by providing a one-stop source for finding tickets to sporting events, theatre productions, and concerts.</p>
<p><a title="Fan Snap Ticket Search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2961063612/"><img src="http://farm4.static.flickr.com/3174/2961063612_3837a81e72.jpg" alt="Fan Snap Ticket Search" width="500" height="315" /></a></p>
<p>FanSnap doesn&#8217;t sell tickets; it lets you find tickets being sold by brokers and others in the secondary ticket market. At the moment, I don&#8217;t see inventory from official ticket sellers such as Ticketmaster or TicketsWest. It gets inventory from more than 50 ticket resellers, making it a much easier way to shop than visiting the individual web sites of that many ticket brokers. To borrow a comparison <a href="http://gigaom.com/2008/10/14/meet-fansnap-search-engine-for-live-event-tickets/">Om Malik</a> recently made, it&#8217;s like Zillow for event tickets.</p>
<p><strong>compfight</strong></p>
<p>Strange name for a Flickr image search engine, but don&#8217;t let it keep you away. <a href="http://www.compfight.com/">Compfight</a> offers a handful of customizations that help you drill down into Flickr&#8217;s enormous pool of user-uploaded photos.</p>
<p><a title="CompFight Flickr Search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2961064048/"><img src="http://farm4.static.flickr.com/3230/2961064048_d79a47008b.jpg" alt="CompFight Flickr Search" width="500" height="248" /></a></p>
<p>You can search the full text of a photo page (title, description, and tags), or if that&#8217;s producing too many matches, you can just search tags. You can search for photos that allow Creative Commons commercial usage. You can search for photos that are original to Flickr. You can also turn Flickr&#8217;s Safe Search on or off. And you can combine all these options in any search combination you want. And rather than Flickr&#8217;s clunky, default, 10-at-a-time search results, you get dozens of thumbnails with compfight.</p>
<p><strong>Kedrix</strong></p>
<p>There are plenty of meta-search engines out there, but only one that wants you to &#8220;mearch&#8221; instead of &#8220;search.&#8221; That one is <a href="http://www.kedrix.com/">Kedrix</a>, which is trying to coin a new word based on the words &#8220;meta&#8221; and &#8220;search.&#8221; That doesn&#8217;t work for me, but the search engine does, thankfully.</p>
<p><a title="Kedrix Meta-Search by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/2961064130/"><img src="http://farm4.static.flickr.com/3181/2961064130_dae842c3a6.jpg" alt="Kedrix Meta-Search" width="500" height="444" /></a></p>
<p>The Kedrix premise is simple: It&#8217;s actually not a meta-search engine in the traditional sense. Rather than mash results from different search engines together (as Metacrawler, Dogpile, Mamma, and others do), Kedrix separates the results from the four main search engines on tabs. Google results are all under one tab, Yahoo under another, and so forth. In that sense, it&#8217;s more like a search engine comparison tool. And that makes it somewhat more valuable to SEOs (who like to compare results across different engines) than your standard meta-search engine.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/7-search-tools-you-may-not-know-but-should-15198/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Video: Easy Tips For Better Searching</title>
		<link>http://searchengineland.com/new-video-easy-tips-for-better-searching-15119</link>
		<comments>http://searchengineland.com/new-video-easy-tips-for-better-searching-15119#comments</comments>
		<pubDate>Wed, 15 Oct 2008 19:57:32 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Search Features: Commands]]></category>
		<category><![CDATA[Search Features: Query Refinement]]></category>
		<category><![CDATA[Search Resources]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=15119</guid>
		<description><![CDATA[Last year, Common Craft produced a great, short video explaining RSS &#8220;in plain English.&#8221; The company is back now with another wonderful one, Web Search Strategies in Plain English. It&#8217;s less than three minutes long and does a great job explaining how to use some of the most simple &#8220;power&#8221; commands at searchers&#8217; disposal &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p>Last year, Common Craft produced a great, short <a href="http://www.commoncraft.com/rss_plain_english">video explaining RSS &#8220;in plain English.&#8221;</a> The company is back now with another wonderful one, <a href="http://www.commoncraft.com/web-search-strategies">Web Search Strategies in Plain English</a>. It&#8217;s less than three minutes long and does a great job explaining how to use some of the simplest &#8220;power&#8221; commands at searchers&#8217; disposal &#8212; the minus sign and quotation marks. Be sure to check it out! The video is also below:</p>
<p><object type="application/x-shockwave-flash" width="320" height="260" data="http://www.vimeo.com/moogaloop.swf?clip_id=1799104&amp;server=www.vimeo.com&amp;fullscreen=0&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color="><param name="quality" value="best" /><param name="allowfullscreen" value="false" /><param name="scale" value="showAll" /><param name="movie" value="http://www.vimeo.com/moogaloop.swf?clip_id=1799104&amp;server=www.vimeo.com&amp;fullscreen=0&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=" /></object></p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/new-video-easy-tips-for-better-searching-15119/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>From My Inbox: Some Search Tools To Check Out</title>
		<link>http://searchengineland.com/from-my-inbox-some-search-tools-to-check-out-13948</link>
		<comments>http://searchengineland.com/from-my-inbox-some-search-tools-to-check-out-13948#comments</comments>
		<pubDate>Wed, 07 May 2008 14:17:52 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Webmaster Central]]></category>
		<category><![CDATA[Search Engines: Other Search Engines]]></category>
		<category><![CDATA[Search Engines: Social Search Engines]]></category>
		<category><![CDATA[Search Engines: Video Search Engines]]></category>
		<category><![CDATA[Search Engines: Word Of Mouth & Buzz Search Engines]]></category>
		<category><![CDATA[Search Resources]]></category>
		<category><![CDATA[SEM Tools]]></category>
		<category><![CDATA[Stats: Search Behavior]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/from-my-inbox-some-search-tools-to-check-out-13948.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>After <a href="http://searchengineland.com/lands/microsoft-yahoo-merger.php">Yahoo-Microsoft madness</a>, there&#8217;s been a bit of a lull so I&#8217;m cleaning out my inbox
and wanted to mention a few items that might be of interest. Below, a way to
quickly search blogs &amp; social media sites all at once, a new video search tool,
a study into automatic search queries, an awesome Twitter search tool, a way to
track search rankings over time, and a compilation of Google help info for site
owners, from Google, in PDF form. Plus, a way to see headlines from blogs and news sites in a variety of subjects, including search marketing.</p>
<p><span id="more-13948"></span></p>
<p><b><a href="http://addictomatic.com/">Addict-o-matic</a></b></p>
<p>From Dave Pell of <a href="http://rollyo.com/">Rollyo</a>, it lets you pull
back matching results from news search, blog search, and social media sharing
sites. </p>
<p>What&#8217;s going on with Myanmar and the cyclone recovery? A
<a href="http://addictomatic.com/topic/myanmar">myanmar</a> search lets you see
top results from major news sites, Google Blog Search, Technorati, YouTube,
Digg, Flickr, and more on a single page. Nice! Don&#8217;t want a particular source?
Just click the X in its box to make it disappear. </p>
<p>The <a href="http://addictomatic.com/newsfix/">NewsFix</a> area also lets you
see sources selected for particular topics.
<a href="http://addictomatic.com/newsfix/thought20">Thought 2.0</a> is kind of
fun &#8212; a variety of A-List blogs all on a single page, showing their headlines. </p>
<p><b><a href="http://www.snipp.tv/">Snipp.TV</a></b></p>
<p>The pitch to me was, &quot;a unique Video search engine that combines Speech
Recognition and Textual search to provide the capability of search within the
media soundtrack as well as on the textual file details &#8211; resulting in more
results, with higher relevancy.&quot; </p>
<p>Sigh.</p>
<p><a href="http://www.blinkx.com/">Blinkx</a> has had a similar pitch for some
time, and it isn&#8217;t the only one that has tried this. See my
<a href="http://searchengineland.com/070226-102002.php">Video Search Challenge
Isn&#8217;t Speech Recognition, It&#8217;s Content Owner Management</a> if you want the hype
dissection.</p>
<p>Sigh and hype-deflation all done, I guess it&#8217;s another video search site
worth keeping an eye on. Right now, Snipp tells me they have about 10 content
partners they work with, such as Reuters, CNET, and Fox News. Another 15 are
supposed to come. The video index is updated daily.</p>
<p><b>
<a href="http://research.microsoft.com/research/pubs/view.aspx?0rc=p&#038;type=Publication&#038;id=1833">
A Large-Scale Study of Automated Web Search Traffic</a></b></p>
<p>Last week, I did a write-up about a huge number of search research papers (<a href="http://searchengineland.com/080428-094846.php">WWW2008:
Search Research Paper Roundup</a>). Gary Price over at
<a href="http://resourceshelf.com/">ResourceShelf</a> then pinged me about this
one. From the abstract:</p>
<blockquote>
<p>As web search providers seek to improve both relevance and response times,
they are challenged by the ever-increasing tax of automated search query
traffic. Third party systems interact with search engines for a variety of
reasons, such as monitoring a website&#8217;s rank, augmenting online games, or
possibly to maliciously alter click-through rates. In this paper, we investigate
automated traffic in the query stream of a large search engine provider. We
define automated traffic as any search query not generated by a human in real
time. We first provide examples of different categories of query logs generated
by bots. We then develop many different features that distinguish between
queries generated by people searching for information, and those generated by
automated processes. We categorize these features into two classes, either an
interpretation of the physical model of human interactions, or as behavioral
patterns of automated interactions. We believe these features formulate a basis
for a production-level query stream classifier.</p>
</blockquote>
<p>I got all excited hoping I&#8217;d find the paper fascinating. I&#8217;m clearly the
wrong audience, but maybe someone else will. It primarily covers ways to help
identify potential bot-generated queries.</p>
<p><b><a href="http://www.searchrascal.com">SearchRascal</a></b></p>
<p>I know, I know. Tools that let you check rankings seem a dime a dozen, and
years ago I stopped caring about them when I, along with many others,
started preaching that it&#8217;s more about watching your analytics and seeing what
actually sends you traffic than guessing at terms that might send traffic and
obsessing over monitoring them.</p>
<p>Still, for other reasons, it&#8217;s nice to know how results have changed over
time. Search Rascal does this &#8212; tracks the top results in any query for Google,
Yahoo, and Microsoft. Check out the results for
<a href="http://www.searchrascal.com/report?q=cars">cars</a>. You can see if
something is new from last week, if it has moved up, down, or so on. You can do
daily and monthly comparisons, too. You can only go back in time if someone has
previously established tracking on a particular query, though.</p>
<p><b><a href="http://summize.com/">Summize</a></b></p>
<p>My favorite new Twitter search tool, since it always seems to be online and
stable, unlike some other tools I&#8217;ve tried. What are people Twittering about
Xena: Warrior Princess? A search for <a href="http://summize.com/search?q=xena">
xena</a> quickly brings back <a href="http://twitter.com/dannysullivan">me</a>
and <a href="http://twitter.com/oilman">Oilman</a> plotting a potential weekly
viewing night. What are people Twittering about Microsoft? Try a
<a href="http://summize.com/search?q=microsoft">microsoft</a> search. Want to
monitor something? Just use the &quot;Feed for this query&quot; link at the top right of
each results page to add it to your favorite feed reader.</p>
<p><b><a href="http://www.groowe.com/">Groowe</a></b></p>
<p>If you&#8217;re not using the Groowe toolbar, go get it now. It&#8217;s one of those
tools I tried and have stuck with for years. It gives you virtually all the
features of the Google Toolbar, but with a click, you can tap into Yahoo,
Microsoft, and more. It&#8217;s the first thing I install in a new browser. You can now
customize it to add additional toolbars and features not part of the default
installation. Say you want to use our Sphinn social media site.
<a href="http://www.groowe.com/toolbars/preview/65">This plug-in</a> lets you
add a Sphinn feature to Groowe. Check out the full list
<a href="http://www.groowe.com/toolbars/">here</a>.</p>
<p><b><a href="http://www.google.co.uk/intl/en/press/guides.html">Google
Publisher&#8217;s Guide</a></b></p>
<p>Want to understand how Google interacts with your web site but don&#8217;t want to
read all the help pages one-by-one? Google&#8217;s got a PDF compilation you can grab
at the URL above. You can also get publications on Google Book Search and Google
News.</p>
<p><b>Postscript</b>: Forgot one!</p>
<p><b><a href="http://alltop.com/">Alltop</a></b></p>
<p>Want top headlines on SEO from search marketing blogs across the web?
<a href="http://seo.alltop.com/">This new area</a> of Guy Kawasaki&#8217;s Alltop
service gives it to you. Alltop also has similar pages for a growing number of
subjects. Of course, places like <a href="http://originalsignal.com/">Original
Signal</a> have already been doing this, so check them out, too. For more SEO and
SEM blog compilations, also see these custom pages we offer:</p>
<blockquote>
<p>
<a href="http://www.google.com/ig/sharetab?stid=108171243734626665635e8384eaded2f40b4b6faaf7873b0c61f">
iGoogle Search News Tab</a><br />
<a href="http://cm.my.yahoo.com/add/page?id=myy_684e09ddf01e7549">My Yahoo
Search News Page</a><br />
<a href="http://www.live.com/?addTemplate=27cd875d-97c7-47d5-89a4-196fde72e94a">
Windows Live Search News Page</a><br />
<a href="http://eco.netvibes.com/tabs/206425/search-engine-news">Netvibes
Search News Tab</a><br />
<a href="http://www.pageflakes.com/Community/Pages/Page.aspx?moduleKey=345460">
Pageflakes Search News Page</a></p>
</blockquote>
<p>Also look in our <a href="http://searchengineland.com/#blogroll">Search
Engine Land blogroll</a>, scroll to the blog compilations area, and you&#8217;ll find more
all-in-one style headline pages. Ah, heck, I&#8217;ll just copy and paste &#8216;em!</p>
<blockquote>
<p><b>Blog Compilations<br />
</b><a href="http://searchengineland.com/searchcap.php">SearchCap: Daily
Search Recap</a><br />
<a href="http://searchengineland.com/searchcap.php">SearchMonth</a><br />
<a href="http://seo.alltop.com/">Alltop: SEO</a><br />
<a href="http://seo.originalsignal.com/">Original Signal: SEO</a><br />
<a href="http://www.aimclearblog.com/dailyCrawl/">SEOBlog dailyCrawl</a><br />
<a href="http://seomash.com/">SEOmash</a><br />
<a href="http://www.searchbrains.com/">Search Brains</a><br />
<a href="http://socialblogroll.com/">SocialBlogroll</a><br />
<a href="http://www.yellowhousehosting.com/resources/blog/this-week-in-seo/">
This Week In SEO</a><br />
<a href="http://www.topix.net/business/search-engines/">Topix: Search Engines</a><br />
<a href="http://www.toprankblog.com/search-marketing-blogs/">TopRank Big List
Of Search Marketing Blog</a></p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/from-my-inbox-some-search-tools-to-check-out-13948/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WWW2008: Search Research Paper Roundup</title>
		<link>http://searchengineland.com/www2008-search-research-paper-roundup-13879</link>
		<comments>http://searchengineland.com/www2008-search-research-paper-roundup-13879#comments</comments>
		<pubDate>Mon, 28 Apr 2008 13:48:46 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Search Resources]]></category>
		<category><![CDATA[Stats: Search Behavior]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/www2008-search-research-paper-roundup-13879.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>
A variety of interesting research papers on search have come out of <a href="http://www2008.org/">
WWW2008</a>, the 17th International World Wide Web Conference. Some I&#8217;ve blogged
already. Below is a
rundown on those and
<a href="http://www2008.org/program/program-RefereedPapers.html">some other papers</a>
that may be of interest. For the attention-challenged, I&#8217;ve also included my now
patented &quot;Twitter&quot; summary for some of the interesting or more
accessible papers, to tell you the highlights.</p>
<p><span id="more-13879"></span></p>
<p>
<b><a href="http://www2008.org/papers/fp506.html">PageRank for Image Search</a></b><br />
Google</p>
<blockquote>
<p>
<b>Abstract:</b> In this paper, we cast the image-ranking problem into the task
of identifying &quot;authority&quot; nodes on an inferred visual similarity graph and
propose an algorithm to analyze the visual link structure that can be created
among a group of images. Through an iterative procedure based on the PageRank
computation, a numerical weight is assigned to each image; this measures its
relative importance to the other images being considered. The incorporation of
visual signals in this process differs from the majority of large-scale
commercial-search engines in use today. Commercial search-engines often solely
rely on the text clues of the pages in which images are embedded to rank images,
and often entirely ignore the content of the images themselves as a ranking
signal. To quantify the performance of our approach in a real-world system, we
conducted a series of experiments based on the task of retrieving images for
2000 of the most popular products queries. Our experimental results show
significant improvement, in terms of user satisfaction and relevancy, in
comparison to the most recent Google Image Search results.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798624057">Danny&#8217;s Twitter
Summary</a>: Google finds way to rank images better by virtual links of
similarities.</p>
<p>
See also the Search Engine Land story:
<a href="http://searchengineland.com/080428-052458.php">Google Paper: Better
Image Search Through VisualRank / Image Rank</a></p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp823.html">Spatial Variation in Search
Engine Queries</a></b><br />
Yahoo &amp; Cornell University</p>
<blockquote>
<p>
<b>Abstract: </b>Local aspects of Web search &#8212; associating Web content and
queries with geography &#8212; is a topic of growing interest. However, the
underlying question of how spatial variation is manifested in search queries is
still not well understood. Here we develop a probabilistic framework for
quantifying such spatial variation; on complete Yahoo! query logs, we find that
our model is able to localize large classes of queries to within a few miles of
their natural centers based only on the distribution of activity for the query.
Our model provides not only an estimate of a query&#8217;s geographic center, but also
a measure of its spatial dispersion, indicating whether it has highly local
interest or broader regional or national appeal. We also show how variations on
our model can track geographically shifting topics over time, annotate a map
with each location&#8217;s &quot;distinctive queries,&quot; and delineate the &quot;spheres of
influence&quot; for competing queries in the same general domain.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798627508">Danny&#8217;s Twitter
Summary</a>: Yahoo shows how any query can have a geographic center.</p>
<p>
See also the Search Engine Land story:
<a href="http://searchengineland.com/080428-061418.php">Yahoo Paper: Finding The
Local &quot;Center&quot; Of Search Queries</a></p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp801.html">Mining the Search Trails of
Surfing Crowds: Identifying Relevant Websites From User Activity</a></b><br />
Microsoft Research</p>
<blockquote>
<p>
<b>Abstract:</b> The paper proposes identifying relevant information sources
from the history of combined searching and browsing behavior of many Web users.
While it has been previously shown that user interactions with search engines
can be employed to improve document ranking, browsing behavior that occurs
beyond search result pages has been largely overlooked in prior work. The paper
demonstrates that users&#8217; post-search browsing activity strongly reflects
implicit endorsement of visited pages, which allows estimating topical relevance
of Web resources by mining large-scale datasets of search trails. We present
heuristic and probabilistic algorithms that rely on such datasets for suggesting
authoritative websites for search queries. Experimental evaluation shows that
exploiting complete post-search browsing trails outperforms alternatives in
isolation (e.g., clickthrough logs), and yields accuracy improvements when
employed as a feature in learning to rank for Web search.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798630821">Danny&#8217;s Twitter
Summary</a>: Microsoft studies using surfing patterns after a search to improve
ranking.</p>
<p>
See also the Search Engine Land story:
<a href="http://searchengineland.com/080428-064639.php">Microsoft Paper:
Improving Search Results By Mining Web Surfing Activity</a></p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp826.html">Genealogical Trees on the Web:
A Search Engine User Perspective</a><br />
</b>Yahoo Research &amp; Federal University of Minas Gerais</p>
<blockquote>
<p>
<b>Abstract:</b> This paper presents an extensive study about the evolution of
textual content on the Web, which shows how some new pages are created from
scratch while others are created using already existing content. We show that a
significant fraction of the Web is a byproduct of the latter case. We introduce
the concept of Web genealogical tree, in which every page in a Web snapshot is
classified into a component. We study in detail these components, characterizing
the copies and identifying the relation between a source of content and a search
engine, by comparing page relevance measures, documents returned by real queries
performed in the past, and click-through data. We observe that sources of copies
are more frequently returned by queries and more clicked than other documents.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798636381">Danny&#8217;s Twitter
Summary</a>: Yahoo paper on how 1/4 new web docs have content from existing
ones. Insight into scrapers &amp; duplicate contet [sic]? But based on Spanish docs.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp509.html">Query-Sets:
Using Implicit Feedback and Query Patterns to Organize Web Documents</a><br />
</b>Yahoo Research &amp; University Pompeu Fabra</p>
<blockquote>
<p>
<b>Abstract:</b> In this paper we present a new document representation model
based on implicit user feedback obtained from search engine queries. The main
objective of this model is to achieve better results in non-supervised tasks,
such as clustering and labeling, through the incorporation of usage data
obtained from search engine queries. This type of model allows us to discover
the motivations of users when visiting a certain document. The terms used in
queries can provide a better choice of features, from the user&#8217;s point of view,
for summarizing the Web pages that were clicked from these queries. In this work
we extend and formalize as &quot;query model&quot; an existing but not very well known
idea of &quot;query view&quot; for document representation. Furthermore, we create a novel
model based on &quot;frequent query patterns&quot; called the &quot;query-set model&quot;. Our
evaluation shows that both &quot;query-based&quot; models outperform the vector-space
model when used for clustering and labeling documents in a website. In our
experiments, the query-set model reduces by more than 90% the number of features
needed to represent a set of documents and improves by over 90% the quality of
the results. We believe that this can be explained because our model chooses
better features and provides more accurate labels according to the user&#8217;s
expectations.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798636381">Danny&#8217;s Twitter
Summary</a>: Yahoo defining documents by queries they might satisfy rather than
as containing individual words.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp956.html">Using the Wisdom of the Crowds
for Keyword Generation</a></b><br />
Microsoft Research</p>
<blockquote>
<p>
<b>Abstract: </b>In the sponsored search model, search engines are paid by
businesses that are interested in displaying ads for their site alongside the
search results. Businesses bid for keywords, and their ad is displayed when the
keyword is queried to the search engine. An important problem in this process is
<i>keyword generation</i>: given a business that is interested in launching a
campaign, suggest keywords that are related to that campaign. We address this
problem by making use of the query logs of the search engine. We identify
queries related to a campaign by exploiting the associations between queries and
URLs as they are captured by the user&#8217;s clicks. These queries form good keyword
suggestions since they capture the &#8220;wisdom of the crowd&#8221; as to what is related
to a site. We formulate the problem as a semi-supervised learning problem, and
propose algorithms within the Markov Random Field model. We perform experiments
with real query logs, and we demonstrate that our algorithms scale to large
query logs and produce meaningful results.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798661325">Danny&#8217;s Twitter
Summary</a>: Microsoft paper looks at how keyword suggestions for advertisers
can be generated by monitoring click logs.</p>
</blockquote>
<p>
<b>
<a href="http://www2008.org/papers/fp618.html">Performance of Compressed
Inverted List Caching in Search Engines</a><br />
</b>Microsoft &amp; Polytechnic University</p>
<blockquote>
<p>
<b>Abstract:</b>
Due to the rapid growth in the size of the web, web search engines are facing
enormous performance challenges. The larger engines in particular have to be
able to process tens of thousands of queries per second on tens of billions of
documents, making query throughput a critical issue. To satisfy this heavy
workload, search engines use a variety of performance optimizations including
index compression, caching, and early termination. We focus on two techniques,
inverted index compression and index caching, which play a crucial role in web
search engines as well as other high-performance information retrieval systems.
We perform a comparison and evaluation of several inverted list compression
algorithms, including new variants of existing algorithms that have not been
studied before. We then evaluate different inverted list caching policies on
large query traces, and finally study the possible performance benefits of
combining compression and caching. The overall goal of this paper is to provide
an updated discussion and evaluation of these two techniques, and to show how to
select the best set of approaches and settings depending on parameters such as
disk speed and main memory cache size.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798671639">Danny&#8217;s Twitter
Summary</a>: Microsoft paper with nice background on how search engines quickly
find and cache results</p>
</blockquote>
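<p>
For readers unfamiliar with inverted list compression, the sketch below shows one widely used scheme: storing the gaps between sorted document IDs in variable-byte form. The abstract does not name the algorithms the paper compares, so treat this as an assumed, generic illustration; the function names are hypothetical.</p>

```python
def encode_postings(doc_ids):
    """Compress a sorted list of document IDs as variable-byte-coded gaps."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        # Emit 7 bits per byte, low-order group first; a set high bit
        # means "more bytes follow for this gap".
        while gap // 128 != 0:
            out.append(gap % 128 + 128)
            gap //= 128
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Invert encode_postings, returning the original sorted document IDs."""
    doc_ids = []
    current = 0
    gap = 0
    shift = 1
    for byte in data:
        gap += (byte % 128) * shift
        if byte // 128 == 1:
            # Continuation byte: the next byte holds higher-order bits.
            shift *= 128
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 1
    return doc_ids


# Assumed sample postings list for one term.
POSTINGS = [3, 7, 300, 301, 70000]
ENCODED = encode_postings(POSTINGS)
```

<p>
Because most gaps in a long postings list are small, most of them fit in a single byte, which is what makes a compressed list cheaper to cache.</p>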
<p>
<b>
<a href="http://www2008.org/papers/fp431.html">Improving Relevance Judgment of
Web Search Results with Image Excerpts</a></b><br />
Microsoft Research Asia</p>
<blockquote>
<p>
<b>Abstract: </b>Current web search engines return result pages containing mostly text summary
even though the matched web pages may contain informative pictures. A text
excerpt (i.e. snippet) is generated by selecting keywords around the matched
query terms for each returned page to provide context for user&#8217;s relevance
judgment. However, in many scenarios, we found that the pictures in web pages,
if selected properly, could be added into search result pages and provide richer
contextual description because a picture is worth a thousand words. Such new
summary is named as image excerpts. By well designed user study, we demonstrate
image excerpts can help users make much quicker relevance judgment of search
results for a wide range of query types. To implement this idea, we propose a
practicable approach to automatically generate image excerpts in the result
pages by considering the dominance of each picture in each web page and the
relevance of the picture to the query. We also outline an efficient way to
incorporate image excerpts in web search engines. Web search engines can adopt
our approach by slightly modifying their index and inserting a few low cost
operations in their workflow. Our experiments on a large web dataset indicate
the performance of the proposed approach is very promising.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798675330">Danny&#8217;s Twitter
Summary</a>: Microsoft paper on finding a page&#8217;s dominant image to use next to
its search listing to improve relevancy of results</p>
</blockquote>
<p><b><a href="http://www2008.org/papers/fp648.html">Tag-Based Social Interest
Discovery</a></b><br />
Yahoo</p>
<blockquote>
<p><b>Abstract: </b>The success and popularity of social network systems, such
as del.icio.us, Facebook, MySpace, and YouTube, have generated many interesting
and challenging problems to the research community. Among others, discovering
social interests shared by groups of users is very important because it helps to
connect people with common interests and encourages people to contribute and
share more contents. The main challenge to solving this problem comes from the
difficulty of detecting and representing the interest of the users. The
existing approaches are all based on the online connections of users and so
unable to identify the common interest of users who have no online connections.
In this paper, we propose a novel social interest discovery approach based on
user-generated tags. Our approach is motivated by the key observation that in a
social network, human users tend to use descriptive tags to annotate the
contents that they are interested in. Our analysis on a large amount of
real-world traces reveals that in general, user-generated tags are consistent
with the web content they are attached to, while more concise and closer to the
understanding and judgments of human users about the content. Thus, patterns of
frequent co-occurrences of user tags can be used to characterize and capture
topics of user interests. We have developed an Internet Social Interest
Discovery system, ISID, to discover the common user interests and cluster users
and their saved URLs by different interest topics. Our evaluation shows that
ISID can effectively cluster similar documents by interest topics and discover
user communities with common interests no matter if they have any online
connections.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798684694">Danny&#8217;s Twitter
Summary</a>: Yahoo paper on how human tags can be a more relevant way to
determine the main topic of a page than keyword analysis.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp367.html">Learning to Rank Relational
Objects and Its Application to Web Search</a></b><br />
Microsoft Research Asia, Tsinghua University, Peking University</p>
<blockquote>
<p>
<b>Abstract:</b> Learning to rank is a new statistical learning technology on
creating a ranking model for sorting objects. The technology has been
successfully applied to web search, and is becoming one of the key machineries
for building search engines. Existing approaches to learning to rank, however,
did not consider the cases in which there exist relationships between the
objects to be ranked, despite the fact that such situations are very common
in practice. For example, in web search, given a query, certain relationships
usually exist among the retrieved documents, e.g., URL hierarchy,
similarity, etc., and sometimes it is necessary to utilize the information in
ranking of the documents. This paper addresses the issue and formulates it as a
novel learning problem, referred to as, &#8216;learning to rank relational objects&#8217;.
In the new learning task, the ranking model is defined as a function of not only
the contents (features) of objects but also the relations between objects. The
paper further focuses on one setting of the learning problem in which the way of
using relation information is predetermined. It formalizes the learning task as
an optimization problem in the setting. The paper then proposes a new method to
perform the optimization task, particularly an implementation based on SVM.
Experimental results show that the proposed method outperforms the baseline
methods for two ranking tasks (Pseudo Relevance Feedback and Topic Distillation)
in web search, indicating that the proposed method can indeed make effective use
of relation information and content information in ranking.</p>
<p>
<a href="http://twitter.com/dannysullivan/statuses/798692177">Danny&#8217;s Twitter
Summary</a>: Microsoft paper on &quot;relationship&quot; ranking such as parent child
documents, topic similarities and related relevancy.</p>
</blockquote>
<p>
<b>
<a href="http://www2008.org/papers/fp233.html">Modeling Anchor Text and
Classifying Queries to Enhance Web Document Retrieval</a></b><br />
University of Tsukuba</p>
<blockquote>
<p>
<b>Abstract: </b>Several types of queries are widely used on the World Wide Web and the
expected retrieval method can vary depending on the query type. We propose a
method for classifying queries into informational and navigational types.
Because terms in navigational queries often appear in anchor text for links to
other pages, we analyze the distribution of query terms in anchor texts on the
Web for query classification purposes. While content-based retrieval is
effective for informational queries, anchor-based retrieval is effective for
navigational queries. Our retrieval system combines the results obtained with
the content-based and anchor-based retrieval methods, in which the weight for
each retrieval result is determined automatically depending on the result of the
query classification. We also propose a method for improving anchor-based
retrieval. Our retrieval method, which computes the probability that a document
is retrieved in response to the given query, identifies synonyms of query terms
in the anchor texts on the Web and uses these synonyms for smoothing purposes in
the probability estimation. We use the NTCIR test collections and show the
effectiveness of individual methods and the entire Web retrieval system
experimentally.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp592.html">Unsupervised Query
Segmentation using Generative Language Models and Wikipedia</a></b><br />
Yahoo &amp; University of Illinois at Urbana-Champaign</p>
<blockquote>
<p>
<b>Abstract:</b> In this paper, we propose a novel unsupervised approach to
query segmentation, an important task in Web search. We use a generative query
model to recover a query&#8217;s underlying concepts that compose its original
segmented form. The model&#8217;s parameters are estimated using an
expectation-maximization (EM) algorithm, optimizing the minimum description
length objective function on a partial corpus that is specific to the query. To
augment this unsupervised learning, we incorporate evidence from Wikipedia.
Experiments show that our approach dramatically improves performance over the
traditional approach that is based on mutual information, and produces
comparable results with a supervised method. In particular, the basic generative
language model contributes a 7.4% improvement over the mutual information based
method (measured by segment F1 on the Intersection test set). EM optimization
further improves the performance by 14.3%. Additional knowledge from Wikipedia
provides another improvement of 24.3%, adding up to a total of 46% improvement
(from 0.530 to 0.774).</p>
</blockquote>
<p>
<b>
<a href="http://www2008.org/papers/fp840.html">Knowledge Sharing and Yahoo
Answers: Everyone Knows Something</a><br />
</b>University Of Michigan</p>
<blockquote>
<p>
<b>Abstract: </b>Yahoo Answers (YA) is a large and diverse question-answer
forum, acting not only as a medium for sharing technical knowledge, but as a
place where one can seek advice, gather opinions, and satisfy one&#8217;s curiosity
about a countless number of things. In this paper, we seek to understand YA&#8217;s
knowledge sharing and activity. We analyze the forum categories and cluster them
according to content characteristics and patterns of interaction among the
users. While interactions in some categories resemble expertise sharing forums,
others incorporate discussion, everyday advice, and support. With such a
diversity of categories in which one can participate, we find that some users
focus narrowly on specific topics, while others participate across categories.
This not only allows us to map related categories, but to characterize the
entropy of the users&#8217; interests. We find that lower entropy correlates with
receiving higher answer ratings, but only for categories where factual expertise
is primarily sought after. We combine both user attributes and answer
characteristics to predict, within a given category, whether a particular answer
will be chosen as the best answer by the asker.</p>
</blockquote>
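<p>
The "entropy of interests" measure can be illustrated with a short sketch. The abstract does not spell out the exact formula, so this assumes plain Shannon entropy over the categories a user has answered in; the category names are made up.</p>

```python
import math
from collections import Counter


def interest_entropy(categories):
    """Shannon entropy (in bits) of the categories a user has answered in."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


# Assumed sample users: one answers in a single category, one spreads out evenly.
FOCUSED = ["Programming"] * 8
BROAD = ["Programming", "Cooking", "Pets", "Travel"] * 2

FOCUSED_ENTROPY = interest_entropy(FOCUSED)
BROAD_ENTROPY = interest_entropy(BROAD)
```

<p>
A user who answers only in one category scores zero, while a user spread evenly over four categories scores two bits. The paper reports that the lower-entropy pattern goes with higher answer ratings only in categories where factual expertise is sought.</p>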
<p>
<b>
<a href="http://www2008.org/papers/fp625.html">A Graph-Theoretic Approach to
Webpage Segmentation</a></b><br />
Yahoo Research</p>
<blockquote>
<p>
<b>Abstract: </b>We consider the problem of segmenting a webpage into visually and semantically
cohesive pieces. Our approach is based on formulating an appropriate
optimization problem on weighted graphs, where the weights capture if two nodes
in the DOM tree should be placed together or apart in the segmentation; we
present a learning framework to learn these weights from manually labeled data
in a principled manner. Our work is a significant departure from previous
heuristic and rule-based solutions to the segmentation problem. The results of
our empirical analysis bring out interesting aspects of our framework, including
variants of the optimization problem and the role of learning.</p>
</blockquote>
<p><b><a href="http://www2008.org/papers/fp175.html">Ranking Refinement
and Its Application to Information Retrieval</a><br />
</b>Microsoft Research Asia &amp; Michigan State University</p>
<blockquote>
<p><b>Abstract: </b>We consider the problem of ranking refinement, i.e., to
improve the accuracy of an existing ranking function with a small set of labeled
instances. We are, particularly, interested in learning a better ranking
function using two complementary sources of information, ranking information
given by the existing ranking function (i.e., the base ranker) and that obtained
from users&#8217; feedbacks. This problem is very important in information retrieval
where feedbacks are gradually collected. The key challenge in combining the two
sources of information arises from the fact that the ranking information
presented by the base ranker tends to be imperfect and the ranking information
obtained from users&#8217; feedbacks tends to be noisy. We present a novel boosting
algorithm for ranking refinement that can effectively leverage the uses of the
two sources of information. Our empirical study shows that the proposed
algorithm is effective for ranking refinement, and furthermore it significantly
outperforms the baseline algorithms that incorporate the outputs from the base
ranker as an additional feature.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp736.html">Contextual Advertising by
Combining Relevance with Click Feedback</a></b><br />
Yahoo Research</p>
<blockquote>
<p>
<b>Abstract:</b> Contextual advertising supports much of the Web&#8217;s ecosystem
today. User experience and revenue (shared by the site publisher and the ad
network) depend on the relevance of the displayed ads to the page content. As
with other document retrieval systems, relevance is provided by scoring the
match between individual ads (documents) and the content of the page where the
ads are shown (query). In this paper we show how this match can be improved
significantly by augmenting the ad-page scoring function with extra parameters
from a logistic regression model on the words in the pages and ads. A key
property of the proposed model is that it can be mapped to standard cosine
similarity matching and is suitable for efficient and scalable implementation
over inverted indexes. The model parameter values are learnt from logs
containing ad impressions and clicks, with shrinkage estimators being used to
combat sparsity. To scale our computations to train on an extremely large
training corpus consisting of several gigabytes of data, we parallelize our
fitting algorithm in a Hadoop framework. Experimental evaluation is provided
showing improved click prediction over a holdout set of impression and click
events from a large scale real-world ad placement engine. Our best model
achieves a 25% lift in precision relative to a traditional information retrieval
model which is based on cosine similarity, for recalling 10% of the clicks in
our test data.</p>
</blockquote>
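<p>
The baseline this paper measures against, matching ads to pages by cosine similarity, can be sketched as follows. This shows only that baseline, not the logistic regression extension the paper proposes; raw word counts, whitespace tokenization and the sample texts are all simplifying assumptions.</p>

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words term-count vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_ads(page_text, ads):
    """Return ads sorted from most to least similar to the page."""
    return sorted(ads, key=lambda ad: cosine_similarity(page_text, ad), reverse=True)


# Assumed sample page (the "query") and candidate ads (the "documents").
PAGE = "review of lightweight trail running shoes"
ADS = [
    "discount running shoes",
    "cheap car insurance quotes",
    "trail running gear and shoes",
]
RANKED = rank_ads(PAGE, ADS)
```

<p>
The ad sharing the most terms with the page ranks first, and the unrelated insurance ad ranks last. The paper's contribution is to adjust this kind of score with parameters learned from impression and click logs.</p>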
<p>
<b><a href="http://www2008.org/papers/fp224.html">Learning to Classify Short and
Sparse Text &amp; Web with Hidden Topics from Large-Scale Data Collections</a><br />
</b>Tohoku University &amp; Japan Advanced Institute of Science &amp; Technology</p>
<blockquote>
<p>
<b>Abstract:</b> This paper presents a general framework for building
classifiers that deal with short and sparse text &amp; Web segments by making the
most of hidden topics discovered from large-scale data collections. The main
motivation of this work is that many classification tasks working with short
segments of text &amp; Web, such as search snippets, forum &amp; chat messages, blog &amp;
news feeds, product reviews, and book &amp; movie summaries, fail to achieve high
accuracy due to the data sparseness. We, therefore, come up with an idea of
gaining external knowledge to make the data more related as well as expand the
coverage of classifiers to handle future data better. The underlying idea of the
framework is that for each classification task, we collect a large-scale
external data collection called &#8220;universal dataset&#8221;, and then build a
classifier on both a (small) set of labeled training data and a rich set of
hidden topics discovered from that data collection. The framework is general
enough to be applied to different data domains and genres ranging from Web
search results to medical text. We did a careful evaluation on several hundred
megabytes of Wikipedia (30M words) and MEDLINE (18M words) with two tasks: &#8220;Web
search domain disambiguation&#8221; and &#8220;disease categorization for medical text&#8221;,
and achieved significant quality enhancement.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp373.html">Generating Diverse and
Representative Image Search Results for Landmarks</a><br />
</b>Yahoo &amp; Columbia University</p>
<blockquote>
<p>
<b>Abstract:</b> Can we leverage the community-contributed collections of rich
media on the web to automatically generate representative and diverse views of
the world&#8217;s landmarks? We use a combination of context- and content-based tools
to generate representative sets of images for location-driven features and
landmarks, a common search task. To do that, we use location and other
metadata, as well as tags associated with images, and the images&#8217; visual
features. We present an approach to extracting tags that represent landmarks. We
show how to use unsupervised methods to extract representative views and images
for each landmark. This approach can potentially scale to provide better search
and representation for landmarks, worldwide. We evaluate the system in the
context of image search using a real-life dataset of 110,000 images from the San
Francisco area.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp302.html">Flickr Tag Recommendation
based on Collective Knowledge</a><br />
</b>Yahoo Research</p>
<blockquote>
<p>
<b>Abstract:</b> Online photo services such as Flickr and Zooomr allow users to
share their photos with family, friends, and the online community at large. An
important facet of these services is that users manually annotate their photos
using so called tags, which describe the contents of the photo or provide
additional contextual and semantical information. In this paper we investigate
how we can assist users in the tagging phase. The contribution of our research
is twofold. We analyse a representative snapshot of Flickr and present the
results by means of a tag characterisation focussing on how users tag photos
and what information is contained in the tagging. Based on this analysis, we
present and evaluate tag recommendation strategies to support the user in the
photo annotation task by recommending a set of tags that can be added to the
photo. The results of the empirical evaluation show that we can effectively
recommend relevant tags for a variety of photos with different levels of
exhaustiveness of original tagging.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp846.html">Deciphering Mobile Search
Patterns: A Study of Yahoo! Mobile Search Queries</a></b><br />
Yahoo</p>
<blockquote>
<p>
<b>Abstract: </b>In this paper we study the characteristics of search queries
submitted from mobile devices using various Yahoo! mobile oneSearch applications
during a 2 months period in the second half of 2007, and report the query
patterns derived from 20 million English sample queries submitted by users in
US, Canada, Europe, and Asia. We examine the query distribution and topical
categories the queries belong to in order to find new trends. We compare and
contrast the search patterns between US vs international queries, and between
queries from various search interfaces (XHTML/WAP, java widgets, and SMS). We
also compare our results with previous studies wherever possible, either to
confirm previous findings, or to find interesting differences in the query
distribution and pattern.</p>
<p>
NOTE: Looks interesting, but there&#8217;s no link from the overview page to the
actual research document, at the moment.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp865.html">IRLbot: Scaling to 6 Billion
Pages and Beyond</a><br />
</b>Texas A&amp;M University</p>
<blockquote>
<p>
<b>Abstract:</b> This paper shares our experience in designing a web crawler
that can download billions of pages using a single-server implementation and
models its performance. We show that with the quadratically increasing
complexity of verifying URL uniqueness, BFS crawl order, and fixed per-host
rate-limiting, current crawling algorithms cannot effectively cope with the
sheer volume of URLs generated in large crawls, highly-branching spam,
legitimate multi-million-page blog sites, and infinite loops created by
server-side scripts. We offer a set of techniques for dealing with these issues
and test their performance in an implementation we call IRLbot. In our recent
experiment that lasted 41 days, IRLbot running on a single server successfully
crawled 6.3 billion valid HTML pages (7.6 billion connection requests) and
sustained an average download rate of 319 mb/s (1,789 pages/s). Unlike our
prior experiments with algorithms proposed in related work, this version of
IRLbot did not experience any bottlenecks and successfully handled content from
over 117 million hosts, parsed out 394 billion links, and discovered a
subset of the web graph with 41 billion unique nodes.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp597.html">Recrawl Scheduling Based on
Information Longevity</a><br />
</b>Yahoo Research &amp; Carnegie Mellon University</p>
<blockquote>
<p>
<b>Abstract:</b> It is crucial for a web crawler to distinguish between
ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is
usually not worth crawling, because by the time it reaches the index it is no
longer representative of the web page from which it was acquired. On the other
hand, content that persists across multiple page updates (e.g., recent blog
postings) may be worth acquiring, because it matches the page&#8217;s true content for
a sustained period of time. In this paper we characterize the longevity of
information found on the web, via both empirical measurements and a generative
model that coincides with these measurements. We then develop new recrawl
scheduling policies that take longevity into account. As we show via experiments
over real web data, our policies obtain better freshness at lower cost, compared
with previous approaches.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp454.html">iRobot: An Intelligent Crawler
for Web Forums</a></b><br />
Microsoft Research Asia</p>
<blockquote>
<p>
<b>Abstract: </b>We study in this paper the Web forum crawling problem, which is
a very fundamental step in many Web applications, such as search engine and Web
data mining. As a typical user-created content (UCC), Web forum has become an
important resource on the Web due to its rich information contributed by
millions of Internet users every day. However, Web forum crawling is not a
trivial problem due to the in-depth link structures, the large amount of
duplicate pages, as well as many invalid pages caused by login failure issues.
In this paper, we propose and build a prototype of an intelligent forum crawler,
iRobot, which has intelligence to understand the content and the structure of a
forum site, and then decide how to choose traversal paths among different kinds
of pages. To do this, we first randomly sample (download) a few pages from the
target forum site, and introduce the page content layout as the characteristics
to group those pre-sampled pages and re-construct the forum&#8217;s sitemap. After
that, we select an optimal crawling path which only traverses informative pages
and skips invalid and duplicate ones. The extensive experimental results on
several forums show the performance of our system in the following aspects: 1)
Effectiveness – Compared to a generic crawler, iRobot significantly decreases
the duplicate and invalid pages; 2) Efficiency – With a small cost of
pre-sampling a few pages for learning the necessary knowledge, iRobot saves
substantial network bandwidth and storage as it only fetches informative pages
from a forum site; and 3) Long threads that are divided into multiple pages can
be re-concatenated and archived as a whole thread, which is of great help for
further indexing and data mining.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp183.html">Analyzing Search Engine
Advertising: Firm Behavior and Cross-Selling in Electronic Markets</a><br />
</b>New York University</p>
<blockquote>
<p>
<b>Abstract:</b> The phenomenon of sponsored search advertising is gaining
ground as the largest source of revenues for search engines. Firms across
different industries are beginning to adopt this as the primary form of
online advertising. This process works on an auction mechanism in which
advertisers bid for different keywords, and final rank for a given keyword is
allocated by the search engine. But how different are firm&#8217;s actual bids from
their optimal bids? Moreover, what are other ways in which firms can potentially
benefit from sponsored search advertising? Based on the model and estimates from
prior work [10], we conduct a number of policy simulations in order to
investigate to what extent an advertiser can benefit from bidding optimally for
its keywords. Further, we build a Hierarchical Bayesian modeling framework to
explore the potential for cross-selling or spillovers effects from a given
keyword advertisement across multiple product categories, and estimate the model
using Markov Chain Monte Carlo (MCMC) methods. Our analysis suggests that
advertisers are not bidding optimally with respect to maximizing profits. We
conduct a detailed analysis with product level variables to explore the extent
of cross-selling opportunities across different categories from a given keyword
advertisement. We find that there exists significant potential for cross-selling
through search keyword advertisements in that consumers often end up buying
products from other categories in addition to the product they were searching
for. Latency (the time it takes for consumer to place a purchase order after
clicking on the advertisement) and the presence of a brand name in the keyword
are associated with consumer spending on product categories that are different
from the one they were originally searching for on the Internet.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp645.html">Online Learning from Click
Data for Sponsored Search</a></b><br />
Yahoo Research</p>
<blockquote>
<p>
<b>Abstract:</b> Sponsored search is one of the enabling technologies for
today&#8217;s Web search engines. It corresponds to matching and showing ads related
to the user query on the search engine results page. Users are likely to click
on topically related ads and the advertisers pay only when a user clicks on
their ad. Hence, it is important to be able to predict if an ad is likely to be
clicked, and maximize the number of clicks. We investigate the sponsored search
problem from a machine learning perspective with respect to three main
sub-problems: how to use click data for training and evaluation, which learning
framework is more suitable for the task, and which features are useful for
existing models. We perform a large scale evaluation based on data from a
commercial Web search engine. Results show that it is possible to learn and
evaluate directly and exclusively on click data encoding pairwise preferences
following simple and conservative assumptions. We find that online multilayer
perceptron learning, based on a small set of features representing content
similarity of different kinds, significantly outperforms an information
retrieval baseline and other learning models, providing a suitable framework for
the sponsored search task.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp457.html">Automatic Online News Issue
Construction in Web Environment</a><br />
</b>Tsinghua University</p>
<blockquote>
<p>
<b>Abstract:</b> In many cases, rather than a keyword search, people intend to
see what is going on through the Internet. Then the integrated comprehensive
information on news topics is necessary, which we called news issues, including
the background, history, current progress, different opinions and discussions,
etc. Traditionally, news issues are manually generated by website editors. It is
quite a time-consuming hard work, and hence real-time update is difficult to
perform. In this paper, a three-step automatic online algorithm for news issue
construction is proposed. The first step is a topic detection process, in which
newly appearing stories are clustered into new topic candidates. The second step
is a topic tracking process, where those candidates are compared with previous
topics, either merged into old ones or generating a new one. In the final step,
news issues are constructed by the combination of related topics and updated by
the insertion of new topics. An automatic online news issue construction process
under practical Web circumstances is simulated to perform news issue
construction experiments. F-measure of the best results is either above (topic
detection) or close to (topic detection and tracking) 90%. Four news issue
construction results are successfully generated in different time granularities:
one meets the needs like &quot;what&#8217;s new&quot;, and the other three will answer questions
like &quot;what&#8217;s hot&quot; or &quot;what&#8217;s going on&quot;. Through the proposed algorithm, news
issues can be effectively and automatically constructed with real-time update,
and lots of human efforts will be released from tedious manual work.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp803.html">Finding the Right Facts in the
Crowd: Factoid Question Answering over Social Media</a><br />
</b>Georgia Institute Of Technology &amp; Emory University</p>
<blockquote>
<p>
<b>Abstract:</b> Community Question Answering has emerged as a popular and
effective paradigm for a wide range of information needs. For example, to find
out an obscure piece of trivia, it is now possible and even very effective to
post a question on a popular community QA site such as Yahoo! Answers, and to
rely on other users to provide answers, often within minutes. The importance of
such community QA sites is magnified as they create archives of millions of
questions and hundreds of millions of answers, many of which are invaluable for
the information needs of other searchers. However, to make this immense body of
knowledge accessible, effective answer retrieval is required. In particular, as
any user can contribute an answer to a question, the majority of the content
reflects personal, often unsubstantiated opinions. A ranking that combines both
relevance and quality is required to make such archives usable for factual
information retrieval. This task is challenging, as the structure and the
contents of community QA archives differ significantly from the web setting. To
address this problem we present a general ranking framework for factual
information retrieval from social media. Results of a large scale evaluation
demonstrate that our method is highly effective at retrieving well-formed,
factual answers to questions, as evaluated on a standard factoid QA benchmark.
We also show that our learning framework can be tuned with the minimum of manual
labeling. Finally, we provide result analysis to gain deeper understanding of
which features are significant for social media search and retrieval. Our system
can be used as a crucial building block for combining results from a variety of
social media content with general web search results, and to better integrate
social media content for effective information access.</p>
</blockquote>
<p>
<b><a href="http://www2008.org/papers/fp810.html">Personalized Interactive
Faceted Search</a></b><br />
University Of California, Santa Cruz &amp; McGill University</p>
<blockquote>
<p>
<b>Abstract:</b> Faceted search is becoming a popular method to allow users to
interactively search and navigate complex information spaces. A faceted search
system presents users with key-value metadata that is used for query refinement.
While popular in e-commerce and digital libraries, not much research has been
conducted on which metadata to present to a user in order to improve the search
experience. Nor are there repeatable benchmarks for evaluating a faceted search
engine. This paper proposes the use of collaborative filtering and
personalization to customize the search interface to each user&#8217;s behavior. This
paper also proposes a utility based framework to evaluate the faceted interface.
In order to demonstrate these ideas and better understand personalized faceted
search, several faceted search algorithms are proposed and evaluated using the
novel evaluation methodology.</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/www2008-search-research-paper-roundup-13879/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Browser Search Engine&#8221; ChunkIt Launches</title>
		<link>http://searchengineland.com/browser-search-engine-chunkit-launches-13816</link>
		<comments>http://searchengineland.com/browser-search-engine-chunkit-launches-13816#comments</comments>
		<pubDate>Sun, 20 Apr 2008 13:14:32 +0000</pubDate>
		<dc:creator>Greg Sterling</dc:creator>
				<category><![CDATA[Search Engines: Other Search Engines]]></category>
		<category><![CDATA[Search Resources]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/browser-search-engine-chunkit-launches-13816.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>On Friday <a href="http://www.tigerlogic.com/ChunkIt/">ChunkIt</a> launched in a private beta. ChunkIt is a search download that sits on top of existing results and enables users to preview them before clicking. Relevant &#8220;chunks&#8221; of content are called out in a separate pane that splits the screen between the engine and the ChunkIt page.</p>
<p>At this stage it sounds a lot like other search plug-ins, toolbars, or search &#8220;side bars&#8221; that have come before. However, CEO Carlton Baab objects to the concept that this is simply another plug-in or search toolbar. He characterizes it instead as a &#8220;search engine for the browser.&#8221;</p>
<p><span id="more-13816"></span>
In the past, there have been numerous efforts to help users cull through search results via plug-ins, toolbars, or search add-ons. Recent entries include <a href="http://searchengineland.com/070810-193355.php">Mahalo Follow</a> and <a href="http://surfcanyon.com/search/extension.jsp">Surf Canyon</a>, a plug-in that &#8220;recommends&#8221; related search results. Then there are <a href="https://addons.mozilla.org/en-US/firefox/browse/type:4">numerous Firefox extensions</a> that augment or annotate search results. There have also been many third-party efforts to enable people to &#8220;preview&#8221; results prior to clicking (e.g., <a href="http://snap.com">Snap </a>). And there&#8217;s <a href="http://searchengineland.com/080411-103706.php">an emerging category</a> of so-called &#8220;visual search engines,&#8221; which push that concept further.</p>
<p>Baab says, by contrast, that ChunkIt is a powerful search engine that could stand on its own. But the company didn&#8217;t want to ask people, for obvious reasons, to try yet another new engine. So it adopted the current download/overlay strategy, which may change over time, according to Baab.</p>
<p>An interesting <a href="http://gesterling.wordpress.com/2007/05/24/palore-adding-more-structure-branding-to-local-search/'">search plug-in in the local segment</a>, Palore changed its model because it found getting people to download the application very challenging. The company is now <a href="http://palore.com/">aggregating and syndicating local data</a> and reselling it to consumer destinations instead. Analytics firm Compete was at one time pushing <a href="http://tools.compete.com/">a toolbar</a>, which offers overlay information on search results about a range of topics. While the toolbar still exists, the company has stopped directly promoting it on its <a href="http://compete.com">home page</a>.</p>
<p>These examples suggest the challenge that ChunkIt faces to gain awareness and adoption. The tool is in private beta so we haven&#8217;t yet had a chance to assess its value.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/browser-search-engine-chunkit-launches-13816/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Supplemental Results and Google&#8217;s Extended Databases</title>
		<link>http://searchengineland.com/supplemental-results-and-googles-extended-databases-11897</link>
		<comments>http://searchengineland.com/supplemental-results-and-googles-extended-databases-11897#comments</comments>
		<pubDate>Thu, 09 Aug 2007 05:40:19 +0000</pubDate>
		<dc:creator>Bill Slawski</dc:creator>
				<category><![CDATA[Google: SEO]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Legal: Patents]]></category>
		<category><![CDATA[Search Resources]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/supplemental-results-and-googles-extended-databases-11897.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Until very recently, you might have seen a label next to a search result in Google that indicated it was a &#8220;supplemental&#8221; result.  A couple of patents from Google, one of which was granted this week and one from earlier this year, discuss how a search query might return results from an extended database that sound a lot like supplemental results.</p>
<p><span id="more-11897"></span>
The Official Google Webmaster Central Blog announced that they would stop labeling their supplemental results in a post from July 31st, titled <a href="http://googlewebmastercentral.blogspot.com/2007/07/supplemental-goes-mainstream.html">Supplemental goes mainstream</a>.  The authors, Prashanth Koppula and Matt Cutts, tell us that the system for crawling and indexing supplemental results has been improved, and those results are fresher and more comprehensive than ever.</p>
<p>Danny wrote a detailed post on the same day &#8211; <a href="http://searchengineland.com/070731-215828.php">Google Dumps The Supplemental Results Label</a></p>
<p>The patents may provide some insight into how a supplemental or extended index works, and how partitions are used to speed up a search of extended results.</p>
<p>These patents don&#8217;t use the word &#8220;supplemental&#8221; but it is possible that they describe the way supplemental results work, or worked.  In the Webmaster Central post, we are told that supplemental results were introduced in 2003.  These patents were also originally filed in 2003.</p>
<p>What I found interesting about them is that they provide a view of how indexing could work in a search engine, and answer some questions such as: (1) when are extended database results triggered, (2) how search result numbers are estimated, and (3) why you sometimes see a link at the bottom of results that tells you there are more results that aren&#8217;t being shown, which you can see by clicking that link.</p>
<p>Some information about the patents:</p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PTXT&#038;S1=7,254,580.PN.&#038;OS=pn/7,254,580&#038;RS=PN/7,254,580">System and method for selectively searching partitions of a database</a>
Invented by Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, and Jeffrey Dean
Assigned to Google
US Patent 7,254,580
Granted August 7, 2007
Filed September 30, 2003</p>
<p>Abstract</p>
<blockquote>When a search query is received, a plurality of partition indexes are searched using the set of search terms in the search query. Each partition index corresponds to a partition of a document index. The search of each respective partition index identifies a subset of a plurality of document index sub-partitions corresponding to the respective partition index. Next, the search query is executed by only those document index sub-partitions identified by the subsets, thereby identifying documents that satisfy the search query. By using the partition index to reduce the number of document index sub-partitions searched while executing a search query, the execution of the search query is made more efficient.</blockquote>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&#038;Sect2=HITOFF&#038;d=PALL&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&#038;r=1&#038;f=G&#038;l=50&#038;s1=7,174,346.PN.&#038;OS=PN/7,174,346&#038;RS=PN/7,174,346">System and method for searching an extended database</a>
Invented by Kourosh Gharachorloo, Fay Wen Chang, Deborah Anne Wallach, Sanjay Ghemawat, and Jeffrey Dean
Assigned to Google
US Patent 7,174,346
Granted February 6, 2007
Filed September 30, 2003</p>
<p>Abstract</p>
<blockquote>Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the received signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index. There are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index and at least a portion of the extended search results is transmitted towards a user.</blockquote>
<p>What follows is a walk-through of the process of returning search results from the standard index and, when necessary, the extended index.  Some of the alternative approaches mentioned in the patents aren&#8217;t covered or discussed in detail.  The processes described may be very different from the reality, but hopefully this view of an extended index will provide you with some insights into how documents could be indexed by a search engine, and give you a slightly different perspective on the process of returning results in response to a query.</p>
<p><b>Searching the cache and standard index</b></p>
<p>A searcher submits a query to the search engine, and the query is received at one of a number of datacenters and sent to one of the query servers at the datacenter.</p>
<p>The query server receives the query and sends it to a mixer. The mixer transmits the query to the cache, to search the cache for results. The mixer might first normalize and hash the search request.</p>
<p>A hash value representing the query is received by the cache, and the cache is searched.</p>
<p>If a match for the hash value is found, those results would be sent back to the mixer. Results might be a list of located documents, with or without snippets, or an indication that there were no results in the cache.</p>
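<p>A minimal sketch of that cache lookup, in Python. The normalization rules (lowercasing and collapsing whitespace), the choice of SHA-1, and the names <code>normalize_query</code>, <code>query_hash</code>, and <code>ResultCache</code> are assumptions made for illustration; the patents don't specify them.</p>

```python
import hashlib


def normalize_query(query):
    """Lowercase the query and collapse runs of whitespace (assumed rules)."""
    return " ".join(query.lower().split())


def query_hash(query):
    """Hash the normalized query; SHA-1 is an arbitrary choice for this sketch."""
    return hashlib.sha1(normalize_query(query).encode("utf-8")).hexdigest()


class ResultCache:
    """Hypothetical in-memory cache keyed by the query hash."""

    def __init__(self):
        self._entries = {}

    def store(self, query, results):
        self._entries[query_hash(query)] = list(results)

    def lookup(self, query):
        # Returns the cached result list, or None to signal a cache miss.
        return self._entries.get(query_hash(query))


cache = ResultCache()
cache.store("Extended  Index", ["doc-17", "doc-42"])
hit = cache.lookup("extended index")      # same query once normalized
miss = cache.lookup("partition index")    # never stored, so a miss
```

<p>Because both spellings normalize to the same string, <code>hit</code> holds the stored list, while <code>miss</code> is <code>None</code> and the query would go on to the standard index.</p>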
<p>The mixer or query server receives that response and determines whether results were located. If there are results, and snippets weren&#8217;t returned with them, they may be requested from the cache, and if they aren&#8217;t in the cache, they might be requested from the standard document server.</p>
<p>If no results were located, then the query is sent to the standard index servers. The search request could first be transmitted to multiple standard balancers (one within each partition) that transmit the search onward to the standard index servers.</p>
<p>Each balancer transmits the search request to a set of standard index servers.</p>
<p>Each standard index server stores and searches one or more partitions of the standard index to produce a set of search results. Each balancer may send the search query to between ten and one hundred standard index servers, and each standard index server is set up to store and search multiple (e.g., two to ten) index sub-partitions.</p>
<p>When the query is received by the standard index servers, those are searched, and the results are sent back to the mixer. Those results could be a list of located documents or an indication that no results were found.</p>
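<p>The fan-out from mixer to balancers to index servers can be pictured roughly as follows. This is only a sketch: the classes <code>IndexServer</code> and <code>Balancer</code>, the function <code>mixer_search</code>, and the toy dictionaries standing in for index sub-partitions are all hypothetical, and every term is treated as required.</p>

```python
class IndexServer:
    """Hypothetical index server holding a few index sub-partitions.

    Each sub-partition is modeled as a dict mapping a term to the set of
    document ids that contain it.
    """

    def __init__(self, sub_partitions):
        self.sub_partitions = sub_partitions

    def search(self, terms):
        hits = set()
        for index in self.sub_partitions:
            postings = [index.get(term, set()) for term in terms]
            if postings:
                # A document matches when it contains every term.
                hits |= set.intersection(*postings)
        return hits


class Balancer:
    """Hypothetical balancer that forwards a query to its index servers."""

    def __init__(self, servers):
        self.servers = servers

    def search(self, terms):
        hits = set()
        for server in self.servers:
            hits |= server.search(terms)
        return hits


def mixer_search(balancers, query):
    """Send the query to every balancer and gather the document ids."""
    terms = query.lower().split()
    hits = set()
    for balancer in balancers:
        hits |= balancer.search(terms)
    return sorted(hits)


server_a = IndexServer([{"google": {1, 2}, "patent": {2}}, {"google": {3}}])
server_b = IndexServer([{"google": {4, 5}, "patent": {4}}])
balancers = [Balancer([server_a]), Balancer([server_b])]

results = mixer_search(balancers, "Google patent")
no_results = mixer_search(balancers, "yahoo")
```

<p>Here <code>results</code> is <code>[2, 4]</code> and <code>no_results</code> is an empty list, the case in which the mixer would tell the searcher that nothing was found.</p>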
<p>The mixer receives a response, and if no search results were located, notifies the searcher that there were none.</p>
<p>If search results were located, snippets might be requested from the standard document servers, or the results might be sent to the query server, which might request the snippets.</p>
<p>The standard document servers receive that request for snippets, generate them from the documents identified in the search results, and send the snippets back to the mixer.</p>
<p>The mixer then sends the results and snippets to the cache, where they are saved in memory for future searches for that query.</p>
<p>At this point, a decision needs to be made as to whether more results are needed.</p>
<p><b>Signals indicating whether or not a search of the extended index should be conducted</b></p>
<ul>
<li>Number of results &#8211; for instance, if there are fewer than ten results (and that is the signal threshold value)</li>
<li>Whether the amortized cost of performing the extended search is small, comparing the cost of performing the search to the quality of search results</li>
<li>Deciding if the user is not satisfied with the standard results returned from the standard index server by looking at something like when a user selects a &#8220;next set of results&#8221; button repetitively</li>
<li>When the query scores (frequency and PageRank) of the results are low on average</li>
<li>If the load on the extended index servers is low</li>
<li>If for a given query the cost is low (different queries have different costs), or</li>
<li>Any combination of these signals.</li>
</ul>
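<p>One way to picture how such signals might be combined is sketched below. Apart from the ten-result example mentioned above, every threshold, field name, and the combination rule itself (load acts as a veto, and any one of the other signals is enough) is an assumption for illustration, not something taken from the patents.</p>

```python
from dataclasses import dataclass


@dataclass
class StandardSearchOutcome:
    """What the mixer knows after the standard search (hypothetical fields)."""
    result_count: int
    average_score: float      # assumed 0.0-1.0 blend of frequency and PageRank
    next_page_clicks: int     # times the searcher asked for the next results
    extended_load: float      # assumed 0.0 (idle) to 1.0 (saturated)


def should_search_extended(outcome, min_results=10, min_score=0.3,
                           max_next_clicks=3, max_load=0.8):
    """Return True when the signals suggest an extended search is worthwhile."""
    # A busy extended tier vetoes the search regardless of the other signals.
    if outcome.extended_load > max_load:
        return False
    too_few_results = outcome.result_count < min_results
    low_quality = outcome.average_score < min_score
    unsatisfied_user = outcome.next_page_clicks >= max_next_clicks
    return too_few_results or low_quality or unsatisfied_user


few_results = StandardSearchOutcome(4, 0.9, 0, 0.2)
plenty = StandardSearchOutcome(250, 0.9, 0, 0.2)
busy_tier = StandardSearchOutcome(4, 0.9, 0, 0.95)
decisions = [should_search_extended(o) for o in (few_results, plenty, busy_tier)]
```

<p>With these made-up numbers only the first case triggers an extended search: the second already has plenty of results, and the third is blocked by load on the extended servers.</p>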
<p><b>How estimates of the number of results might be calculated</b></p>
<p>An estimate might be calculated on the fly, while the search is being performed, based upon how frequently results are being obtained from the standard index servers.  For example, the estimate might be based upon a search of a small percentage of the full index &#8211; less than 10 percent, and perhaps even less than 2 percent.</p>
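<p>As a rough illustration, an estimate of this kind could be a simple extrapolation from the portion of the index searched so far. The function name <code>estimate_total_results</code> and the sample figures are invented for the example; the patents don't give a formula.</p>

```python
def estimate_total_results(hits_in_sample, sampled_fraction):
    """Scale the hits found in a sample of the index up to the full index."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in the range (0, 1]")
    return round(hits_in_sample / sampled_fraction)


# Hypothetical numbers: 1,840 matches after searching 2 percent of the index.
estimate = estimate_total_results(1840, 0.02)
```

<p>With those figures the estimate comes out at 92,000 results, even though only a small slice of the index was actually examined.</p>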
<p><b>Queries sent to the extended server</b></p>
<p>If the search of the cache and standard index returned enough results, as measured by the threshold values for the signals listed above, then those results are sent to the searcher.</p>
<p>If not, then the query is sent to an extended mixer, and an extended cache is checked for results.  If enough results are received there, those are sent to the extended mixer, and extended search results, with associated snippets, are transmitted to the mixer from the extended mixer.</p>
<p>The mixer would take those results and aggregate them with any standard search results, if there were any.  Those would be sent to the query server, and then the searcher.</p>
<p>But, imagine that there weren&#8217;t any extended search results located in the extended cache. The search request may be sent to the extended index servers.</p>
<p><b>Filtering at the extended server</b></p>
<p>As in the standard index, there are multiple extended balancers that transmit the search onward to the extended index servers.</p>
<p>Balancer procedures in the extended balancer use a balancer filter to perform a lookup operation for each term in the received search query to locate corresponding information in the partition index.</p>
<p>The balancer filter uses the information in the partition index to produce a sub-partition map for each of the terms in the search query.</p>
<p>A map of the extended document index sub-partitions is produced for each term of the search query.</p>
<p>The map can be encoded a few different ways, including as a bit map.  The map would contain a bit for each sub-partition of the extended index partition serviced by the extended balancer, with a first value of the bit indicating that the term is found in at least one document in the corresponding sub-partition of the extended index, and a second value of the bit indicating that the term is not found in any document in the corresponding sub-partition of the extended index.</p>
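<p>A small sketch of such a bit map, using a Python integer as the bit field. The partition index is modeled here as a list holding the set of terms found in each sub-partition, which is an assumption made for the example, as is the name <code>term_bitmap</code>.</p>

```python
def term_bitmap(term, sub_partition_terms):
    """Set bit n when the term occurs in at least one document of sub-partition n."""
    bitmap = 0
    for position, terms in enumerate(sub_partition_terms):
        if term in terms:
            bitmap |= 1 << position
    return bitmap


# Hypothetical partition index covering four sub-partitions (0 to 3).
partition_index = [
    {"google", "patent"},
    {"google"},
    {"yahoo"},
    {"patent", "index"},
]

google_map = term_bitmap("google", partition_index)
patent_map = term_bitmap("patent", partition_index)
google_bits = format(google_map, "04b")
patent_bits = format(patent_map, "04b")
```

<p>Reading the bits from right to left (sub-partition 0 is the rightmost bit), <code>google_bits</code> is <code>0011</code> and <code>patent_bits</code> is <code>1001</code>.</p>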
<p><b>Using combined bit maps</b></p>
<p>Each term has a bit map made for it.  A combined map is made from the bit maps for each term in the query, using Boolean logic matching what was used in the query itself.</p>
<p>In Google, by default, if you don&#8217;t use a Boolean operator for your search, the search engine will attempt to perform a search using &#8220;AND&#8221; for all of the terms (or at least all of the non-stopwords).  But, you could use the &#8220;OR&#8221; operator in your search query, or place a minus sign in front of a term, indicating a &#8220;NOT&#8221; for that term.  The way the bit maps for each term would combine would be based upon your use of those Boolean operators.</p>
<p>This combined map would indicate the document index sub-partitions that may index one or more documents that satisfy the search query, and which document index sub-partitions don&#8217;t.</p>
<p>The query is sent only to the extended document index sub-partitions indicated by the combined map as potentially indexing documents that satisfy the search query.</p>
<p>Limiting, or filtering, the extended search to only those sub-partitions containing the searched terms significantly reduces the number of sub-partitions that need to be looked at. This makes extended searches more efficient and faster.</p>
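<p>To make the filtering concrete, here is a sketch that combines per-term bit maps with Boolean logic and lists the sub-partitions still worth searching. The bit maps, the function names, and the handling of excluded terms are assumptions for illustration rather than the patented method.</p>

```python
from functools import reduce
from operator import and_, or_


def and_maps(bitmaps, sub_partition_count):
    """AND: every required term must occur somewhere in the sub-partition."""
    all_ones = (1 << sub_partition_count) - 1
    return reduce(and_, bitmaps, all_ones)


def or_maps(bitmaps):
    """OR: any one of the terms occurring is enough."""
    return reduce(or_, bitmaps, 0)


def candidates(combined, sub_partition_count):
    """List the sub-partition numbers whose bit is set in the combined map."""
    return [n for n in range(sub_partition_count) if (combined >> n) & 1]


# Bit n (counting from the right) is 1 when the term occurs in sub-partition n.
COUNT = 4
google, patent, index = 0b0011, 0b1001, 0b1000

both = candidates(and_maps([google, patent], COUNT), COUNT)   # google patent
either = candidates(or_maps([google, index]), COUNT)          # google OR index
# "google -patent": the excluded term cannot rule out a sub-partition, because
# other documents there may mention google without patent, so only the
# required term prunes in this sketch.
minus = candidates(and_maps([google], COUNT), COUNT)
```

<p>In this toy case the AND query only needs sub-partition 0, the OR query needs sub-partitions 0, 1 and 3, and the query with an excluded term still has to visit sub-partitions 0 and 1.</p>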
<p><b>Using sub-sub-partitions</b></p>
<p>The maps produced could even be based upon sub-sub-partitions of an extended document index partition, instead of sub-partitions.  There are fewer documents in the sub-sub-partitions.</p>
<p>A sub-partition might index the terms in approximately a half million documents. If each of those index sub-partitions is divided into 128 sub-sub-partitions, each sub-sub-partition will index about 4,000 documents.</p>
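<p>The arithmetic behind that figure, as a quick check in Python (the variable names are just for this example):</p>

```python
DOCS_PER_SUB_PARTITION = 500_000   # "approximately a half million"
SUB_SUB_PARTITIONS = 128

docs_per_sub_sub_partition = DOCS_PER_SUB_PARTITION / SUB_SUB_PARTITIONS
```

<p>That works out to 3,906.25 documents, which rounds to the roughly 4,000 quoted above. Each bit in a sub-sub-partition map therefore vouches for a much smaller group of documents, so a set bit is less likely to lead to a fruitless search.</p>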
<p><b>Returning results from the extended index</b></p>
<p>If results are found and sent to the extended mixer, snippets are requested from the documents associated with the query terms, and the results and snippets may be saved for future searches in the extended cache, and then sent to the standard mixer.</p>
<p>If no extended search results are found, the extended mixer informs the standard mixer of the lack of results.</p>
<p>If there are extended results, the mixer takes those (with snippets) and aggregates them, with standard search results from the cache or standard index server.  Aggregated search results are then sent to the query server, and then to the searcher.</p>
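<p>A sketch of that aggregation step follows. The ordering (standard results first, then extended ones) and the removal of duplicate URLs are assumptions made for the example; the walk-through above doesn't say how the mixer orders or de-duplicates the combined list.</p>

```python
def aggregate_results(standard, extended):
    """Merge two result lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for result in list(standard) + list(extended):
        if result["url"] not in seen:
            seen.add(result["url"])
            merged.append(result)
    return merged


# Hypothetical results; each carries its snippet source for illustration.
standard = [
    {"url": "http://example.com/a", "source": "standard"},
    {"url": "http://example.com/b", "source": "standard"},
]
extended = [
    {"url": "http://example.com/b", "source": "extended"},
    {"url": "http://example.com/c", "source": "extended"},
]

merged = aggregate_results(standard, extended)
merged_urls = [result["url"] for result in merged]
```

<p>The merged list contains the three distinct URLs in order, with the duplicate page kept in its standard-index form.</p>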
<p><b>An alternative method for performing an extended search</b></p>
<p>Similar to the above method, after results have been found in the extended database, the extended mixer determines how many extended search results there are.</p>
<p>The number of extended search results is sent to the mixer, which has already received standard search results and snippets.</p>
<p>The standard results and the number of extended results are sent to the query server, and those are shown to the searcher.</p>
<p>The standard search results and snippets are presented to the searcher, along with a link stating the number of extended results, which can be viewed by selecting the link. That link may be provided without showing the number of extended search results, or before the extended results have even been obtained.</p>
<p>If the searcher selects the link, the search is repeated, but with the extended search results shown to the searcher, providing the user with the standard results, and with results from more uncommon or obscure documents.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/supplemental-results-and-googles-extended-databases-11897/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cheat Sheet For Google Cheat Sheets</title>
		<link>http://searchengineland.com/cheat-sheet-for-google-cheat-sheets-11873</link>
		<comments>http://searchengineland.com/cheat-sheet-for-google-cheat-sheets-11873#comments</comments>
		<pubDate>Mon, 06 Aug 2007 14:43:00 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Search Resources]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/cheat-sheet-for-google-cheat-sheets-11873.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Lifehacker <a href="http://lifehacker.com/software/cheat-sheets/-286178.php">points</a> to a two-page <a href="http://www.adelaider.com/google-cheat-sheet/">Google Cheat Sheet</a> PDF document that contains information on the company, their services, the IP addresses for Googlebot, Google domains, famous operators and much more.</p>
<p>But aren&#8217;t there more Google cheat sheets? Sure &#8212; here are some others:</p>
<p><span id="more-11873"></span></p>
<ul>
<li><a href="http://www.google.com/help/cheatsheet.html">Google&#8217;s Cheat Sheet</a>: From Google, with mainly searching tips</li>
<li><a href="http://www.googleguide.com/advanced_operators_reference.html">Google Guide Cheat Sheet</a>: From GoogleGuide, also tips on searching</li>
<li><a href="http://websearch.about.com/library/cheatsheet/blgooglecheatsheet.htm">About.com&#8217;s Google Cheat Sheet</a>: From About.com, again at-a-glance searching tips</li>
<li><a href="http://www.joostdevalk.nl/google-search-url-parameters-cheat-sheet/">Google Search URL parameters cheat sheet</a>: From Joost De Valk, a quick look at URL parameters</li>
<li><a href="http://www.google.com/search?q=google%20cheat%20sheet">and more</a></li>
</ul>
<p>The cheat sheet that Lifehacker mentions seems to be from November 2006, but we thought it would be helpful for some to mention here anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/cheat-sheet-for-google-cheat-sheets-11873/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Free Fridays Now Officially Become Google Free Mondays</title>
		<link>http://searchengineland.com/google-free-fridays-now-officially-become-google-free-mondays-11742</link>
		<comments>http://searchengineland.com/google-free-fridays-now-officially-become-google-free-mondays-11742#comments</comments>
		<pubDate>Thu, 19 Jul 2007 21:15:44 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Search Resources]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/google-free-fridays-now-officially-become-google-free-mondays-11742.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Earlier this month, we started the
<a href="http://searchengineland.com/070620-145718.php">Google Free Friday
series</a>, designed to open readers&#8217; eyes to search engines beyond Google.
<a href="http://searchengineland.com/070705-194910.php">AOL happened</a> on
schedule, then I had to <a href="http://searchengineland.com/070713-105947.php">
move</a> Ask.com to <a href="http://searchengineland.com/070716-000001.php">this
past Monday</a>, as I got behind in writing as we launched our
<a href="http://sphinn.com/">Sphinn</a> social news and sharing site.
Microsoft&#8217;s day was supposed to be tomorrow, but because
<a href="http://daggle.com/070718-033426.html">I&#8217;m on semi-vacation</a> this
week, I haven&#8217;t been able to finish the prep guide to Microsoft&#8217;s search
services. So, Microsoft&#8217;s day will now be Monday, July 23. Since that will be
two Google Free &quot;Fridays&quot; happening on a Monday, I figure let&#8217;s get it over with
and move Yahoo as well. Yahoo&#8217;s day will be Monday, July 30.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-free-fridays-now-officially-become-google-free-mondays-11742/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Research Director Peter Norvig On &#8216;The Future Of Search&#8217;</title>
		<link>http://searchengineland.com/googles-research-director-peter-norvig-on-the-future-of-search-11717</link>
		<comments>http://searchengineland.com/googles-research-director-peter-norvig-on-the-future-of-search-11717#comments</comments>
		<pubDate>Tue, 17 Jul 2007 20:08:41 +0000</pubDate>
		<dc:creator>Greg Sterling</dc:creator>
				<category><![CDATA[Google: Labs]]></category>
		<category><![CDATA[Google: Mobile]]></category>
		<category><![CDATA[Google: User Interface]]></category>
		<category><![CDATA[Google: Voice Search]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Search Resources]]></category>
		<category><![CDATA[Stats: Search Behavior]]></category>

		<guid isPermaLink="false">http://searchengineland.com/beta/googles-research-director-peter-norvig-on-the-future-of-search-11717.php</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>MIT&#8217;s Technology Review published <a href="http://www.technologyreview.com/Biztech/19050/?a=f">an interview with Google Director of Research Peter Norvig</a> that explores his (and presumably Google&#8217;s) thinking about problems in search and the &#8220;next-generation&#8221; search functionality that Google is working on. There&#8217;s nothing strikingly new in the interview, but it&#8217;s an interesting overview and a window into some of the current projects.</p>
<p><span id="more-11717"></span>
Among them, Norvig emphasizes speech recognition and processing, both in mobile (i.e., Goog411) and hypothetically on the desktop. He also discusses getting users to provide more information (&#8220;natural language&#8221;) or interact more with search to help disambiguate queries, so that Google can deliver more tailored results. Norvig also discusses trying to develop a better understanding of the contents of documents (including video).</p>
<p>Here are some interesting excerpts from Norvig&#8217;s responses:</p>
<p><b>Re projects with the most funding:</b> The two biggest projects are machine translation and the speech project. Translation and speech went all the way from one or two people working on them to, now, live systems . . . We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time.</p>
<p><b>Re the problems in search:</b> One is understanding users&#8217; needs more. The other is understanding the contents of documents, whether they be Web pages or video.</p>
<p><b>Re more user input/interaction with search:</b> One of the things we&#8217;re looking at is finding ways to get the user more involved, to have them tell us more of what they want. People type the query &#8220;map,&#8221; and then they get upset if it&#8217;s not the map they were thinking of. So, people may be willing to talk more than type. Or maybe they&#8217;re willing to take a suggestion if we offer something that they didn&#8217;t type a query for, but is related.</p>
<p><b>Re mobile search:</b> [T]here are search interactions other than main Web search. When you&#8217;re on cell phones, you can only see one link at a time. It really changes the game. There&#8217;s much more impetus for us to be correct, so we&#8217;re thinking about that kind of interaction there, and how you could use audio to present information.</p>
<p><b>Re natural language search:</b> I think there&#8217;s a whole range of what you can mean as natural-language search. The first part of that range, we&#8217;ve been doing for a while. For instance, we understand synonyms and that the two words in San Francisco should go together. But then there&#8217;s Las Vegas and Vegas, which mean the same thing, and New York and York don&#8217;t mean the same thing. Those are the kinds of things we figure out. Another component of natural-language search is to parse a longer query into components. And the farthest along is typing in a full sentence in English and getting a full sentence as an answer. That sort of thing we&#8217;re not doing yet. We are answering some kinds of questions. You can query &#8220;population of Japan,&#8221; and we&#8217;ll pull that out. But for the majority of questions, that&#8217;s not what people want. They don&#8217;t want the burden of having to express it as a full sentence.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/googles-research-director-peter-norvig-on-the-future-of-search-11717/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

