<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>searchengineland.com &#187; Google: Blog Search</title>
	<atom:link href="http://searchengineland.com/library/google/google-blog-search/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: Must Read News About Search Marketing &#38; Search Engines</description>
	<lastBuildDate>Fri, 19 Mar 2010 23:46:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>US Dept. Of Justice: Court &#8220;Lacks Authority To Approve&#8221; Google Book Search Settlement</title>
		<link>http://searchengineland.com/us-court-lacks-authority-to-approve-google-book-search-amended-settlement-agreement-35204</link>
		<comments>http://searchengineland.com/us-court-lacks-authority-to-approve-google-book-search-amended-settlement-agreement-35204#comments</comments>
		<pubDate>Fri, 05 Feb 2010 01:22:31 +0000</pubDate>
		<dc:creator>Greg Sterling</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Google: Critics]]></category>
		<category><![CDATA[Google: Legal]]></category>
		<category><![CDATA[Legal: Copyright]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=35204</guid>
		<description><![CDATA[With a final &#8220;fairness hearing&#8221; scheduled for February 18, the US Department Of Justice has filed briefs (PDF) in opposition to the Google Book Search Amended Settlement Agreement (&#8220;ASA&#8221;). The DOJ commends the parties for their efforts to reach an amended agreement that addresses some of the problems with the original settlement but concludes that [...]]]></description>
			<content:encoded><![CDATA[<p>With a final &#8220;fairness hearing&#8221; scheduled for February 18, the US Department Of Justice has filed briefs (<a href="http://thepublicindex.org/docs/amended_settlement/usa.pdf">PDF</a>) in opposition to the Google Book Search <a href="http://searchengineland.com/revised-google-book-settlement-filed-29814">Amended Settlement Agreement</a> (&#8220;ASA&#8221;). The DOJ commends the parties for their efforts to reach an amended agreement that addresses some of the problems with the original settlement but concludes that there are still fundamental antitrust issues with the ASA:</p>
<blockquote><p>Despite the commendable efforts of the parties to improve upon the initial Proposed Settlement, many of the problems previously identified with respect to the original settlement remain in the ASA [Amended Settlement Agreement].  The United States remains committed to working with the parties on the settlement’s scope and content.</p></blockquote>
<p>The Department Of Justice&#8217;s filings argue, effectively, that class action litigation is the wrong mechanism to address the myriad business and copyright questions at issue in the sweeping agreement. It also presents the related question of whether the court, in fact, has the actual authority to approve the ASA in the context of the specific federal statutes raised and adjudicated in the case.</p>
<p>Here&#8217;s an excerpt from the government&#8217;s papers that captures the essence of the objections, which are then laid out in technical legal detail throughout the brief. Key parts have been bolded:</p>
<blockquote><p>Despite this substantial progress, substantial issues remain.  Although the United States believes the parties have approached this effort in good faith and the ASA is more circumscribed in its sweep than the original Proposed Settlement, <strong>the ASA suffers from the same core problem as the original agreement:  it is an attempt to use the class action mechanism to implement forward-looking business arrangements that go far beyond the dispute before the Court in this litigation.  As a consequence, the ASA purports to grant legal rights that are difficult to square with the core principle of the Copyright Act that copyright owners generally control whether and how to exploit their works during the term of copyright.  Those rights, in turn, confer significant and possibly anticompetitive advantages on a single entity – Google.</strong> Under the ASA as proposed, Google would remain the only competitor in the digital marketplace with the rights to distribute and otherwise exploit a vast array of works in multiple formats.  Google also would have the exclusive ability to exploit unclaimed works (including so-called “orphan works”) without risk of liability.  The ASA’s pricing mechanisms, though in some respects much improved, also continue to raise antitrust concerns.</p></blockquote>
<p>We&#8217;ll provide more analysis and context later, after we&#8217;ve had more time to digest the arguments and objections of the US government. Also see this press <a href="http://www.justice.gov/opa/pr/2010/February/10-opa-128.html">release</a> from the US Department Of Justice and this <a href="http://hosted.ap.org/dynamic/stories/U/US_TEC_GOOGLE_BOOK_BATTLE?SITE=CAANR&amp;SECTION=HOME&amp;TEMPLATE=DEFAULT">AP article</a> on it. For further background on the case and the proposed amended settlement, see:</p>
<ul>
<li><a href="../../revised-google-book-settlement-filed-29814">Revised Google Book Settlement Filed &amp; Live Blogging The Press Call</a></li>
<li><a href="../../googles-schmidt-to-book-settlement-critics-whats-your-solution-25950">Google’s Schmidt To Book Settlement Critics: What’s Your Solution?</a></li>
</ul>
<p><strong>Postscript From Danny Sullivan: </strong>I&#8217;m now reading through the filing and effectively doing a fast live blogging of it. Here are sections that stand out to me. I&#8217;ve bolded especially interesting parts. From Page 2:</p>
<blockquote><p>As a consequence, the <strong>ASA purports to grant legal rights that are difficult to square with the core principle of the Copyright Act that copyright owners generally control whether and how to exploit their works during the term of copyright.  Those rights, in turn, confer significant and possibly anticompetitive advantages on a single entity – Google.</strong> Under the ASA as proposed, Google would remain the only competitor in the digital marketplace with the rights to distribute and otherwise exploit a vast array of works in multiple formats.  Google also would have the exclusive ability to exploit unclaimed works (including so-called “orphan works”) without risk of liability.</p></blockquote>
<p>Also from page 2:</p>
<blockquote><p>The United States recognizes that the parties to the ASA are seeking to use the class action mechanism to overcome legal and structural challenges to the emergence of a robust and diverse marketplace for digital books, including through<strong> the adoption of an “opt-out” system</strong> to address the complexity of managing millions of third-party copyrights.  <strong>Under existing law, copyrighted works typically cannot be exploited in all of the ways the ASA contemplates without the prior permission of the rightsholders.  But for many works, especially out-of-print works, rights clearance may not be possible as a practical matter.</strong> Even if the rightsholders can be located, it may not be clear (as between author and publisher, or as among many competing claims to rights in the work) who “owns” or controls the digital licensing of particular works.  This is especially likely where publication predated, and contracts did not anticipate, the digital era.  Finally, <strong>there are no major licensing systems in place</strong> by which good faith users can efficiently secure permission from, and render payment to, authors, publishers, and the other rightsholders implicated by the ASA.</p></blockquote>
<p>This is interesting because opt-out is exactly how web indexing works. Google and other major search engines do not obtain the explicit permission of rights holders before making a copy of pages in order to index them for searching purposes. Here, the DOJ seems to suggest that opt-out isn&#8217;t good enough and also notes that there are no major licensing systems. In the web world, one could argue that the robots exclusion protocol effectively works as a licensing system. Even if an author isn&#8217;t explicitly known, Google can still automatically ask the hosting web server for permission.</p>
<p><a href="http://daggle.com/search-engines-permissions-moving-forward-in-copyright-battles-229">Search Engines, Permissions &amp; Moving Forward In Copyright Battles</a> is a primer I wrote that explains these issues, especially comparing web search to book search. The inability of book authors to automatically opt out is one reason why I suggested in it that Google not scan copyrighted books without explicit permission.</p>
<p>This is also a good time to note that many in-copyright books have been scanned with permission. And any that you can read via Google Book Search are there because authors or publishers did grant permission. There&#8217;s a great deal of confusion on this issue.</p>
<p>From page 3:</p>
<blockquote><p>These realities <strong>make it difficult if not impossible to engage in lawful large-scale book digitization projects</strong>, thereby denying the public the full corpus of 20th century books and, perhaps, unknown benefits of future creativity and economic innovation.</p></blockquote>
<p>Is this the DOJ saying it doesn&#8217;t view the scanning Google has done to be lawful? Not quite, as I&#8217;ll get into further below.</p>
<p>Also from page 3:</p>
<blockquote><p>Despite this worthy goal, <strong>the United States has reluctantly concluded that use of the class action mechanism in the manner proposed by the ASA is a bridge too far</strong>.  The Supreme Court has cautioned that “Rule 23, which must be interpreted with fidelity to the Rules Enabling Act and <strong>applied with the interests of absent class members in close view, cannot carry the large load</strong>” of restructuring legal regimes in the absence of congressional action – however sensible that restructuring might be.  Amchem Prods., Inc. v. Windsor, 521 U.S. 591, 629 (1997).  That caution should be heeded here.  Indeed, it applies with particular force because the legal and structural changes the parties seek to accomplish would confer on one entity a level of market dominance that other competitors without access to the ASA’s special rules and procedures will be hard pressed to challenge for the foreseeable future.  <strong>For these reasons, the ASA is not the appropriate way to achieve the laudable goals the parties seek.</strong></p></blockquote>
<p>As I read it, since the settlement involves books from many parties that can&#8217;t be found (&#8220;orphan works&#8221; that are in copyright but where it is difficult or impossible to currently find the rights holders), a class action suit can&#8217;t involve them.</p>
<p>From page 4:</p>
<blockquote><p>At this time, in the view of the United States, the <strong>public interest would best be served by direction from the Court encouraging the continuation of settlement discussions between the parties and, if the Court so chooses, guidance as to those aspects of the ASA that need to be addressed</strong>.  The United States is committed to working constructively with all stakeholders on the scope and content of an appropriate settlement of this matter, and on legislative or market-based solutions to ensure a robust marketplace for digital works.</p></blockquote>
<p>I&#8217;m at a loss as to how this is supposed to magically happen. The DOJ has indicated that it feels many books were illegally scanned. At the same time, it finds the settlement an attempt at a &#8220;laudable goal&#8221; of getting around that problem. But if you don&#8217;t have a class action suit &#8212; and you can&#8217;t find the rights-holders of some of those books &#8212; what then?</p>
<p>It may be that the two main parties who brought the suit against Google, the Association of American Publishers and the Authors Guild, will only be able to strike a settlement for books that have known rights-holders. As for orphan works, potentially those could be excluded &#8212; and Google could potentially continue to use them for searchable purposes unless the actual rights holders come forward. Or, potentially the US government might take action against Google over those works. We&#8217;ll see what shakes out as this filing is digested.</p>
<p>Indeed, later on page 4 is this:</p>
<blockquote><p><strong>The United States accepts the proposition that a properly defined and adequately represented class of copyright holders may be able to settle a lawsuit over past conduct by licensing a somewhat broader range of conduct.</strong></p></blockquote>
<p>That seems to say if all these unknown rights-holders and their works are excluded, then a class action settlement can go forward. You get more of this on page 5:</p>
<blockquote><p><strong>In previous submissions to this Court, the United States </strong>(and other interested parties) <strong>discussed the Rule 23 limitations</strong> expressed in Amchem, 521 U.S. at 620, 628-29, see, e.g., U.S. SOI at 6-8 (D.E. 720), <strong>which suggests that absent class members cannot be adequately represented</strong> as to uncertain injuries or rights that are far removed from the facts underlying the complaint.</p></blockquote>
<p>Many critics of both the original and amended settlement have focused (quite rightly, I&#8217;d say) on the fact that it doesn&#8217;t actually settle what the case was all about originally &#8212; whether scanning books and showing short portions is fair use or not. Instead, the settlement created a new business arrangement that conferred those rights without exploring the legality. On page 6, the DOJ seems to suggest that the court should only rule on the scanning and short display portion &#8212; and that on this issue, it could indeed rule to cover those &#8220;absent&#8221; or &#8220;orphan&#8221; rights-holders:</p>
<blockquote><p><strong>The provisions that settle the specific allegations of infringement in the Class Complaint – those that allow Google to scan millions of copyrighted works and to make available small portions of such works in response to search requests – address disputes within the Court’s subject matter jurisdiction</strong>.  Those aspects of the ASA are based on specific conduct that falls squarely within the scope of the case made through the pleadings. <strong>There are strong arguments that an appropriate set of publisher and author class representatives can adequately represent all members of the class</strong> with respect to reaching a settlement as to the uses of their works challenged in the litigation.</p></blockquote>
<p>In contrast, provisions of the agreement that would allow Google to actually sell full copies of works online don&#8217;t seem to make sense, since Google didn&#8217;t do anything like this in the first place. It wasn&#8217;t sued over this:</p>
<blockquote><p>The broader aspects of the ASA stand on a somewhat different footing.  <strong>There has not been – and simply could not be – any allegation in this litigation that Google has sold full access to works for which it lacks the right to do so</strong>, or even that such activity was threatened.  <strong>Indeed, selling such access would have been legally indefensible</strong>, and thus would have been at odds with Google’s entire pre-settlement book search strategy, which was premised upon staying within colorable “fair use” grounds.  With very good reason, therefore, Google consciously avoided creating precisely the factual predicate that might support the settlement of book- and subscription-selling claims.  <strong>The business models that the ASA authorizes therefore relate to activities in which Google never engaged or threatened to engage, and thus claims of copyright infringement that could not have been brought.</strong> Although Rule 23 does not require the Court to survey the claims of every class member to determine if they are ripe, <strong>there are serious questions about whether a settlement that resolves future claims by absent class members</strong> for activities well beyond the facts underlying the complaint can meet the first prong of the Firefighters test.</p></blockquote>
<p>Remember earlier when I asked if the DOJ was saying that scanning to make something searchable was a copyright violation? This tells me no. This section seems to suggest that the real red flag would have been if Google had reprinted books for sale. THAT would have been legally indefensible.</p>
<p>From page 8:</p>
<blockquote><p>Here, in contrast, <strong>the ASA authorizes future activities beyond the scope of the conduct alleged in the complaint that do not remedy injuries plaintiffs suffered in the past, nor do they seek to prevent future injuries</strong>.  Rather, <strong>these provisions provide the defendant with benefits it could not have secured either through trial or even through normal private negotiations</strong>.</p></blockquote>
<p>Again, more of the &#8220;this settlement goes too far&#8221; theme.</p>
<p>From page 9:</p>
<blockquote><p><strong>The ASA seeks to carve out an exception from the Act’s normal rules and presumptions, which require a rightsholder to affirmatively grant permission for the kinds of uses contemplated by the ASA.  The parties claim that creating an opt-out exception would better serve the purposes of the Constitution’s Copyright Clause</strong> by promoting the progress of science and the useful arts.  That, however, is a judgment better suited for legislative consideration, rather than one for courts to make in the context of approving a settlement under Rule 23.</p></blockquote>
<p>Here, the opt-out discussion is about how the agreement would allow books to be shown or sold unless the authors opted out (unlike the opt-out from scanning alone, which I discussed before). Many critics have wondered why the agreement doesn&#8217;t go the other way &#8212; allow Google to show or sell only books with explicit permission. All the parties to the agreement have countered that, with so many orphan works, it&#8217;s easier to go opt-out. Then as a new registry is created to hunt down orphan rights-holders, they can choose to opt out. Certainly switching to opt-in would have made getting some agreement in place much easier.</p>
<p>From page 10, although the DOJ recognizes that opt-out would benefit the public more, that can&#8217;t come at the expense of rights-holders:</p>
<blockquote><p><strong>The United States recognizes that it is the ASA’s broad grant of rights</strong> to Google, coupled with the settlement’s opt-out requirements, that <strong>allows for the use of the largest possible universe of digital works</strong>.  <strong>The United States also recognizes that</strong>, although Google’s activities are commercially motivated, <strong>its business plan would generate numerous public benefits.  The ASA would achieve these benefits, however, in spite of and not in furtherance of the basic premises of the Copyright Act</strong>.</p></blockquote>
<p>From page 12, the DOJ raises the issue that the court seeks to impose a settlement on works of non-US authors who may not be fully represented:</p>
<blockquote><p>Nonetheless, <strong>there are significant numbers of foreign authors from outside Canada, the UK, and Australia whose works were published in one of those countries or registered in the United States, and thus are subject to the ASA, even though the rightsholders may not have been represented by the new associational plaintiffs</strong>.  This point is made clear by foreign governments, which object to the settlement.</p></blockquote>
<p>Further pages get into pricing issues that frankly get beyond me. Suffice to say, the DOJ is worried there are anti-competitive issues involved.</p>
<p>Page 21 comes back to the issue that the settlement grants rights to Google that it wasn&#8217;t originally sued over, and which competitors would be hard-pressed to gain:</p>
<blockquote><p><strong>There is no serious contention that Google’s competitors are likely to obtain comparable rights independently.</strong> For example, Amazon – Google’s likely chief rival digital book distributor were the ASA to be approved – began scanning copyright-protected books in 2002, after first securing permission of the works’ rightsholder(s).  To date, Amazon has amassed a library of approximately three million digital titles.  See Amazon.com, Inc. Obj. at 1 (D.E. 206).  This impressive number pales in comparison to the tens of millions of books Google has scanned or is poised to scan if the ASA is approved.  <strong>The suggestion that a competitor should follow Google’s lead by copying books en masse without permission in the hope of prompting a class action suit to be settled on terms comparable to the ASA is poor public policy and not something the antitrust laws require a competitor to do.</strong></p></blockquote>
<p>I love this part. It&#8217;s so blunt and straightforward. Others could scan books just like Google did and show short snippets. If they did, maybe they&#8217;d get sued. But even then, there&#8217;s no guarantee a settlement would allow them to sell those books in the way Google will be allowed.</p>
<p>On page 22, the DOJ goes to Google&#8217;s core search business:</p>
<blockquote><p>Finally, wholly apart from the new business ventures contemplated by the ASA, <strong>Google’s exclusive access to millions and millions of books may well benefit Google’s existing online search business.  Google already holds a relatively dominant market share in that market. That dominance may be further entrenched by its exclusive access to content through the ASA</strong>.  <strong>Content that can be discovered by only one search engine offers that search engine at least some protection from competition.  This outcome has not been achieved by a technological advance in search or by operation of normal market forces; rather, it is the direct product of scanning millions of books without the copyright holders’ consent </strong>and then using Rule 23 to achieve results not otherwise obtainable in the market.</p></blockquote>
<p>Here, the DOJ&#8217;s argument is weaker, I&#8217;d say. Amazon or Microsoft could scan books just as Google did to make them searchable. They could have been doing that for years. They might not have because they interpreted fair use differently. Or, in the case of Microsoft, simply because they didn&#8217;t think book search was commercially attractive enough, as it said when it backed out of scanning in 2008 (see <a href="../../microsoft-burns-book-search-lacks-high-consumer-intent-14066">Microsoft Burns Book Search – Lacks “High Commercial Intent”</a>). The agreement wouldn&#8217;t prevent any competitor from scanning for searching purposes.</p>
<p>However, the agreement &#8212; in allowing Google to display books scanned &#8212; certainly would give it a more compelling book search service. And as previously covered, it would be much harder to nearly impossible for competitors to get those display rights.</p>
<p>On page 23, some DOJ suggestions:</p>
<blockquote><p><strong>The United States continues to believe that an approvable settlement may be achievable here, for example, by requiring rightsholders to “opt-in” to the settlement or by narrowing both the scope of the plaintiff class and the relief, to better align with the actual dispute underlying the case</strong>&#8230;.</p></blockquote>
<p>To bullet-point:</p>
<ul>
<li>Change from opt-out to opt-in</li>
<li>Narrow rights granted</li>
<li>Stick closer to what the case was about: can you scan for searching purposes</li>
</ul>
<p>Also on that page, there&#8217;s a suggestion of a waiting period so as to reduce the number of works that might get used without explicit permission:</p>
<blockquote><p>The United States believes there would be real value in <strong>creating a meaningful waiting period before Google may commercially exploit out-of-print works without the permission of the rightsholder </strong>(e.g., two years from the time the title is publicly listed in the Registry).  Such a waiting period, combined with efforts of the Registry to locate rightsholders, <strong>may reduce the number of rightsholders whose works would be exploited without their knowledge</strong></p></blockquote>
<p>Further points also look to narrow the use of orphan works. And that&#8217;s it.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/us-court-lacks-authority-to-approve-google-book-search-amended-settlement-agreement-35204/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Blog Search Sees Twitter Trends &amp; Raises With Blog Search &#8216;Hot Queries&#8217;</title>
		<link>http://searchengineland.com/google-blog-search-sees-twitter-trends-raises-with-blog-search-hot-queries-21940</link>
		<comments>http://searchengineland.com/google-blog-search-sees-twitter-trends-raises-with-blog-search-hot-queries-21940#comments</comments>
		<pubDate>Thu, 02 Jul 2009 19:52:46 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=21940</guid>
		<description><![CDATA[A lot of the innovation in search these days is all about What&#8217;s happening right now? Twitter is the poster child where real-time search is concerned, and despite its ongoing spam problem, Twitter Trends has become one of the primary places to get a snapshot of what&#8217;s happening now.
Google Blog Search is getting in the [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of the innovation in search these days is all about <em>What&#8217;s happening right now?</em> Twitter is the <a href="http://searchengineland.com/twitter-embraces-its-inner-search-engine-17187">poster child</a> where real-time search is concerned, and despite its ongoing <a href="http://searchengineland.com/twitters-real-time-spam-problem-20614">spam problem</a>, Twitter Trends has become one of the primary places to get a snapshot of what&#8217;s happening now.</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2009/07/hot.png" alt="hot queries" width="220" height="235" class="alignleft" />Google Blog Search is getting in the game with its <a href="http://googleblog.blogspot.com/2009/07/new-blog-search-tools-feeds-hot-queries.html">announcement today</a> of &#8220;Hot Queries,&#8221; a new addition that appears in the upper right corner of the Google Blog Search home page. It shows 10 queries that are &#8220;happening now&#8221; &#8212; or, &#8220;currently popular&#8221; in Google&#8217;s words &#8212; in Blog Search. What we don&#8217;t know is if the list is ranked in actual order, or randomized; and we don&#8217;t know how often the Hot Queries list is refreshed. </p>
<p>That&#8217;s not the only new addition to the Google Blog Search home page; right below Hot Queries is a section called &#8220;Latest Posts,&#8221; which shows (and links to) 10 recent posts from &#8220;popular blogs.&#8221; Google hasn&#8217;t given any indication how it chooses which blogs to feature in this space.</p>
<p>Perhaps the most utilitarian new addition to Google Blog Search is RSS and Atom feeds that let users subscribe to any topic or story. Links for those are in the left column. </p>
<p>Lastly, there&#8217;s also an iGoogle Gadget that lets users embed the Google Blog Search home page on their iGoogle page.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-blog-search-sees-twitter-trends-raises-with-blog-search-hot-queries-21940/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quietly, Google Updates Its Blog Search Algorithm</title>
		<link>http://searchengineland.com/quietly-google-updates-its-blog-search-algorithm-19329</link>
		<comments>http://searchengineland.com/quietly-google-updates-its-blog-search-algorithm-19329#comments</comments>
		<pubDate>Fri, 15 May 2009 06:33:22 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Features: General]]></category>
		<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=19329</guid>
		<description><![CDATA[There&#8217;s no official announcement (yet), but Google tells Search Engine Land that it&#8217;s made several improvements under the hood of Google Blog Search.
In an email conversation, Google&#8217;s Jeremy Hylton &#8212; head of the search quality group in Google&#8217;s New York office &#8212; explained how the changes impact the &#8220;clusters&#8221; of stories that appear on the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://searchengineland.com/figz/wp-content/seloads/2009/05/logo.png" alt="Google Blog Search logo" width="158" height="67" />There&#8217;s no official announcement (yet), but Google tells Search Engine Land that it&#8217;s made several improvements under the hood of <a href="http://blogsearch.google.com/">Google Blog Search</a>.</p>
<p>In an email conversation, Google&#8217;s Jeremy Hylton &#8212; head of the search quality group in Google&#8217;s New York office &#8212; explained how the changes impact the &#8220;clusters&#8221; of stories that appear on the blog search home page.</p>
<p>&#8220;We&#8217;re doing a better job of choosing the blog posts to include in clusters,&#8221; Hylton says. &#8220;We&#8217;re also working on changes to expand the number of posts we consider for clustering.&#8221;</p>
<p>One of the algorithmic changes is aimed at making sure the home page clusters reward the freshest and most authoritative blog content. Within any cluster, Google wants to find the posts that people are talking about the most.</p>
<p>&#8220;We have a lot of ranking signals.  We&#8217;ve been tuning the ranking and clustering algorithms to make better use of those signals.  One important change is that we&#8217;re processing new links much faster, so the post that breaks a story and gets a lot of links is more likely to become the lead story.&#8221;</p>
<p>The update also includes some minor design changes. Gone are the blue boxes to the left of each cluster, <a href="http://www.flickr.com/photos/searchengineland/2908163672/sizes/o/">as shown here</a>. The new layout seems more spartan (&#8220;streamlined,&#8221; in Hylton&#8217;s words):</p>
<p><a title="Google Blog Search Home Page by Search Engine Land, on Flickr" href="http://www.flickr.com/photos/23148333@N06/3532295863/"><img src="http://farm3.static.flickr.com/2478/3532295863_7b4480ea94.jpg" alt="Google Blog Search Home Page" width="500" height="366" /></a></p>
<p>But Google&#8217;s changes are more about what powers Blog Search, not what it looks like.</p>
<p>&#8220;With [Thursday's] update,&#8221; Hylton says, &#8220;the whole collection of ranking changes provides a more authoritative set of results. I think it&#8217;s really a great set of bloggers talking about current events.&#8221;</p>
<p>Hylton says they&#8217;ll continue tweaking the algorithm and interface moving forward to improve the overall user experience.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/quietly-google-updates-its-blog-search-algorithm-19329/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google To &#8216;Integrate&#8217; Microblogging?</title>
		<link>http://searchengineland.com/google-to-integrate-microblogging-18902</link>
		<comments>http://searchengineland.com/google-to-integrate-microblogging-18902#comments</comments>
		<pubDate>Fri, 08 May 2009 19:52:13 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=18902</guid>
		<description><![CDATA[While Twitter is looking at ways to be more search-ey, Google is apparently thinking about ways to be more microbloggy. Buried in a Reuters article about Google&#8217;s shareholders meeting yesterday is this paragraph about the company&#8217;s interest in the social web:
&#8220;Company executives who appeared alongside Schmidt at the media briefing said Google was looking at [...]]]></description>
			<content:encoded><![CDATA[<p>While Twitter is looking at ways to <a href="http://searchengineland.com/twitter-search-to-crawl-links-add-ranking-algorithm-18781">be more search-ey</a>, Google is apparently thinking about ways to be more microbloggy. Buried in a <a href="http://www.reuters.com/article/internetNews/idUSTRE54700220090508">Reuters article</a> about Google&#8217;s shareholders meeting yesterday is this paragraph about the company&#8217;s interest in the social web:</p>
<blockquote><p>&#8220;Company executives who appeared alongside Schmidt at the media briefing said Google was looking at ways to generate money from the surge of social networking activity on the Internet, <strong>as well as at ways of integrating microblogging capabilities, such as those popularized by Twitter, into its search product</strong>.&#8221;</p></blockquote>
<p><em>(emphasis is mine)</em></p>
<p>It&#8217;s not in quotes, so we&#8217;re left wondering what the exact words and context were &#8230; not to mention how Google might &#8220;integrate microblogging capabilities&#8221; into search. That seems to suggest something more than just <a href="http://searchengineland.com/google-yahoo-twitter-search-16193">offering &#8220;Twitter search&#8221;</a> or improving its real-time search capabilities. </p>
<p>On ReadWriteWeb, Marshall Kirkpatrick <a href="http://www.readwriteweb.com/archives/google_ceo_says_microblogging_coming_to_google_sea.php">speculates</a> about three possibilities: adding microblogs as a search option (like images, video, etc.), using microblogging links to supplement the freshness factor of Google&#8217;s algorithm (which I suspect is already happening to some degree), or adding a status update box to its search interface, perhaps encouraging users to answer the question, &#8220;What are you searching for?&#8221;</p>
<p>Marshall thinks that last option is most likely, but I&#8217;m betting on a version of the first idea: not just adding a microblog search option, but adding real-time/microblog results as a onebox/universal search result. There&#8217;s already a <a href="http://searchengineland.com/new-tool-adds-twitter-search-to-google-16756">Greasemonkey script</a> that does this. But just as Google imports news articles, blog posts, videos, etc., for its blended search results, I could see microblog updates being pulled into Google&#8217;s search results. This would make the most sense on hot topics, breaking news, and the more immediate types of queries/information where Google&#8217;s traditional results don&#8217;t always give the user the information s/he wants.</p>
<p>Then again, I never thought I&#8217;d see the day that Google would <a href="http://searchengineland.com/google-searchwiki-launches-15561">let users add or remove search results</a>, so take my guess with the proverbial grain of salt.</p>
<p><strong>Update:</strong> On the L.A. Times blog, David Sarno <a href="http://latimesblogs.latimes.com/technology/2009/05/what-marissa-mayer-said-about-google-and-twitter.html">shares the exact text</a> of what Marissa Mayer said yesterday about Google and microblogging:</p>
<blockquote><p>&#8220;What&#8217;s really happening in Twitter is that there are a lot of clues in it in terms of what&#8217;s happening that&#8217;s interesting overall. It&#8217;s similar to what we see in Google Trends, where people will often type what they&#8217;re interested in into the search box, and we can make some predictions off of that.</p>
<p>So we are interested in being able to offer, for example, micro-blogging and micro-messaging in our search. Particularly in Blog Search and possibly in Web Search, but we don&#8217;t have any particular plans to announce.&#8221;</p></blockquote>
<p>So, taking that at face value, it sounds like we shouldn&#8217;t expect Google to be adding any update-your-status boxes, but a more obvious integration of Twitter/microblogging content into search results is on the table.</p>
<p><strong>Postscript by Barry Schwartz:</strong> Google Operating System <a href="http://googlesystem.blogspot.com/2009/06/google-to-launch-microblogging-search.html">spotted</a> that in the Google in Your Language project, Google is asking translators to help translate the description of a possible Google Microblogging search engine.  The specific text Google is asking to be translated is:</p>
<blockquote><p>Recent updates about QUERY. This is the MicroBlogsearch Universal result group header text. A Microblog is a blog with very short entries. Twitter is the popular service associated with this format.</p></blockquote>
<p>Just some more evidence that Google is working towards a Google Micro Blog search engine.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-to-integrate-microblogging-18902/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Fixes Link Operator In Blog Search</title>
		<link>http://searchengineland.com/google-fixes-link-operator-in-blog-search-17480</link>
		<comments>http://searchengineland.com/google-fixes-link-operator-in-blog-search-17480#comments</comments>
		<pubDate>Tue, 14 Apr 2009 12:54:43 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=17480</guid>
		<description><![CDATA[A couple weeks ago, we reported that Google fixed the blogroll indexing algorithm in Google Blog Search, but had not yet fixed the link operator, which returned links found in blogrolls.  Google has recently updated the link operator for Blog Search to not count links in the blogroll for the link operator command in [...]]]></description>
			<content:encoded><![CDATA[<p>A couple weeks ago, we reported that Google fixed the <a href="http://searchengineland.com/google-blog-search-fixing-blogroll-indexing-bug-17088">blogroll indexing algorithm</a> in Google Blog Search, but had not yet fixed the link operator, which returned links found in blogrolls.  Google has recently <a href="http://www.seroundtable.com/archives/019828.html">updated the link operator</a> for Blog Search so that it no longer counts links found in blogrolls.</p>
<p>As I <a href="http://www.seroundtable.com/archives/019828.html">reported</a> at the Search Engine Roundtable this morning, Jeremy Hylton of Google said Google Blog Search &#8220;now drops most or all of the links that occur in the blogroll or in other parts of the page that are just boilerplate.&#8221;  </p>
<p>This issue dates back to <a href="http://www.seroundtable.com/archives/018624.html">early November</a>, when Google Blog Search began <a href="http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722">indexing the full text</a> of blog posts, including, in some cases, the blogroll and boilerplate portions of the site.  Since then, Google has been tweaking its detectors to isolate each post&#8217;s unique content. It pushed out a major update a <a href="http://searchengineland.com/google-blog-search-fixing-blogroll-indexing-bug-17088">couple weeks ago</a> and has now updated the link operator (e.g., link:www.example.com).</p>
<p>My early tests show very positive results, but if you find any issues, please let Jeremy know in <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/790b97e23bdd3509/">this Google Groups thread</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-fixes-link-operator-in-blog-search-17480/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Blog Search Fixing Blogroll Indexing Bug</title>
		<link>http://searchengineland.com/google-blog-search-fixing-blogroll-indexing-bug-17088</link>
		<comments>http://searchengineland.com/google-blog-search-fixing-blogroll-indexing-bug-17088#comments</comments>
		<pubDate>Fri, 27 Mar 2009 17:28:02 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=17088</guid>
		<description><![CDATA[As Barry Schwartz points out today on Search Engine Roundtable, the Google Blog Search team is rolling out fixes to how it separates blogrolls from actual blog posts.
Back in December, we wrote about Google&#8217;s switch to full-text indexing for Blog Search, but that led to problems identifying blogroll links as part of blog posts. 
Barry [...]]]></description>
			<content:encoded><![CDATA[<p>As Barry Schwartz <a href="http://www.seroundtable.com/archives/019717.html">points out</a> today on Search Engine Roundtable, the Google Blog Search team is rolling out fixes to how it separates blogrolls from actual blog posts.</p>
<p>Back in December, we <a href="http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722">wrote about</a> Google&#8217;s switch to full-text indexing for Blog Search, but that led to <a href="http://searchengineland.com/google-still-working-on-making-blog-search-more-relevant-16721">problems identifying blogroll links</a> as part of blog posts. </p>
<p>Barry points to a <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/75267bf8c4766b0c/cad477afcad0fd44#cad477afcad0fd44">Google Groups</a> discussion, where Google&#8217;s Jeremy Hylton says: </p>
<blockquote><p>&#8220;We have launched a ranking change that reduces the number of results that are returned because of blogroll matches. There are still problems to work out, but this change appears to be a big improvement over our earlier fix.&#8221;</p></blockquote>
<p>So, it sounds like the problem isn&#8217;t fully solved yet, but they&#8217;re making progress.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-blog-search-fixing-blogroll-indexing-bug-17088/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter&#8217;s Traffic Growth &amp; The Rise Of Social Search</title>
		<link>http://searchengineland.com/twitter-traffic-rise-of-social-search-16910</link>
		<comments>http://searchengineland.com/twitter-traffic-rise-of-social-search-16910#comments</comments>
		<pubDate>Thu, 12 Mar 2009 21:32:46 +0000</pubDate>
		<dc:creator>Matt McGee</dc:creator>
				<category><![CDATA[Features: Analysis]]></category>
		<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Search Engines: Blog Search Engines]]></category>
		<category><![CDATA[Stats: Hitwise]]></category>
		<category><![CDATA[Stats: comScore]]></category>
		<category><![CDATA[Top News]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16910</guid>
		<description><![CDATA[There are a lot of stats and analysis today about Twitter&#8217;s traffic, how it compares to other social search sites, and who&#8217;s benefiting from all the traffic Twitter can send. It seems no one disputes that Twitter is on the rise, but just how high it&#8217;s risen is still up for debate.
Let&#8217;s start, appropriately, with a [...]]]></description>
			<content:encoded><![CDATA[<p>There are a lot of stats and analysis today about Twitter&#8217;s traffic, how it compares to other social search sites, and who&#8217;s benefiting from all the traffic Twitter can send. It seems no one disputes that Twitter is on the rise, but just how high it&#8217;s risen is still up for debate.</p>
<p>Let&#8217;s start, appropriately, with a tweet: comScore <a href="http://twitter.com/comScore/statuses/1316941673">posted</a> a note that Twitter&#8217;s February traffic was up 55% over January. (We don&#8217;t see a <a href="http://www.comscore.com/press/pr.asp">formal release</a> of this data yet, but will update when we do.) A 55% jump is nothing to sneeze at; it&#8217;s serious growth.</p>
<p>Steve Rubel dug into Twitter Search traffic (i.e., to <a href="http://search.twitter.com/">search.twitter.com</a>) today, and <a href="http://www.micropersuasion.com/2009/03/twitter-search-to-eclipse-google-blog-search.html">suggests</a> that it&#8217;s about to surpass traffic to Google Blog Search:</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2009/03/3347367209_0903f5a3be.jpg" alt="compete screenshot" width="500" height="343" /></p>
<p>Steve&#8217;s numbers come from Compete.com, which shows Twitter Search getting 1.35 million visitors per month, slightly fewer than Google Blog Search&#8217;s 1.38 million. But other data services don&#8217;t show the same thing. Here&#8217;s how Quantcast &#8212; a service that some believe is generally more accurate &#8212; compares Twitter Search and Google Blog Search:</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2009/03/quantcast.gif" alt="quantcast screenshot" width="500" height="330" /></p>
<p>There&#8217;s no way to know which is correct without seeing actual data from Twitter and Google (don&#8217;t hold your breath&#8230;), so it&#8217;s anyone&#8217;s guess whether Twitter Search is close to surpassing Google Blog Search.  But from looking at the charts above, it doesn&#8217;t seem at all far-fetched to think it will happen at some point in the future.</p>
<p>Meanwhile, Hitwise&#8217;s Heather Hopkins was also <a href="http://weblogs.hitwise.com/us-heather-hopkins/2009/03/where_to_from_twitter.html">writing about Twitter</a> today, but more specifically about where people go <em>from</em> Twitter.</p>
<p>Somewhat surprisingly (to me, at least) the Hitwise data shows Google as the largest recipient of Twitter traffic &#8230; and no, that doesn&#8217;t include traffic to YouTube (which I thought would be much higher than seventh):</p>
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2009/03/twitter-downstream-websites.png" alt="hitwise chart" width="380" height="489" /></p>
<p>Another thing worth mentioning is Heather&#8217;s comparison of traffic categories. She says Twitter&#8217;s clickstream doesn&#8217;t send as much traffic to shopping and educational sites (like Wikipedia) as search engines do. It sends more traffic to other social network sites and to personal blogs/web sites than search engines do.</p>
<p>In that sense, Twitter itself behaves just like you&#8217;d expect it to &#8212; like a social network. But the growth of Twitter Search, as evidenced by the charts and data above, suggests that Twitter is moving into a different realm, somewhere between social networking and search, and perhaps bringing together the best of both worlds. Social search, as Steve Rubel says in his piece today, &#8220;adds a much needed layer of trust to traditional search that helps us qualify sources.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/twitter-traffic-rise-of-social-search-16910/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Spending Millions On Newspaper Ads To Notify Authors And Publishers Of Book Lawsuit Settlement</title>
		<link>http://searchengineland.com/google-spending-millions-on-newspaper-ads-to-notify-authors-and-publishers-of-lawsuit-settlement-16792</link>
		<comments>http://searchengineland.com/google-spending-millions-on-newspaper-ads-to-notify-authors-and-publishers-of-lawsuit-settlement-16792#comments</comments>
		<pubDate>Wed, 04 Mar 2009 18:12:30 +0000</pubDate>
		<dc:creator>Greg Sterling</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[Google: Critics]]></category>
		<category><![CDATA[Google: Legal]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16792</guid>
		<description><![CDATA[One of the requirements of class-action lawsuits is notification of potential class members. Most of the members of a class never know that a lawsuit has taken place on their behalf (e.g., bank X credit card holders). So the courts require attorneys to tell class members of the suit and settlement terms and allow them [...]]]></description>
			<content:encoded><![CDATA[<p>One of the requirements of class-action lawsuits is notification of potential class members. Most of the members of a class never know that a lawsuit has taken place on their behalf (e.g., bank X credit card holders). So the courts require attorneys to tell class members of the suit and settlement terms and allow them to opt out of the settlement if they desire. If they fail to opt out, they&#8217;re typically bound by the terms and cannot sue independently.</p>
<p>Google is now in this position with the <a href="http://searchengineland.com/google-settles-copyright-litigation-for-125-million-paves-way-for-novel-services-15282">settlement</a> of the Google Book Search Copyright Class Action lawsuit against its book scanning project. As part of the settlement, Google has set up a $125 million fund to settle claims, which is discussed on <a href="http://www.googlebooksettlement.com/r/home">this dedicated website</a> constructed to report the settlement terms and related rules.</p>
<p>To satisfy the notification requirements, in addition to the settlement website, Google has used a variety of methods to communicate with potential claimants. According to an article in the <a href="http://www.nytimes.com/2009/03/04/books/04google.html?_r=1">NY Times</a>, it has used direct mail, but it also plans to spend millions ($7 million to $8 million) on newspaper and print magazine advertising to alert copyright holders about the settlement. The scope of all this is global &#8212; hence the price tag.</p>
<p>I would be remiss if I were not to point out the partial irony here: Google using traditional advertising to promote an online initiative. But the Internet, especially outside the West, still arguably doesn&#8217;t have the same reach as traditional media, which are thus critical for communicating with copyright holders, authors and publishers. (As an aside, if newspapers in certain US cities disappear, it will be difficult to find a judicially acceptable alternative notification mechanism &#8212; perhaps a combination of TV, Internet and direct mail.)</p>
<p>According to the Times article, the Book Settlement ad spend is apparently being distributed as 30 percent US, 30 percent &#8220;industrialized countries&#8221; and 40 percent &#8220;rest of world.&#8221; This seems to reflect the regional contribution of books to the global library.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-spending-millions-on-newspaper-ads-to-notify-authors-and-publishers-of-lawsuit-settlement-16792/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Still Working On Making Blog Search More Relevant</title>
		<link>http://searchengineland.com/google-still-working-on-making-blog-search-more-relevant-16721</link>
		<comments>http://searchengineland.com/google-still-working-on-making-blog-search-more-relevant-16721#comments</comments>
		<pubDate>Fri, 27 Feb 2009 14:20:40 +0000</pubDate>
		<dc:creator>Barry Schwartz</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16721</guid>
		<description><![CDATA[It has been almost three months since Google Blog Search began indexing the full text of the page and, since then, there have been numerous complaints from searchers and webmasters.  
The majority of the complaints are that Google Alerts are coming up for irrelevant or outdated blog results.  In addition, many blogs are no [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Google Blog Search by search-engine-land, on Flickr" href="http://www.flickr.com/photos/searchengineland/3082468615/"><img src="http://farm4.static.flickr.com/3225/3082468615_03a471189e_o.jpg" alt="Google Blog Search" align="left" hspace="5" width="162" height="75" /></a>It has been almost three months since <a href="http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722">Google Blog Search began indexing the full text</a> of the page and, since then, there have been numerous <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/75267bf8c4766b0c">complaints</a> from searchers and webmasters.  </p>
<p>The majority of the complaints are that Google Alerts are coming up for irrelevant or outdated blog results.  In addition, many blogs are no longer indexed quickly.  Plus, Google is indexing blogroll code and including it in the link command results.</p>
<p>Jeremy Hylton from the Google Blog Search team updated the <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/75267bf8c4766b0c/04b968572d6fef5a#04b968572d6fef5a">Google Groups thread</a> saying that Google will be conducting &#8220;visual experiments early next month&#8221; that will start with the link: queries and focus on &#8220;blogroll detectors&#8221; in the matching algorithm.  Personally, I am excited to see improvements with Google Blog Search.</p>
<p>Why do I care?  Well, I care for a few reasons.  As a searcher, I love being able to find relevant and fresh blogs with content on topics that I am looking for.  I know that I am missing key blogs in my discovery process and I can&#8217;t wait for the results to become fresher, quicker, and more relevant.  Yes, I know, easy for me to say, but hard for an engineer to make happen &#8211; good thing I am not a Google engineer. </p>
<p>In addition, I often track who is linking to the articles I write.  This way I can see what people are saying about my stories and clarify them when necessary.  I often do that by watching Google Blog Search for queries on names such as Barry Schwartz, Search Engine Land, Search Engine Roundtable or RustyBrick.  I also use the link operator to find out who is linking to my stories on either Search Engine Land or the Search Engine Roundtable.  And ever since <a href="http://www.seroundtable.com/archives/018624.html">November 2008</a>, the link operator in blog search has failed for me.  It got worse when they <a href="http://www.ninebyblue.com/blog/google-blog-search-changes-how-it-indexes-posts/">tried fixing it</a>, then got somewhat better with their <a href="http://www.seroundtable.com/archives/019040.html">second attempt</a> towards the end of December.  </p>
<p>I know many searchers, webmasters, and SEOs who would love to see future improvements.  So I hope this early March update will make for a better Google Blog Search, as Jeremy <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/75267bf8c4766b0c/04b968572d6fef5a#04b968572d6fef5a">said</a> it might.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-still-working-on-making-blog-search-more-relevant-16721/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Blog Search: Now With Full-Text Post Indexing</title>
		<link>http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722</link>
		<comments>http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722#comments</comments>
		<pubDate>Thu, 04 Dec 2008 21:27:32 +0000</pubDate>
		<dc:creator>Danny Sullivan</dc:creator>
				<category><![CDATA[Google: Blog Search]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[Top News]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=15722</guid>
		<description><![CDATA[
It&#8217;s been about two months since Google Blog Search was relaunched with a new  front page that summarizes stories. I talked with Google more about some of  the inner workings at the end of October and finally am getting around to posting this, spurred by one of the planned changes becoming official. Google [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Google Blog Search by search-engine-land, on Flickr" href="http://www.flickr.com/photos/searchengineland/3082468615/"><img src="http://farm4.static.flickr.com/3225/3082468615_03a471189e_o.jpg" alt="Google Blog Search" width="162" height="75" /></a></p>
<p>It&#8217;s been about two months since Google Blog Search was relaunched with a <a href="../../google-blogs-other-front-pages-for-the-blogosphere-14912.php">new front page that summarizes stories</a>. I talked with Google more about some of the inner workings at the end of October and am finally getting around to posting this, spurred by one of the planned changes becoming official. Google Blog Search now uses the full text of posts (in most cases), rather than using whatever was in a blog&#8217;s feed (which could often be only part of a post).<span id="more-15722"></span></p>
<p>It was always annoying that Google Blog Search depended only on what was put out in a feed, rather than actually indexing the full text of a blog post. Some publishers don&#8217;t put out full-text feeds (like here on Search Engine Land) for a variety of reasons, including the fact that a full-text feed is often misinterpreted by others as a right to reprint a post in its entirety without getting formal permission.</p>
<p>Problem solved! Vanessa Fox&#8217;s recent <a href="http://www.ninebyblue.com/blog/google-blog-search-changes-how-it-indexes-posts/">Google  Blog Search Changes How It Indexes Posts</a> article covers how Google&#8217;s now  officially confirmed for the first time that blog posts are being spidered. From  what Google <a href="http://groups.google.com/group/google-blog-search/browse_thread/thread/8244fc8731f47970?pli=1">posted</a> to its support groups on the subject:</p>
<blockquote><p>We have changed the way we index blog posts to include the full content of  the page. We&#8217;ve had occasional complaints about the use of the feed content,  particularly the problem with partial feeds that you mentioned. The indexing  change has improved the results for a lot of queries, both because we have the  full content of the page and because we extract links that are missing from the  feeds.</p></blockquote>
<p>It&#8217;s a welcome change. When I talked to Google about it in October, they said  it was in the process of being rolled out slowly. Now it is fully live for all  new posts that get into Google Blog Search. However, some older posts may not be  fully indexed. Google expects that by early 2009, all pages indexed by Google  Blog Search <a href="http://www.google.com/support/faqs/bin/static.py?page=faq_blog_search.html&amp;hl=en#oldposts">since  June 2005</a> &#8212; which it said are in the billions &#8212; will be fully indexed.</p>
<p>And how do you get in Google Blog Search again? If you&#8217;re not already there,  the easiest way is to &#8220;ping&#8221; the service with your blog&#8217;s home page or your  blog&#8217;s feed. You can use this submission <a href="http://blogsearch.google.com/ping?hl=en">form</a>. Better, your blog should  automatically ping Google each time a post goes up. Most blogging software is  either already enabled to do this or it&#8217;s simple to add it. But Google provides  further instructions <a href="http://www.google.com/support/faqs/bin/static.py?page=faq_blog_search_pinging.html">here</a>.</p>
<p>In the past, pinging made Google simply grab the latest post as shown in your  feed, which is why those putting out only partial feeds did not have their posts  fully indexed. Now, a ping is supposed to cause Google to immediately grab the  full text of your post (I haven&#8217;t tested this yet, however). If it works as  advertised, that means within seconds, your full post should be indexed and  searchable within Google Blog Search.</p>
<p>A downside to full-text indexing is something that Barry Schwartz <a href="http://www.seroundtable.com/archives/018624.html">noted</a> earlier, as full-text indexing got rolled out unannounced over the past few weeks.  Blogs often have blogrolls: lists of links to other blogs. Now that full-text indexing is in place, links from these blogrolls have caused some people to think new blog posts were being written about them. From what Google also posted in its groups about the issue:</p>
<blockquote><p>The downside of this change is that we see more results that match only the  blogroll and other parts of the page that are common to all of a blog&#8217;s posts.</p>
<p>We expected some problems from blogroll matches, but may have underestimated  the impact on searches using the link: operator or where the query matches a  blog or blogger&#8217;s name. We do expect to fix the problem you&#8217;re seeing. We&#8217;ll use  the full page content, but exclude the content that isn&#8217;t really part of the  post. I&#8217;m not sure if we&#8217;ll be able to make the change before the end of the  year, but we are working on it and are pretty confident that it can be solved.  We&#8217;ll post an update here when we&#8217;ve got a solution.</p></blockquote>
<p>Google also just sent me this update:</p>
<blockquote><p>Yes, we&#8217;ve got a change that should help with the blogroll issue. Although,  it&#8217;s not 100% perfect, it performs quite well in our tests. We deployed it for  link: queries yesterday and hope to have it deployed for all queries in the next  few weeks.</p></blockquote>
<p>Meanwhile, what about how things are going with the new front page to Google Blog Search? One issue that was plaguing Google Blog Search soon after launch was that spam blogs seemed to be getting featured, and sometimes embarrassing, play within the service.</p>
<p>Jeremy Hylton, a software engineer who works on the Google Blog Search project, said that Google was aiming to get more reputable blogs featured, and that Google Blog Search has an internal ranking of blogs by quality and authority that it could use. Looking today, things seem pretty clean &#8212; so I assume they&#8217;ve notched up the standards.</p>
<p>A confusing thing to me after Google Blog Search launched was the order in which story &#8220;clusters&#8221; or &#8220;groups&#8221; are listed on the home page as well as the subject-specific pages, such as <a href="http://blogsearch.google.com/blogsearch?hl=en&amp;topic=t">Technology</a>.  Each group has a box that shows you the number of blogs estimated to be discussing a topic and the time period over which the discussion stretches:</p>
<p><a title="Google Blog Search Cluster Box by search-engine-land, on Flickr" href="http://www.flickr.com/photos/searchengineland/3083333148/"><img src="http://farm4.static.flickr.com/3157/3083333148_3ea3b1e89b_o.jpg" alt="Google Blog Search Cluster Box" width="258" height="181" /></a></p>
<p>The screenshot above shows how a story about Walmart rumored to be selling iPhones has been mentioned on 41 blogs over 12 hours. Now consider what comes below it:</p>
<p><a title="Google Blog Search Topics by search-engine-land, on Flickr" href="http://www.flickr.com/photos/searchengineland/3082496671/"><img src="http://farm4.static.flickr.com/3174/3082496671_232b28f69e_o.jpg" alt="Google Blog Search Topics" width="480" height="343" /></a></p>
<p>See how under the iPhone story there&#8217;s another story group about a Google Android phone being released in Korea? How come the iPhone story mentioned on 41 blogs over 12 hours trumps the Android story with 57 blogs mentioning it over 18 hours?</p>
<p>Hylton said that part of the ranking process is to look at the &#8220;burst of  activity&#8221; around a story. For example, a story with a sharp spike in mentions  that&#8217;s recent compared to other stories might come higher on the list, being  deemed newer news.</p>
<p>To see some of the spikes, check out any story&#8217;s &#8220;cluster&#8221; or &#8220;group&#8221; page. I keep using quotes because Google itself doesn&#8217;t have a name for these. But you get to them by clicking on the green link showing the number of blogs talking about a particular story. Here&#8217;s <a href="http://blogsearch.google.com/blogsearch/story?hl=en&amp;bcid=1240441582&amp;bc_lang=en">one</a> for the Walmart iPhone story that I mentioned.  On it, you can see a chart on the left showing the number of blogs covering the story and when those stories were spotted:</p>
<p><a title="Google Blog Search Activity Trend by search-engine-land, on Flickr" href="http://www.flickr.com/photos/searchengineland/3082496599/"><img src="http://farm4.static.flickr.com/3167/3082496599_0e70b38c60.jpg" alt="Google Blog Search Activity Trend" width="500" height="345" /></a></p>
<p>As a side note, while each story cluster/group has its own summary page, those URLs sadly don&#8217;t seem to persist over time. I had examples from things we discussed back in October, <a href="http://blogsearch.google.com/blogsearch/story?hl=en&amp;bcid=1228540418&amp;bc_lang=en">like this cluster</a>, which now simply resolves to the Google Blog Search home page. I wish the URLs would continue working permanently.</p>
<p>Also related, watch for those story cluster pages to be enhanced over time.  Hylton said that Google&#8217;s looking at ways they can build out more context about  a particular story.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/google-blog-search-now-with-full-text-post-indexing-15722/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
