<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine Land &#187; David Harry</title>
	<atom:link href="http://searchengineland.com/author/thegypsy/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: News On Search Engines, Search Engine Optimization (SEO) &#38; Search Engine Marketing (SEM)</description>
	<lastBuildDate>Tue, 21 May 2013 04:52:03 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Are Manual Solutions The Answer To Content Farms?</title>
		<link>http://searchengineland.com/are-manual-solutions-the-answer-to-content-farms-64134</link>
		<comments>http://searchengineland.com/are-manual-solutions-the-answer-to-content-farms-64134#comments</comments>
		<pubDate>Wed, 16 Feb 2011 17:13:15 +0000</pubDate>
		<dc:creator>David Harry</dc:creator>
				<category><![CDATA[Blekko]]></category>
		<category><![CDATA[Channel: SEO]]></category>
		<category><![CDATA[Content Farms]]></category>
		<category><![CDATA[Google: Web Search]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[content farms]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[search quality]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=64134</guid>
		<description><![CDATA[It was interesting to see some of the recent reactions when upstart Blekko decided to toss some sites out of their index. For the uninitiated, it was a bit of a seeming PR play against Google, which has been getting smacked about for thin quality of late. If you hadn&#8217;t guessed by now, we&#8217;re talking [...]]]></description>
				<content:encoded><![CDATA[<p>It was interesting to see some of the recent reactions when upstart Blekko decided to <a href="http://searchengineland.com/blekko-bans-content-farms-from-their-index-63134" target="_blank">toss some sites out of their index</a>. For the uninitiated, it was a bit of a seeming PR play against Google, which has been getting smacked about for thin quality of late. If you hadn&#8217;t guessed by now, we&#8217;re talking about (Demand Media&#8217;s) eHow and the other &#8220;<em>top 20 spam sites</em>&#8221; that were nuked.</p>
<p>Of course the question remains, why? It certainly does seem like a knee-jerk reaction that almost panders to the search community. Sure, I dislike running into weak content in the SERPs as much as the next guy. But I am pretty sure that there are many other sites with equally thin content, in many cases much worse than what they&#8217;re churning out. Seriously? There are only 20 sites worth tossing?</p>
<h2>The State Of Modern Search</h2>
<p>All is not lost, my friends. One of the better developments over the last few years is all of the new (potential) signals and the infrastructure to deal with them. To a certain extent there is every chance for Google (and other engines) to <a href="http://www.wordstream.com/blog/ws/2010/07/13/evolution-of-ranking-signals" target="_blank">get past the link</a>.</p>
<p>Why now, more so than in the past? The infrastructure (Caffeine) and the motivation (growing quality grumblings). Let&#8217;s consider some areas that might make sense, while also helping to combat spam and low quality results.</p>
<blockquote><p><strong>Personalization.</strong> One of the longest running goals at Google is deeper <a href="http://searchengineland.com/library/google/google-personalized-search">personalized search</a>. Add to that the world of mobile, another avenue for personalization and an area of great interest, and we might see far more personalized results in the near future. If the new infrastructure enables a more granular personalization than is currently in place, this can provide new signals that lessen the spam we see on the web today.</p>
<p><strong>Explicit User Feedback.</strong> When a user takes an action to tell the search engine something, it is a type of relevance feedback known as &#8216;explicit feedback&#8217;. Think of the (now defunct) <a href="http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html" target="_blank">SearchWiki</a> as a good example. Others might include emailing a page, saving to favorites and so forth. Traditionally, this type of data has been hard for search engines to come by; the noisier implicit feedback is far more readily available. But explicit feedback would certainly be a great way to deal with spam or unwanted domains in a personalized search environment.</p>
<p><strong>Temporal Data.</strong> Another area that has been prevalent with Google over the last few years is freshness (in many query spaces). There may be some of <a href="http://www.huomah.com/Search-Engines/Search-Engine-Optimization/Link-Builders-Guide-to-Historical-Ranking-Factors.html" target="_blank">these types of signals</a> that could increase relevancy while <a href="http://www.huomah.com/Search-Engines/Search-Engine-Optimization/Spam-detection-using-historical-factors.html" target="_blank">dealing with spam</a>. Even for the link graph, stronger weighting of these might help decrease the power of authority in many situations.</p>
<p><strong>Social Graph.</strong> Another obvious area is, of course, <a href="http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389">social</a>. The social graph and real-time search are two areas Google has also invested in of late. This can lead to deeper personalization as well as other potential signals. Once more, a lot of social signals are open to spam unless they&#8217;re used in a granular personalization approach. But in concert with the other elements mentioned here, it seems that it would also help root out weak content/sites while not opening the entire link graph up for manipulation.</p></blockquote>
<p>At the end of the day, there needs to be an automated solution that protects not only against spam but also against those who wish to do their competitors harm. Having &#8216;votes&#8217; of spam should only affect the individual user. You can&#8217;t spam yourself, and removing a competitor <em>only</em> from your own results is the kind of personalization that would work to deal with this.</p>
<p><strong><a href="http://searchengineland.com/are-manual-solutions-the-answer-to-content-farms-64134/algo-solution-2-2" rel="attachment wp-att-64893"><img class="aligncenter size-large wp-image-64893" title="Algo-solution-2" src="http://searchengineland.com/figz/wp-content/seloads/2011/02/Algo-solution-21-500x227.jpg" alt="" width="500" height="227" /></a></strong></p>
<h2>Some Thoughts From The Geeks</h2>
<p>To try and gain some more insight into this and the larger considerations of user feedback, I contacted <a href="http://twitter.com/#!/skrenta" target="_blank">Rich Skrenta</a> from Blekko and <a href="http://twitter.com/#!/surfcanyon" target="_blank">Mark Cramer</a> of Surf Canyon (awesome tool, awesome geek).</p>
<p>On the topic of explicit feedback mechanisms such as we&#8217;ve seen with Google SearchWiki, Rich says they didn&#8217;t work because <em>&#8220;there are too many possible queries, effectively an infinite set. How many different queries are there for all possible song lyrics?&#8221;</em></p>
<p>Skrenta then made the case for their approach:</p>
<blockquote>&#8220;<em>What we are doing is identifying the top sites per category. The top 100 /health sites collectively have millions of pages and can answer any medical question you have.  The top 50 lyrics sites have lyrics for every song&#8221;.</em></blockquote>
<p>That makes some sense, but I am also leery of &#8216;human powered&#8217; solutions. Rich countered that it&#8217;s &#8220;(..) <em>disingenuous to pretend that &#8220;the algorithm&#8221; drives the results. The algorithm gets changed on a day to day basis in response to new material appearing on the web.</em>&#8221;</p>
<p>Ok, yes, there are people constantly messing with the algorithms at Google, which does mean they&#8217;re also making subjective judgments of their own. Also, for the unfamiliar, Google does have <a href="http://searchengineland.com/google-human-quality-reviews-old-news-returns-12977" target="_blank">raters in the system</a> scoring on perceived relevance as part of search quality testing.</p>
<p>Mark Cramer, for his part, as someone familiar with user feedback mechanisms, feels that &#8220;<em>the implicit feedback approach is always the best. In most all cases, people are not interested in providing explicit feedback.</em>&#8221;</p>
<p>Google SearchWiki is the glaring example. Surf Canyon did <a href="http://blog.surfcanyon.com/2011/02/07/my-lower-intestine-is-full-of-spam-egg-spam-bacon-spam-tomatoes-spam/" target="_blank">get involved</a> with the move by making it an option for their users. &#8220;<em>We figured it wouldn&#8217;t hurt to throw it in there,</em>&#8221; said Cramer, referring to a <a href="http://www.surfcanyon.com/content_farm.jsp" target="_blank">new option for the application</a>.</p>
<p>In further clarification on Blekko&#8217;s approach, which is a more subjective stance, Skrenta once more uses the lyric SERP example:</p>
<blockquote>&#8220;<em>Rather than rolling the dice with a 200-weight algorithm that&#8217;s been trained by a bunch of minimum wage web contractors, you could actually just pick the top lyrics sites.  They have the lyrics to every song ever published.  And they won&#8217;t download malware or spyware onto your computer.</em>&#8221;</blockquote>
<p>This is once more a seemingly logical approach, but I can&#8217;t see it being something a search engine such as Google would consider. It speaks more to the kind of personalized environment we looked at earlier.</p>
<p>As of this week, even Google is getting back into the explicit user feedback experience with a <a href="http://googleblog.blogspot.com/2011/02/new-chrome-extension-block-sites-from.html" target="_blank">Chrome add-on for removing sites from your results</a>. Will this fare any better than previous attempts? It is highly unlikely. Forgetting for a moment the market share for Chrome, users simply aren&#8217;t that interested. Just give them good results to start with.</p>
<h2>Dear Blekko</h2>
<p>While we can give kudos to the gang at Blekko for trying to say something on the need for higher quality search results, there are limits. This doesn&#8217;t scale well and would be a PR nightmare for any major search engine. Do eHow or Mahalo <em>really</em> have the worst result for <em>everything</em> they publish? It seems a slippery slope to venture onto.</p>
<p>Where will it end and <em>what safeguards are in place?</em></p>
<p>Until it can be proven in some larger implementations that users will not only engage with explicit feedback but do it honestly, I don&#8217;t believe arbitrary, non-algorithmic actions are the answer. It&#8217;s certainly not the answer for Google, I know that much.</p>
<p>One thing is certain; producing high quality relevant results ain&#8217;t easy.</p>
<h2>Finding An Algorithmic Solution</h2>
<p>So let us consider: what if the shoe were on the other foot?</p>
<p>Imagine that Google had made such a move. It most certainly wouldn&#8217;t be hailed; in fact, I am pretty sure people would be screaming from the mountain tops that <a href="http://www.seroundtable.com/google-search-bias-12830.html" target="_blank">Google was biased</a>, that they were the Internet judge and jury, on and on. I guarantee you that much.</p>
<p><img class="aligncenter size-large wp-image-65035" title="Algo-solution-1" src="http://searchengineland.com/figz/wp-content/seloads/2011/02/Algo-solution-11-500x227.jpg" alt="" width="500" height="227" /></p>
<p>This is one of the reasons that Google (and many search engineers) tend to prefer an algorithmic solution to the problem. One of the other obvious reasons is that constantly updating the index from a subjective strategy would be massively resource intensive and cause far more grief than poor results and <a href="../../mr-cutts-goes-to-washington-61234" target="_blank">search neutrality</a> have of late.</p>
<p>This approach is <em>not</em> the answer.</p>
<p>What needs to be done is to find better filters and dampeners that can help limit the boost positive signals give to low quality results. Now this isn&#8217;t even close to being as easy as it sounds.</p>
<p>One element that is certainly a potential block is authority. Often, these types of sites have the link equity, age and trust that make ranking for many long tail terms fairly easy. If you&#8217;ve ever worked on a strong (authority) domain, you know what I mean. But what if that dampening has an effect on the authority of <em>your</em> site?</p>
<p>See? Not so easy, is it? There will always be winners and losers when the goal posts are moved. You may be one of the losers. Be careful what you ask for.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/are-manual-solutions-the-answer-to-content-farms-64134/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>A Tactical Guide To Becoming An SEO Ubergeek</title>
		<link>http://searchengineland.com/a-tactical-guide-to-becoming-an-seo-ubergeek-52954</link>
		<comments>http://searchengineland.com/a-tactical-guide-to-becoming-an-seo-ubergeek-52954#comments</comments>
		<pubDate>Thu, 14 Oct 2010 15:52:07 +0000</pubDate>
		<dc:creator>David Harry</dc:creator>
				<category><![CDATA[Channel: SEO]]></category>
		<category><![CDATA[How To: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=52954</guid>
		<description><![CDATA[So you&#8217;re sitting there thinking: how can I take my SEO chops to the next level? Well, I am sure hoping you are, or have at some time along the path. But where to start? To me an SEO that doesn&#8217;t understand information retrieval (IR) is like the web developer that doesn&#8217;t know HTML. You [...]]]></description>
				<content:encoded><![CDATA[<p>So you&#8217;re sitting there thinking: how can I take my SEO chops to the next level? Well, I am sure hoping you are, or have at some time along the path. But where to start? To me, an SEO who doesn&#8217;t understand information retrieval (IR) is like a web developer who doesn&#8217;t know HTML. You really <i>should</i> know how a search engine works. No, I am serious&#8230; it is two-thirds of the initialism, for cryin&#8217; out loud.</p>
<p>You should be proud to say: <i>Hi there. My name is Dave and I am an  algoholic.</i> </p>
<p>For my first post here on Search Engine  Land I want to bring you into my world. A glimpse into what types  of articles I will be writing for you here. For those not familiar  with me I am afflicted with the IR bug in the most geeky of ways and  today, I&#8217;ll give you a crash course on how you can be too.</p>
<h2>Become Patently Obvious</h2>
<p>The first thing we are going to look at is patents. Or at least we&#8217;ll get some perspective. You see, all too often the SEO world freaks out when a patent is awarded and starts hailing it as if someone discovered Atlantis. This is truly bad form. Right away one needs to consider the concept of <em>patent pending</em>. If the patent was filed in, say, 2004, then it has likely already been implemented, certainly adapted, and possibly even discarded since then. Nothing in a patent is <em>new</em>, nor is it telling beyond offering a glimpse into the mindset of a search engineer. One must avoid the <a href="http://www.huomah.com/Search-Engines/Algorithm-Matters/SEO-Magic-Bullet-2010-Edition.html" target="_blank">SEO Magic Bullet</a> approach.</p>
<p>People also tend to look at patents in isolation. This is also shortsighted. Google has been awarded more than 10 patents on local search in the last three years alone. As such we must look at the totality of them and consider whichever current award we&#8217;re looking at in that context.</p>
<p>Now let&#8217;s take the &#8220;avoid SEO magic bullet&#8221; perspective and look at some ways to stay on top of things. Some tips for keeping up (rationally) with patents include:</p>
<ul>
<li>Set up some alerts via RSS with <a href="http://www.latestpatents.com/category/google/" target="_blank">Latest Patents</a>.</li>
<li>Create email alerts at <a href="http://www.freepatentsonline.com/result.html?query_txt=AN+Google&#038;sort=relevance&#038;srch=top&#038;search=" target="_blank">Free Patents Online</a>.</li>
<li>Remember to research the authors. This often gives insight into what they&#8217;ve worked on in the past and offers relevant content.</li>
<li>Always bear in mind we never know the exact use or weighting of a given signal in search engine algorithms.</li>
<li>Seek out related patents that can offer context and perspective.</li>
<li>Always check the images associated with the patents; they simplify things.</li>
<li>Read other uber patent geeks such as <a href="http://seobythesea.com" target="_blank">Bill Slawski</a>.</li>
</ul>
<p>It will take you some time to get used to reading patents, but with time and practice it does get easier. The end goal is not so much figuring out how the search engine in question is incorporating the patent into its algorithms; it is about getting into the mind-set of a search engineer so that you develop a common sense approach to your own strategies and testing practices.</p>
<h2>SEO Ain&#8217;t Rocket Science, It&#8217;s <i>Computer</i> Science</h2>
<p>The next area we want to get into is  the world of information retrieval. This is part of the computer  science world. If patents are the past, IR world watching is the  future. This is an important part of becoming a super uber search  geek. While there is much you can glean from what&#8217;s already out there  in the search world, in many ways it is about seeing what lies ahead.  When doing SEO you want to always ensure your tactics are &#8220;future  proofed.&#8221; As such, IR watching is paramount to delivering a strategy  that stands the test of time.</p>
<p>To get you rolling with that here is  some essential viewing:</p>
<ul>
<li><a href="http://www.huomah.com/dojo/videos/viewvideo/52/information-retrieval/sims-141-overview-of-how-search-engines-work.html" target="_blank">How Search Engines Work</a></li>
<li><a href="http://videolectures.net/Top/Computer_Science/Machine_Learning/" target="_blank">Machine Learning Section on VideoLectures.net</a></li>
<li><a href="http://videolectures.net/Top/Computer_Science/Natural_Language_Processing/" target="_blank">Natural Language Processing vids</a></li>
<li><a href="http://videolectures.net/Top/Computer_Science/Text_Mining/" target="_blank">Text Mining vids</a></li>
<li><a href="http://videolectures.net/Top/Computer_Science/Semantic_Web/" target="_blank">Semantic Web vids</a></li>
</ul>
<p>And some essential reading:</p>
<ul>
<li><a href="http://www.icml2010.org/" target="_blank">ICML</a> – the International Conference on Machine Learning</li>
<li><a href="http://www.sigir.org/" target="_blank">SIGIR</a> – Special Interest Group on Information Retrieval</li>
<li><a href="http://airweb.cse.lehigh.edu/" target="_blank">AIRweb</a> – Adversarial Information Retrieval on the Web</li>
<li><a href="http://research.google.com/" target="_blank">Google Research</a></li>
<li><a href="http://research.microsoft.com/en-us/groups/irm/default.aspx" target="_blank">Microsoft Research</a></li>
<li><a href="http://www.huomah.com/dojo/38-Research-papers/View-category.html" target="_blank">A selected list of IR research papers</a></li>
</ul>
<p>And of course you can search places such as Stanford for even more.  There is a ton of information out there but the above resources should give  you a starting point for doing some research and reading of your own. </p>
<h2>Free Courses And Learning On The Web</h2>
<p><a href="http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html" target="_blank">Introduction to Information Retrieval</a>. This is an online version of the book published by Cambridge University Press. The book is the result of a series of courses taught at Stanford University and at the University of Stuttgart, in a range of durations including a single quarter, one semester and two quarters. These courses were aimed at early-stage graduate students in computer science.</p>
<p><a href="http://www.dcs.gla.ac.uk/Keith/Preface.html" target="_blank">Information Retrieval</a>. A book by <a href="http://www.dcs.gla.ac.uk/%7Ekeith/" target="_blank">C. J. van Rijsbergen</a>. According to the author&#8217;s preface, the major change in the second edition is the addition of a new chapter on probabilistic retrieval, which he considers one of the most interesting and active areas of research in information retrieval. There are still many problems to be solved, and he hopes the chapter will be of some help to those who want to advance the state of knowledge in this area.</p>
<p><a href="http://www.db.dk/pi/iri/">Information Retrieval Interaction</a> &#8211; P. Ingwersen. Focuses on user interaction in information retrieval. The book aims to establish a unifying scientific approach to IR, a synthesis based on the concept of IR interaction and the cognitive viewpoint; to present research and developments in the field of information retrieval based on a new categorization; and to generate a consolidated framework of functional requirements for intermediary analysis and design.</p>
<h2>Are We There Yet?</h2>
<p>These  resources should be enough to get you going and set you on the path  to understanding more about search than your average SEO  practitioner, and  you should be well on  the road to becoming an uber geeky search nut. And please remember,  this is not something to take lightly. The more you dig into the deeper aspects of &#8220;this thing of ours,&#8221; the more prepared you will be. The next  time you read some blog post or attend a seminar you will be far  better equipped to distinguish the signal from the noise. You can look at  the theory and ponder: does this even make sense? You will also be  better prepared to conduct your own testing. How can one properly  test a theory when they don&#8217;t even understand the rudiments of how a  search engine works?</p>
<p>You don&#8217;t have to become a computer scientist to be an SEO. You  don&#8217;t need to get your PhD to get pages to rank well in search results. But if you spend  some time learning more about the very focus of what we do I can  guarantee that you will have a far more profound understanding of the  job than you had previously. </p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/a-tactical-guide-to-becoming-an-seo-ubergeek-52954/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
