<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine Land &#187; Todd Nemet</title>
	<atom:link href="http://searchengineland.com/author/todd-nemet/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: News On Search Engines, Search Engine Optimization (SEO) &#38; Search Engine Marketing (SEM)</description>
	<lastBuildDate>Fri, 17 May 2013 21:49:47 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Do As I Say, Not As I Do: A Look At Search Engines &amp; SEO Best Practices</title>
		<link>http://searchengineland.com/do-as-i-say-not-as-i-do-a-look-at-search-engines-seo-best-practices-102698</link>
		<comments>http://searchengineland.com/do-as-i-say-not-as-i-do-a-look-at-search-engines-seo-best-practices-102698#comments</comments>
		<pubDate>Thu, 01 Dec 2011 14:05:32 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=102698</guid>
		<description><![CDATA[Now that the holidays are upon us, we all probably could use some cheering up. So I thought I&#8217;d have some fun with our favorite search engines: Google, Yahoo, Bing, YouTube, and Blekko. At Nine By Blue, I have been developing software that automatically checks sites for technical SEO best practices. Normally we run it [...]]]></description>
				<content:encoded><![CDATA[<p>Now that the holidays are upon us, we all probably could use some cheering up. So I thought I&#8217;d have some fun with our favorite search engines: Google, Yahoo, Bing, YouTube, and Blekko.</p>
<p>At Nine By Blue, I have been developing software that automatically checks sites for technical SEO best practices. Normally we run it on our clients&#8217; sites to quickly check for issues and to monitor them for future problems.</p>
<p>But I was curious to see what I would find if I pointed the software at some typical pages on the search engines&#8217; sites and compared their implementations with the technical SEO best practices that we typically recommend.</p>
<p>Below is a list of some of the issues that I found in no particular order.</p>
<p><strong>Disclaimer #1</strong>: This list is intended to point out how difficult it is to fully optimize a site for SEO, especially large-scale enterprise sites. I&#8217;m not claiming that I could have done any better, even if I had full control of these sites.</p>
<p><strong>Disclaimer #2</strong>: Yes, I&#8217;m aware of <a href="http://googlewebmastercentral.blogspot.com/2010/03/googles-seo-report-card.html">Google&#8217;s SEO report card</a>, but I have never read it because it is too long. Also, I didn&#8217;t want to be influenced by it.</p>
<h2>Use a Link Rel=Canonical Tag On The Homepage</h2>
<p>Most of the sites that I reviewed had many different URLs that lead to the home page. This can be because of tracking parameters (e.g., http://www.site.com/?ref=affiliate1), default file names (e.g., http://www.site.com/index.php), or even duplicate subdomains (e.g., http://www1.site.com/).</p>
<p>Because of this, I always recommend putting a <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=139394">link rel=canonical tag</a> on the home page. This ensures that links to these different home page URLs all get counted as pointing to the same URL. I also recommend adding this tag for any other pages that might have similar issues.</p>
<p>I was surprised to find that Bing was the only site that had a proper link rel=canonical tag on the home page.</p>
<p>YouTube also has a link rel=canonical tag, but it points to the relative URL &#8220;/&#8221; instead of the full, absolute URL &#8220;http://www.youtube.com/&#8221;.</p>
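<p>As a rough illustration of the home page fix, here is a minimal Python sketch of one way to do it (an assumption on my part, not how Bing or YouTube generate their tags). It collapses the URL variants listed above onto one absolute URL and emits the tag. The host www.site.com, the list of default file names, and the function names are all hypothetical.</p>

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute your own.
CANONICAL_HOST = "www.site.com"
# Assumed list of default file names that duplicate a directory URL.
DEFAULT_FILES = {"index.php", "index.html", "default.aspx"}


def canonical_home_url(requested_url: str) -> str:
    """Collapse home page URL variants into one absolute canonical URL."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    # /index.php and similar default file names are the root in disguise.
    if path.lstrip("/") in DEFAULT_FILES:
        path = "/"
    # Force the canonical scheme and host (www1.site.com -> www.site.com)
    # and drop the query string (?ref=affiliate1) and any fragment.
    return urlunsplit(("http", CANONICAL_HOST, path, "", ""))


def canonical_link_tag(requested_url: str) -> str:
    """Emit a link rel=canonical tag with an absolute URL, never a bare "/"."""
    href = escape(canonical_home_url(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_link_tag("http://www1.site.com/index.php?ref=affiliate1"))
```

<p>Under these assumptions, every variant produces the same tag with href="http://www.site.com/", so the value stays unambiguous no matter which host, file name, or tracking parameter served the page.</p>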
<h2>Avoid Duplicate Subdomains &amp; 301 Redirect Them To The Main Subdomain</h2>
<p>With few exceptions, I have been able to find a duplicate copy of every site that I review.</p>
<p>I have a list of typical subdomains (www1, dev, api, m, etc.) that will generally turn up a copy of the site. Other duplicate copies of a site can be found at the IP address (e.g., http://192.168.1.1/ instead of http://www.site.com/) and by probing DNS for additional hostnames or domains.</p>
<p>These duplicate subdomains or duplicate sites have a negative effect on SEO because they make the search engines crawl multiple copies of your site just to get one copy. They can also cause links intended for a particular page to be spread out among multiple copies, reducing the page&#8217;s authority.</p>
<p>The best way to fix this is to use a permanent (301) redirect to the canonical subdomain&#8217;s version of each URL. If that isn&#8217;t possible, then a link rel=canonical tag pointing to the canonical subdomain&#8217;s page will work almost as well.</p>
<p>For example, an entire duplicate copy of Bing.com is available at <a href="http://www1.bing.com/">http://www1.bing.com/</a>. Compounding this is the fact that the page has a link rel=canonical tag also pointing to <a href="http://www1.bing.com/">http://www1.bing.com/</a> and all the links on the page point to www1 as well.</p>
<p>Other subdomains, such as www2 through www5 and www01, all properly redirect to www.bing.com with a 301.</p>
<p>Blekko has an old, pre-launch copy of its site at <a href="http://api.blekko.com/">http://api.blekko.com/</a>. (Here is their <a href="http://api.blekko.com/mgmt.html">old executive page</a>.) Fortunately, this subdomain has a robots.txt file that is preventing it from being crawled. But these pages are duplicated elsewhere as well: the old executive page at <a href="http://api.blekko.com/mgmt.html">http://api.blekko.com/mgmt.html</a> is also available at <a href="http://dev.blekko.com/mgmt.html">http://dev.blekko.com/mgmt.html</a> and on the main subdomain at <a href="http://blekko.com/mgmt.html">http://blekko.com/mgmt.html</a>.</p>
<p>It would be better to 301 redirect these URLs to the current management page at <a href="http://blekko.com/ws/+/management">http://blekko.com/ws/+/management</a> than to leave multiple copies of them on different subdomains.</p>
<p>YouTube redirects its duplicate subdomains www1 through www5 to www.youtube.com, which is in line with best practices. Unfortunately, it redirects with a 302 (temporary) redirect rather than a recommended 301 (permanent) redirect.</p>
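<p>The fix described above can be sketched as a small request-routing function. This is a minimal Python sketch, not how Bing, Blekko, or YouTube implement it; the canonical host www.example.com and the function name host_redirect are hypothetical, and in practice this rule usually lives in the Web server or load balancer configuration.</p>

```python
from typing import Optional, Tuple

# Hypothetical canonical host; replace with your own.
CANONICAL_HOST = "www.example.com"


def host_redirect(host: str, path_and_query: str) -> Optional[Tuple[int, str]]:
    """Return (301, Location) for any non-canonical host, else None.

    www1, dev, a bare IP address and so on all map to the same path on the
    canonical host. A 301 (permanent) rather than a 302 (temporary) signals
    that links to the duplicate should be credited to the canonical URL.
    """
    # Lowercase and drop any :port suffix (hostnames and IPv4 only).
    host = host.lower().split(":", 1)[0]
    if host == CANONICAL_HOST:
        return None
    return 301, f"http://{CANONICAL_HOST}{path_and_query}"


print(host_redirect("www1.example.com", "/mgmt.html"))
```

<p>Keeping the path when redirecting matters: sending every duplicate URL to the home page would throw away the one-to-one mapping that lets each page keep its own links.</p>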
<h2>Use Permanent Redirects From https: URLs To http: URLs IF They Don&#8217;t Require SSL</h2>
<p>Another type of duplicate copy of a site that I usually find is the SSL/https version of the site. https is appropriate for pages that require security, like a login page or a page for editing a user profile, but for pages that don&#8217;t require security, it is a source of duplicate content causing crawl inefficiency and link diffusion.</p>
<p>The recommended solution for this is to redirect pages from https to http whenever possible.</p>
<p>Our software detected duplicate https copies of most pages, including <a href="https://onlinehelp.microsoft.com/en-US/bing/ff808535.aspx">Microsoft&#8217;s help pages</a>, the <a href="https://www.youtube.com/t/about_youtube">YouTube about pages</a>, <a href="https://www.google.com/about/corporate/company/">Google&#8217;s corporate page</a>, and even the <a href="https://www.google.com/support/webmasters/bin/answer.py?answer=35769&amp;hl=en">Google webmaster guidelines</a>.</p>
<p>The duplicate content issue with the Google webmaster guidelines page (and the other Google help pages) is compounded by a link rel=canonical tag that points to either the http or https version of the URL, depending on which URL is requested.</p>
<p>It is important to make sure that the link rel=canonical tag always points to the intended canonical version of the page, so be careful when dynamically generating this element.</p>
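<p>One way to keep the tag consistent is to derive the canonical scheme from the page itself rather than echoing the request. The Python sketch below assumes a hypothetical list of path prefixes (/login, /account) that genuinely need SSL; every other https URL is 301-redirected to http and always gets an http canonical URL. The function names are hypothetical.</p>

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical path prefixes for pages that genuinely need SSL.
SECURE_PREFIXES = ("/login", "/account")


def needs_ssl(path: str) -> bool:
    return path.startswith(SECURE_PREFIXES)


def scheme_redirect(url: str) -> Optional[Tuple[int, str]]:
    """301 an https URL to its http twin unless the page needs SSL."""
    parts = urlsplit(url)
    if parts.scheme == "https" and not needs_ssl(parts.path):
        return 301, urlunsplit(parts._replace(scheme="http"))
    return None


def canonical_href(url: str) -> str:
    """Choose the canonical scheme from the page, not from the request,
    so http and https requests get the same link rel=canonical value."""
    parts = urlsplit(url)
    scheme = "https" if needs_ssl(parts.path) else "http"
    return urlunsplit(parts._replace(scheme=scheme, fragment=""))


print(scheme_redirect("https://www.example.com/about/"))
print(canonical_href("https://www.example.com/about/"))
```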
<p>A request for <a href="https://www.bing.com/">https://www.bing.com/</a> results in a security warning (shown below) due to a mismatched SSL certificate. This is common for sites using Akamai for global server load balancing.</p>
<p>It even pops up for <a href="https://www.whitehouse.gov/">https://www.whitehouse.gov/</a>. I&#8217;m not aware of a way to get around this issue, though I would love to talk with someone at Akamai about it.</p>
<p style="text-align: center;"><img class="size-large wp-image-102714 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/akamai-https-error-600x283.png" alt="" width="600" height="283" /></p>
<h2>Use Robots.txt File To Prevent URLs From Being Crawled</h2>
<p>Sites generally have different types of pages that they don&#8217;t want search engines to index. This could be because these pages are unlikely to convert or aren&#8217;t a good experience for users to land on, like a &#8220;create an account&#8221; or &#8220;leave a comment&#8221; page. Or it could be because the page is not intended for Web browsers, like an XML response to an API call.</p>
<p>Bing&#8217;s search API calls, which are made to URLs starting with http://api.bing.com/ or http://api.bing.net/, can be crawled by spiders according to the robots.txt file. This can be devastating to crawl efficiency because search engines will keep crawling these XML results even though they are useless in a browser.</p>
<p>A search on Google for <a href="http://www.google.com/webhp?q=site:api.bing.net+OR+site%3Aapi.bing.com">[site:api.bing.net OR site:api.bing.com]</a> currently returns about 260 results, but based on analysis I have done of clients&#8217; Web access log files, many times that number of these URLs have been crawled and rejected.</p>
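<p>Because robots.txt rules apply per host, an API host needs its own robots.txt file. Python&#8217;s standard urllib.robotparser module can check rules offline; the two robots.txt files in this sketch are hypothetical examples, not Bing&#8217;s actual files.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an API host such as api.example.com.
# robots.txt applies per host, so the API host needs its own file.
API_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for the main site: keep crawlers out of pages
# that make poor landing pages.
SITE_ROBOTS_TXT = """\
User-agent: *
Disallow: /create-account
Disallow: /comment/
"""


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


api_rules = parser_for(API_ROBOTS_TXT)
site_rules = parser_for(SITE_ROBOTS_TXT)

print(api_rules.can_fetch("Googlebot", "http://api.example.com/search?q=shoes"))  # False
print(site_rules.can_fetch("Googlebot", "http://www.example.com/products/1"))     # True
```

<p>One caveat: Python&#8217;s parser applies the first matching rule, while Google documents that the most specific (longest) matching rule wins, so the two can disagree when Allow and Disallow lines overlap.</p>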
<h2>Use ALT Attributes In Images</h2>
<p>Images should always be given alternate text via the ALT attribute (not TITLE or NAME as I have seen on some sites). This is good for accessibility issues like screen readers, and it provides additional context about a page to search engines.</p>
<p>Though many images on the pages that were checked had appropriate alternate text, I couldn&#8217;t help but notice that Duane Forrester&#8217;s image on <a href="http://www.bing.com/community/members/duane-forrester/default.aspx">his profile page</a> didn&#8217;t. But he is in good company because Larry, Sergey, Eric, and the rest of the <a href="http://www.google.com/about/corporate/company/execs.html">Google executive team</a> don&#8217;t either.</p>
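<p>A quick way to audit a page for this is to parse its HTML and list images without alternate text. This Python sketch uses the standard html.parser module; the class name and sample markup are hypothetical.</p>

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Record the src of every <img> whose alt text is missing or blank.

    TITLE or NAME attributes don't count as alternate text. Images that are
    purely decorative may legitimately use alt="", so review the hits by hand.
    """

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so ALT == alt.
        # Self-closing <img ... /> tags are routed here too by default.
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attr_map.get("src", "?"))


finder = MissingAltFinder()
finder.feed('<img src="a.png" alt="Logo"><img src="b.png" title="Exec">'
            '<img src="c.png" alt="" />')
print(finder.missing)  # ['b.png', 'c.png']
```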
<h2>Avoid Use Of Rel=Nofollow Attributes On Links To &#8220;sculpt PageRank&#8221;</h2>
<p>A <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=96569">rel=nofollow</a> attribute on a link tells search engines not to consider the link as part of its link graph. Occasionally, I will review a site that attempts to use this fact to control the way that PageRank &#8220;flows&#8221; through a site.</p>
<p>This technique is generally considered to be ineffective and even counterproductive, and I always recommend against it. (There are still valid uses for rel=nofollow attributes on internal links, such as links to pages that are excluded from being crawled by robots.txt.)</p>
<p>None of the search engine pages I checked were using rel=nofollow attributes in this way with the exception of the YouTube home page.</p>
<p>In the image below, nofollowed links are highlighted in red. Links to the most viewed and top favorited videos are being shown to search engines, but links to the general music, entertainment, and sports videos are not.</p>
<p style="text-align: center;"><img class="size-full wp-image-102717 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/youtube-pr-sculpt.png" alt="" width="385" height="660" /></p>
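<p>To spot internal links that carry rel=nofollow on your own pages, a short Python sketch using the standard html.parser module could look like this. The class name and sample markup are hypothetical; since there are valid uses for internal nofollows, treat the output as a list to review rather than a list of errors.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class NofollowAudit(HTMLParser):
    """Collect internal links on a page that carry rel=nofollow."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlsplit(page_url).netloc
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        rels = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" in rels:
            # Resolve relative links, then keep only same-host ones.
            absolute = urljoin(self.page_url, href)
            if urlsplit(absolute).netloc == self.host:
                self.nofollowed.append(absolute)


audit = NofollowAudit("http://www.example.com/")
audit.feed('<a href="/most-viewed">Most viewed</a>'
           '<a href="/music" rel="nofollow">Music</a>'
           '<a href="http://ads.example.net/" rel="nofollow">Ad</a>')
print(audit.nofollowed)  # ['http://www.example.com/music']
```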
<h2>Return Response Codes Directly</h2>
<p>A URL that doesn&#8217;t lead to a valid page should return a 404 (page not found) response code directly.</p>
<p>If an invalid URL is sent to Bing&#8217;s community blog site, it will redirect to a 404 page. Here is the chain:</p>
<ol>
<li>The URL <a href="http://www.bing.com/community/b/nopagehere.aspx">http://www.bing.com/community/b/nopagehere.aspx</a> returns a 302 (temporary) redirect to</li>
<li>the URL <a href="http://www.bing.com/community/error-notfound.aspx?aspxerrorpath=/community/b/nopagehere.asp">http://www.bing.com/community/error-notfound.aspx?aspxerrorpath=/community/b/nopagehere.asp</a>, which returns a 404 (page not found) response.</li>
</ol>
<p>The recommended best practice would be for the first URL to return a 404 directly. If that isn&#8217;t possible, then the redirect should be changed to a 301 (permanent) redirect.</p>
<p>Yahoo&#8217;s corporate information pages do something interesting when they get an invalid URL.</p>
<p>A request to <a href="http://info.yahoo.com/center/us/yahoo/anypage.html">http://info.yahoo.com/center/us/yahoo/anypage.html</a>, which is not a valid URL, correctly returns a 404 (page not found) response.</p>
<p>But the 404 page contains an old-school <a href="http://en.wikipedia.org/wiki/Meta_refresh">meta refresh</a> with a delay of one second that redirects to <a href="http://info.yahoo.com/center/us/yahoo/">http://info.yahoo.com/center/us/yahoo/</a>.</p>
<p>A 301 redirect to this page is the recommended way to handle these types of invalid URLs.</p>
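<p>Here is a minimal sketch of the recommended behavior as a Python WSGI application, assuming a hypothetical in-memory page store (PAGES) and a map of pages that really moved (MOVED). It answers unknown URLs with a 404 directly and uses a single 301 only for moved content. It says nothing about how Bing or Yahoo actually serve these pages.</p>

```python
# Hypothetical page store and moved-page map for illustration.
PAGES = {"/": b"<h1>Home</h1>", "/about": b"<h1>About</h1>"}
MOVED = {"/old-about": "http://www.example.com/about"}

NOT_FOUND_BODY = b'<h1>Page not found</h1><p><a href="/">Home page</a></p>'


def app(environ, start_response):
    """Minimal WSGI app that answers unknown URLs with a direct 404."""
    path = environ.get("PATH_INFO") or "/"
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGES[path]]
    if path in MOVED:
        # Content that really moved: a single permanent redirect.
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    # Everything else gets a 404 at the requested URL itself: no 302 hop
    # to an error page URL and no meta refresh in the body.
    start_response("404 Not Found", [("Content-Type", "text/html; charset=utf-8")])
    return [NOT_FOUND_BODY]
```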
<h2>Support If-Modified-Since/Last-Modified Conditional GETs</h2>
<p>I am a big fan of using cache control headers to increase crawl efficiency and improve page speed. (My article on this topic is <a href="http://searchengineland.com/how-to-improve-crawl-efficiency-with-cache-control-headers-88824">here</a>.)</p>
<p>I found it interesting that, out of all the URLs that were checked, only a few Google URLs supported If-Modified-Since requests, and none of the URLs supported If-None-Match.</p>
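<p>Supporting conditional GETs mostly comes down to comparing the request&#8217;s validators with the resource&#8217;s current state and answering 304 (Not Modified) when they match. The Python sketch below assumes you already know each resource&#8217;s last-modified time and an ETag; the function name conditional_get is hypothetical, and weak ETags are not handled.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified, etag):
    """Return (status, headers): 304 if the client's copy is current, else 200.

    last_modified: aware UTC datetime of the resource's last change.
    etag: the resource's current entity tag, quoted, e.g. '"v42"'.
    """
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "ETag": etag,
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        # Simplified: exact tag comparison, no weak (W/) tag handling.
        tags = [t.strip() for t in if_none_match.split(",")]
        return (304 if etag in tags or "*" in tags else 200), headers
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200, headers  # unparseable date: send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so compare whole seconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers
    return 200, headers


changed = datetime(2011, 12, 1, 14, 5, 32, tzinfo=timezone.utc)
status, _ = conditional_get({"If-Modified-Since": "Thu, 01 Dec 2011 14:05:32 GMT"}, changed, '"v1"')
print(status)  # 304
```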
<h2>Periodically Check Your DNS Configuration</h2>
<p>As part of a site review, I like to use online resources like <a href="http://intodns.com/">http://intodns.com/</a> and <a href="http://robtex.com/">http://robtex.com/</a> to check the DNS configuration.</p>
<p>DNS is an important part of technical SEO because if something breaks with DNS, then the site will go down and it isn&#8217;t going to get crawled. Fortunately, this rarely happens.</p>
<p>However, I have reviewed sites that had their crawling affected by DNS changes. And I have reviewed several large sites that had their DNS servers on the same subnet, essentially creating a single point of failure for their entire business.</p>
<p>As expected, none of the search engines had serious DNS issues. I was surprised to see that two of them had recursion enabled on their name servers, because in some rare instances that can be a security risk.</p>
<p>My recommended best practice is to run these types of checks at least once a quarter.</p>
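<p>One of the checks above, name servers sharing a subnet, is easy to script. This Python sketch groups name server addresses by /24 network using the standard ipaddress module; the /24 boundary is an assumed stand-in for &#8220;same subnet,&#8221; and the host names and addresses (from the reserved documentation ranges) are hypothetical. In practice you would feed it the addresses reported by a tool like intodns.com.</p>

```python
import ipaddress
from collections import defaultdict


def shared_subnets(ns_addresses, prefix=24):
    """Return {network: [name servers]} for networks holding 2+ name servers.

    ns_addresses: {name_server_hostname: IPv4 address string}.
    """
    groups = defaultdict(list)
    for name, addr in ns_addresses.items():
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups[network].append(name)
    return {str(net): sorted(names) for net, names in groups.items() if len(names) > 1}


# Hypothetical name servers using documentation-only address ranges.
ns = {
    "ns1.example.com": "192.0.2.10",
    "ns2.example.com": "192.0.2.11",
    "ns3.example.com": "198.51.100.5",
}
print(shared_subnets(ns))  # {'192.0.2.0/24': ['ns1.example.com', 'ns2.example.com']}
```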
<h2>Conclusion</h2>
<p>These are a few of the issues I found that I commonly see or think are important. There were others, but they were relatively minor or subtle, like short titles, duplicate/missing meta descriptions, missing headers, and too many static resources per page.</p>
<p>Normally, I would have access to Web access log files and webmaster tools, which allow our software to check a lot more things.</p>
<p>I hope this gives you some ideas for things to check on your own site. And when you do find something, I hope you&#8217;ll remember that even the search engines have their own technical SEO issues from time to time.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/do-as-i-say-not-as-i-do-a-look-at-search-engines-seo-best-practices-102698/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Clickthrough Rate Equation In Organic Search, Part Two</title>
		<link>http://searchengineland.com/the-clickthrough-rate-equation-in-organic-search-part-two-95309</link>
		<comments>http://searchengineland.com/the-clickthrough-rate-equation-in-organic-search-part-two-95309#comments</comments>
		<pubDate>Thu, 03 Nov 2011 16:24:00 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=95309</guid>
		<description><![CDATA[In last month&#8217;s post, I talked about how improving organic clickthrough rate multiplies the effectiveness of the other work that goes into optimizing a website for search, such as keyword research, SEO, and usability. Most of these ways of increasing clickthrough rate are directly in our control by tweaking the on-page code. I ended by covering [...]]]></description>
				<content:encoded><![CDATA[<p>In last month&#8217;s post, I talked about how <a title="The Clickthrough Rate Equation In Organic Search" href="http://searchengineland.com/considering-clickthrough-rate-95277">improving organic clickthrough rate</a> multiplies the effectiveness of the other work that goes into optimizing a website for search, such as keyword research, SEO, and usability. Most of these ways of increasing clickthrough rate are directly in our control by tweaking the on-page code.</p>
<p>I ended by covering the two most important components of the search result: titles and snippets.</p>
<p>In this post, I&#8217;m going to cover some of the other search result components that can also improve clickthrough rate.</p>
<h2>The Green Text</h2>
<h3>URLs</h3>
<p>I have noticed that some sites like to put lots of keywords in their URLs so that the keywords will show up in the search results. (Possibly also because they believe it helps with ranking, which is a separate issue.) Using keyword-rich URLs is fine as long as you take the following into consideration:</p>
<ul>
<li>Don&#8217;t do this if your URL path elements are really query parameters in disguise.
<ul>
<li>For example, suppose you have a URL like <a href="http://www.example.com/">http://www.example.com/t-shirt-id/1234/page/4</a> that was rewritten from a URL like <a href="http://www.example.com/">http://www.example.com/product.php?t-shirt-id=1234&amp;page=4</a>. If so, you are risking serious crawl efficiency issues, because search engines can&#8217;t normalize path elements the way that they can with query parameters.</li>
</ul>
</li>
<li>Make sure that you aren&#8217;t inadvertently causing any case-insensitivity issues or duplicate content issues.
<ul>
<li>I see a lot of sites that will return the same page for a URL like <a href="http://www.example.com/">http://www.newssite.com/it-doesn&#8217;t-matter-what-you-put-here-12345</a> as for the real canonical URL, like <a href="http://www.example.com/">http://www.newssite.com/kim-kardashian-files-for-divorce-12345</a>. Be sure to use a 301 redirect or at least a link rel=canonical tag to normalize pages like these.</li>
</ul>
</li>
<li>Don&#8217;t change all of the URLs on your site just for the sake of putting keywords in them. A significant site re-architecture like that is difficult to pull off without any hiccups.</li>
</ul>
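<p>For the duplicate content case with ID-based URLs, one approach is to look up the canonical slug by ID and 301 any mismatch. This Python sketch assumes a hypothetical /&lt;slug&gt;-&lt;id&gt; URL pattern and an in-memory SLUGS table; it is not how any particular news site works.</p>

```python
import re
from typing import Optional, Tuple

# Hypothetical article store: numeric ID -> canonical slug.
SLUGS = {12345: "kim-kardashian-files-for-divorce"}

# /<slug>-<id>; the greedy slug group leaves the final run of digits as the ID.
ARTICLE_PATH = re.compile(r"^/(?P<slug>[^/]+)-(?P<id>\d+)$")


def resolve_article(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for an /<slug>-<id> style article URL."""
    match = ARTICLE_PATH.match(path)
    if not match:
        return 404, None
    article_id = int(match.group("id"))
    slug = SLUGS.get(article_id)
    if slug is None:
        return 404, None
    if match.group("slug") != slug:
        # Any other slug with a valid ID: 301 to the canonical URL
        # instead of serving a duplicate copy of the page.
        return 301, f"/{slug}-{article_id}"
    return 200, None


print(resolve_article("/it-doesnt-matter-what-you-put-here-12345"))
```

<p>Because the slug comparison is exact, a mixed-case version of the URL is redirected too, which also covers the case-insensitivity problem. The returned location is a path only; a real server would prepend the canonical scheme and host.</p>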
<div>Here is an example result from a search for [xkcd t-shirts] that has keywords in its URL:</div>
<div><img class="size-full wp-image-99329 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/xkcd-tshirts-inurl-example.png" alt="" width="543" height="87" /></div>
<h3>Breadcrumbs</h3>
<div>I think a far better way to get relevant keywords into a search result is by using breadcrumbs. Here are two more example search results for the same query:</div>
<p style="text-align: center;"><img class="size-full wp-image-99330 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/xkcd-breadcrumb-example1.png" alt="" width="460" height="68" /></p>
<p style="text-align: center;"><img class="size-full wp-image-99331 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/xkcd-t-shirt-breadcrumb-example2.png" alt="" width="531" height="88" /></p>
<p>These breadcrumbs are great not only because they contain relevant keywords, but also because they give a sense of how the page you are thinking about clicking on fits into the rest of the site. This makes it easier for users to navigate your site and more likely that they will convert.</p>
<p>Here are the corresponding breadcrumbs on the pages from the two search results above:</p>
<p><strong>Thinkgeek.com:</strong></p>
<p style="text-align: center;"><img class="size-full wp-image-99332 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/breadcrumb-example-xkcd-tees.png" alt="" width="363" height="77" /></p>
<p><strong>Redbubble.com:</strong></p>
<p style="text-align: center;"><img class="size-full wp-image-99333 aligncenter" style="border-style: initial; border-color: initial;" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/breadcrumb-example2.png" alt="" width="289" height="47" /></p>
<p>It isn&#8217;t possible to put together just any set of links and have search engines pick them up as breadcrumbs. At a minimum, the links and link text need to:</p>
<ul>
<li>be canonical</li>
<li>be relevant</li>
<li>be short (no more than 3 or 4 words)</li>
<li>most importantly, represent the actual navigable hierarchy of the site.</li>
</ul>
<p>Google and Bing list their recommended best practices for breadcrumbs and describe the markup on this <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=185417">Google help page</a> and this <a href="http://onlinehelp.microsoft.com/en-us/bing/hh207240.aspx">Bing help page</a>. Both support microdata and RDFa. Schema.org also has support for a <a href="http://schema.org/WebPage">breadcrumb property</a> if you are throwing in with its microdata vocabulary.</p>
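<p>As a sketch of what the schema.org flavor might look like, the following Python function renders a breadcrumb trail as a text value of the WebPage breadcrumb property. It assumes an ancestor element already carries itemscope itemtype="http://schema.org/WebPage"; the function name and example trail are hypothetical, so check the Google and Bing help pages for the markup they currently accept.</p>

```python
from html import escape


def breadcrumb_html(trail):
    """Render a breadcrumb trail as the text of the schema.org
    WebPage "breadcrumb" property (microdata).

    trail: (anchor_text, url) pairs from the top of the site hierarchy
    down to the current page's parent, mirroring real navigation.
    """
    links = " &gt; ".join(
        f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'
        for text, url in trail
    )
    return f'<div itemprop="breadcrumb">{links}</div>'


# The element needs a WebPage ancestor, e.g. on the body element:
# <body itemscope itemtype="http://schema.org/WebPage">
print(breadcrumb_html([("Clothing", "/clothing/"), ("T-Shirts", "/clothing/t-shirts/")]))
```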
<h2>Structured Markup</h2>
<h3><strong>RDFa, Microformats, Microdata</strong></h3>
<p>Structured markup can be used to explicitly indicate specific types of data to search engines. According to my notes from SMX East in September, these are supported:</p>
<ul>
<li><strong>Bing</strong> and <strong>Google</strong>: reviews, people, recipes</li>
<li><strong>Google</strong>: products, events, music, and apps</li>
<li><strong>Yahoo</strong>, <strong>Bing</strong>, and <strong>Google</strong>: <a href="http://schema.org/">Schema.org</a>, which has a zillion types of data to annotate but limited support so far, because it was only <a href="http://www.bing.com/community/site_blogs/b/search/archive/2011/06/02/bing-google-and-yahoo-unite-to-build-the-web-of-objects.aspx">announced</a> in June of this year.</li>
</ul>
<div>Here is an example showing rich snippet markup for a product with reviews on Amazon:</div>
<div><img class="aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/cosmos-reviews.png" alt="" width="497" height="81" /></div>
<p>Every site I have spoken with or that has presented at a session I&#8217;ve attended has reported a large increase in clickthrough rate after implementing this markup, especially for reviews and recipes. (One example: Topher Kohan of CNN mentioned at SMX East that adding hRecipe markup to one of their sites resulted in a 22% increase in traffic.)</p>
<p>Selecting the right type of markup and implementing it is an entire post in itself, so I&#8217;m going to recommend that if you have content of a type listed above, you should read through <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=99170">Google&#8217;s help article on rich snippets and structured data</a> and the <a href="http://schema.org/">schema.org</a> site.</p>
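<p>To give a flavor of what this markup involves, here is a Python sketch that renders a product name and aggregate rating as schema.org microdata, using the Product and AggregateRating types with the name, aggregateRating, ratingValue, and reviewCount properties. The function name and sample values are hypothetical and this is not Amazon&#8217;s markup; it also assumes a 5-point rating scale, and a real page would include more properties.</p>

```python
from html import escape


def product_rating_html(name, rating, review_count):
    """Render a product name and its aggregate rating as schema.org microdata.

    Assumes ratings on a 5-point scale.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Product">'
        f'<span itemprop="name">{escape(name)}</span> '
        '<div itemprop="aggregateRating" itemscope '
        'itemtype="http://schema.org/AggregateRating">'
        f'Rated <span itemprop="ratingValue">{rating}</span>/5 '
        f'based on <span itemprop="reviewCount">{review_count}</span> reviews'
        "</div></div>"
    )


print(product_rating_html("Example DVD Box Set", 4.5, 120))
```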
<p>Also, check out this <a href="http://searchengineland.com/employing-microformats-structured-data-for-enhanced-search-engine-visibility-94122">great article by Aaron Bradley</a> that gets into potential relevancy effects of marking up your pages with structured data.</p>
<h3>Rel=author/me attributes</h3>
<p>Indicating the author of an article or blog post with structured markup can add a profile picture to the search result, along with a link to the author&#8217;s Google+ profile page.</p>
<p style="text-align: center;"><img class="size-full wp-image-99546 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/rel-me-example.png" alt="" width="517" height="113" /></p>
<p>Setting this up requires a few steps that weren&#8217;t immediately clear to me, although Rick DeJarnette explains them well in <a href="http://searchengineland.com/how-to-create-your-digital-footprint-with-links-89205">How To Create Your Digital Footprint With Links</a>. It involves setting attributes on three links:</p>
<ul>
<li><em>rel="author"</em> on the link from the article to your general author page (for example, <a href="http://searchengineland.com/author/danny-sullivan">http://searchengineland.com/author/danny-sullivan</a>)</li>
<li><em>rel="me"</em> on the link from your general author page to your Google Profile page (<a href="http://www.example.com/">https://profiles.google.com/&lt;big-long-number&gt;</a>)</li>
<li><em>rel="me"</em> or <em>rel="contributor-to"</em> on the link from your Google profile page back to your general author page. To do this, find your Google profile, click Edit profile, and edit the &#8220;Contributor to&#8221; section to add a link to your general author page.</li>
</ul>
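<p>Because the setup spans three pages, it can help to check the links mechanically. The Python sketch below, using the standard html.parser module, verifies that the article links to the author page with rel="author", the author page links to the profile with rel="me", and the profile links back with rel="me" or rel="contributor-to". The function names and the profile URL are hypothetical, and passing the check does not guarantee that Google will show the author picture.</p>

```python
from html.parser import HTMLParser


class RelLinks(HTMLParser):
    """Collect (rel value, href) pairs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            attr_map = dict(attrs)
            # rel can hold several space-separated values.
            for rel in (attr_map.get("rel") or "").lower().split():
                self.links.append((rel, attr_map.get("href")))


def rel_targets(html, rel):
    """Return the set of hrefs on links carrying the given rel value."""
    parser = RelLinks()
    parser.feed(html)
    return {href for value, href in parser.links if value == rel}


def authorship_links_ok(article_html, author_url, author_page_html,
                        profile_url, profile_html):
    """Check the article -> author page -> profile -> author page loop."""
    back_links = rel_targets(profile_html, "me") | rel_targets(profile_html, "contributor-to")
    return (
        author_url in rel_targets(article_html, "author")
        and profile_url in rel_targets(author_page_html, "me")
        and author_url in back_links
    )
```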
<h2>Sitelinks</h2>
<p>Sitelinks are the block of extra related links that show up under a top search result. It&#8217;s a good idea to check them periodically by searching Google and Bing for your most popular branded queries.</p>
<p>If you see links you don&#8217;t like on Google, you can &#8220;demote&#8221; them by logging into Google Webmaster Tools and going to Site configuration &gt; Sitelinks. The demotion will only last for 90 days.</p>
<p>As motivation to check your sitelinks, here is an unfortunate set of sitelinks that I found last week when trying to reset my Starbucks account password:</p>
<p style="text-align: center;"><img class="size-full wp-image-99337 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/starbucks-account-sitelinks.png" alt="" width="555" height="297" /></p>
<p>(Aside to anyone at Starbucks: I&#8217;m pretty sure this is happening because of the way your site returns a 200 and redirects for certain types of &#8220;page not found&#8221; pages. Contact me, and I&#8217;ll send you more information. By the way, I will work for coffee.)</p>
<p>Sitelinks can also appear further down the results, not just for the result at position one. For example, these two search results for the query [ancient egypt] show up with their own abbreviated sitelinks:</p>
<p style="text-align: center;"><img class="size-full wp-image-99340 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/sitelinks-in-pos1-and-2.png" alt="" width="528" height="221" /></p>
<p>The standard advice for getting sitelinks to show up &#8212; again from my SMX East notes &#8212; is to make sure they are &#8220;prominent links on your site.&#8221; This <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=47334">Google help article</a> also recommends making sure the links have anchor text that is &#8220;informative, compact, and avoids repetition.&#8221;</p>
<p><strong>Table of contents links within the same page</strong></p>
<p>If your site has a lot of long, technical articles or other well-structured content that generally lends itself to having a table of contents, using <a href="http://www.w3.org/TR/html4/intro/intro.html#fragment-uri">fragment identifiers</a> (also called <a href="http://googlewebmastercentral.blogspot.com/2009/09/using-named-anchors-to-identify.html">named anchors</a>) is a really great way to get additional links with keywords to show up in search results.</p>
<p>Here is an example from the query [exoplanet gravitational microlensing]:</p>
<p style="text-align: center;"><img class="size-full wp-image-99341 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/jump-to-anchors.png" alt="" width="511" height="81" /></p>
<p>Bing also has support for this as seen from this search for [ancient egypt]:</p>
<p style="text-align: center;"><img class="size-full wp-image-99555 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/bing-sitelinks.png" alt="" width="574" height="90" /></p>
<p>To increase the chances of having these show up, make sure your pages are well-structured, that the anchors have descriptive text, and that the pages have a table of contents with links to each individual anchor.</p>
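<p>If your pages are generated from structured content, the table of contents and the anchors can be produced together. This Python sketch assumes each section is a (heading, body HTML) pair and uses id attributes as the fragment identifiers; the function names are hypothetical, and it does not de-duplicate headings that produce the same slug.</p>

```python
import re
from html import escape


def slugify(text):
    """Make a fragment identifier, e.g. "Transit Method" -> "transit-method"."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def page_with_toc(sections):
    """Return a table of contents followed by id-anchored sections.

    sections: list of (heading_text, body_html) pairs.
    """
    toc, body = [], []
    for heading, html in sections:
        frag = slugify(heading)
        toc.append(f'<li><a href="#{frag}">{escape(heading)}</a></li>')
        body.append(f'<h2 id="{frag}">{escape(heading)}</h2>{html}')
    return "<ul>" + "".join(toc) + "</ul>" + "".join(body)


print(page_with_toc([("Transit Method", "<p>...</p>"),
                     ("Gravitational Microlensing", "<p>...</p>")]))
```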
<p>The table of contents containing the fragments doesn&#8217;t have to take up a lot of space on the page. Here is an example from a professor&#8217;s personal site that I thought was interesting:</p>
<p style="text-align: center;"><img class="aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/semiconductor-quantum-mechanics.png" alt="" width="513" height="87" /></p>
<p>This is the section of the page containing the table of contents:</p>
<p style="text-align: center;"><img class="aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/index-not-toc.png" alt="" width="706" height="53" /></p>
<h2>Miscellaneous Tips</h2>
<h3>Rank higher</h3>
<p>Ranking higher in the search result pages will result in a higher clickthrough rate, but that&#8217;s out of our direct control and a little beyond the scope of this post.</p>
<h3>Character encoding</h3>
<p>Occasionally, I see a site with character encoding issues. Usually this results from the server being configured for one character encoding while the page templates and/or the underlying database are configured with a different one.</p>
<p>Aside from server configuration issues, I&#8217;ve seen this happen with sites that include data from third-party sources with varying character encodings, and when documents are copied and pasted from Word directly into webpages.</p>
<p>If character encoding issues surface on your site, they will definitely reduce clickthrough. Compare this result:</p>
<p style="text-align: center;"><img class="size-full wp-image-99344 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/good-character-encoding.png" alt="" width="509" height="81" /></p>
<p>with this one:</p>
<p style="text-align: center;"><img class="size-full wp-image-99345 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/bad-character-encoding.png" alt="" width="525" height="80" /></p>
<p>I faked this one by deliberately setting my browser to the wrong character encoding, but I have seen real issues like this on sites. Generally, I recommend using UTF-8 everywhere you can.</p>
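<p>A common version of this mismatch is UTF-8 bytes being read as Windows-1252, which turns &#8220;caf&#233;&#8221; into &#8220;caf&#195;&#169;&#8221;. This Python sketch reproduces and reverses that particular mistake; the function names are hypothetical, and because Windows-1252 leaves a few byte values undefined, some characters (&#193;, for example) raise an error instead of round-tripping.</p>

```python
def as_misread(text):
    """Encode text as UTF-8 but decode it as Windows-1252 (mojibake)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo that one mistake: back to Windows-1252 bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "caf\u00e9 don\u2019t"  # "café don’t"
garbled = as_misread(original)
print(garbled)                      # cafÃ© donâ€™t
print(repair(garbled) == original)  # True
```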
<h3>Instant Previews (Google)</h3>
<p>In November 2010, Google started showing instant previews, which pop up a preview of the web page in the search results when you hover over a result. The <a href="http://googleblog.blogspot.com/2010/11/beyond-instant-results-instant-previews.html">announcement</a> claims that people who use them are &#8220;5% more likely to be satisfied with the results that they click.&#8221; We&#8217;ll take it.</p>
<p>You can test out your instant previews in Google Webmaster Tools at Labs &gt; Instant Previews. There you can find out if Google is able to pre-render its instant previews or if it has to generate them on the fly. You can also see what your instant previews on mobile search look like.</p>
<p>If your CSS and JavaScript files are robotted out, as they are on Search Engine Land, Google will have to generate the preview on the fly, and you will see something like this in Google Webmaster Tools:</p>
<p style="text-align: center;"><img class="size-large wp-image-99534 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/SEL-instant-previews-600x402.png" alt="" width="600" height="402" /></p>
<p>Notice how the one on the right has no formatting, as if it were a text-only cached version of the page. I didn&#8217;t notice any delay when viewing Search Engine Land&#8217;s instant preview, but I would still recommend allowing Google to pre-render these instant previews.</p>
<p>For more information, check out Google&#8217;s <a href="https://sites.google.com/site/webmasterhelpforum/en/faq-instant-previews">very useful FAQ</a> on instant previews, which is on a separate Google Sites page for some reason.</p>
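<p>One quick way to check whether robots.txt is blocking your CSS and JavaScript is Python&#8217;s standard <em>urllib.robotparser</em> module. In this sketch the robots.txt rules and asset URLs are hypothetical placeholders; substitute your own. Python&#8217;s parser does simple prefix matching, so it may not interpret every rule (wildcards, for example) the way Googlebot does.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks theme and script directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/themes/
Disallow: /wp-includes/
"""


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that robots_txt forbids user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


assets = [
    "http://www.example.com/wp-content/themes/site/style.css",
    "http://www.example.com/wp-includes/js/jquery.js",
    "http://www.example.com/about/",
]
for url in blocked_assets(ROBOTS_TXT, assets):
    print("blocked:", url)
```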
<h2>Social Signals</h2>
<p>This is another area that is out of our direct control, but it shows some of the benefits that a good social media program can have on an organic campaign. Having friends and colleagues recommend links that show up in your search results can only increase clickthrough rate.</p>
<h3>Bing integration with Facebook</h3>
<p>Bing has excellent integration with Facebook: it annotates your search results with the friends who have recommended the same pages. For example, on a Bing search for [bay area college radio], I see that four of my friends recommend the <a href="http://kfjc.org/">venerable college station KFJC 89.7</a>.</p>
<p style="text-align: center;"><img class="size-full wp-image-99540 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/bing-kfjc-fb.png" alt="" width="589" height="115" /></p>
<h3>Google integration with everything but Facebook</h3>
<p>With Google, depending on how the searcher has filled out his or her profile, you can get recommended results from Google+, Twitter, Blogger, and Buzz. I have even seen results that were recommended to me because someone I am connected to via Gmail shared them.</p>
<p>A recommendation from Blogger showing up in a search for [kfjc]:</p>
<p style="text-align: center;"><img class="size-full wp-image-99544 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/google-blogger-share.png" alt="" width="536" height="119" /></p>
<p>A recommendation from Google+ showing up in a search for [google profile]:</p>
<p style="text-align: center;"><img class="size-full wp-image-99545 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/11/google-google-plus-share.png" alt="" width="536" height="149" /></p>
<h2>Conclusion</h2>
<p>I hope that this quick run-through of different techniques that can affect how your pages show up in search results (URLs, breadcrumbs, structured markup, author tagging, sitelinks, named anchors, instant previews, correcting character encoding issues, and social signals) gives you at least a few ideas for increasing your site&#8217;s clickthrough rate, which will multiply the effects of all the other optimizations you are doing on your site.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/the-clickthrough-rate-equation-in-organic-search-part-two-95309/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Clickthrough Rate Equation In Organic Search</title>
		<link>http://searchengineland.com/considering-clickthrough-rate-95277</link>
		<comments>http://searchengineland.com/considering-clickthrough-rate-95277#comments</comments>
		<pubDate>Thu, 06 Oct 2011 18:44:14 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=95277</guid>
		<description><![CDATA[When I was in middle school, my favorite book and my favorite TV show were both Cosmos by Carl Sagan. I must have read the book at least 10 times, and I watched the series every time it was on the local PBS station. One of the most interesting parts of Cosmos that has stuck [...]]]></description>
				<content:encoded><![CDATA[<p>When I was in middle school, my favorite book and my favorite TV show were both Cosmos by Carl Sagan. I must have read the book at least 10 times, and I watched the series every time it was on the local PBS station.</p>
<p>One of the most interesting parts of Cosmos that has stuck with me is the <a title="Wikipedia article" href="http://en.wikipedia.org/wiki/Drake_equation">Drake equation</a>:</p>
<p style="text-align: center;"><img class="size-full wp-image-95284 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Drake-equation.png" alt="The drake equation" width="445" height="59" /></p>
<p>The Drake equation is an attempt to estimate the current number of intelligent civilizations in the Milky Way by breaking it down into component parts (such as &#8220;f(p), the fraction of stars that have planets&#8221; and &#8220;f(l), the fraction of life-capable planets on which life actually develops&#8221;) and then multiplying them all together.</p>
<p>You can watch <a href="http://www.youtube.com/watch?v=MlikCebQSlY">Dr. Sagan explain the Drake equation on YouTube</a>. He pessimistically puts N at 10 (the early &#8217;80s were a bummer, man) but then upgrades it to &#8220;millions&#8221; less than a minute later (short-term memory is also a bummer, man).</p>
<p>As I was talking to someone at SMX East a few weeks ago, it occurred to me that conversions from organic search could be expressed in a form similar to the Drake equation:</p>
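<p>For readers who can&#8217;t see the image, the Drake equation is commonly written in the standard notation below, where the text&#8217;s f(p) and f(l) correspond to f<sub>p</sub> and f<sub>l</sub>:</p>

```latex
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```

<p>Here R<sub>*</sub> is the average rate of star formation in the galaxy, f<sub>p</sub> is the fraction of stars with planets, n<sub>e</sub> is the average number of potentially life-supporting planets per such star, f<sub>l</sub>, f<sub>i</sub>, and f<sub>c</sub> are the successive fractions of those planets that develop life, intelligence, and detectable communication, and L is how long such civilizations keep sending signals.</p>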
<p style="text-align: center;"><img class="size-full wp-image-95285 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/SEO-equation.png" alt="SEO Drake equation" width="346" height="64" /></p>
<p>In this version, C is the number of conversions, N(k) is the number of people searching for a keyword (or <a href="http://searchengineland.com/tricks-for-taming-keywords-with-regular-expressions-91584">a group of keywords</a>), f(I) is the fraction of searches where one or more links from your site show up (also called an <em>impression</em>), f(CTR) is the clickthrough rate from the search engine results, and f(conv) is the fraction of people who convert after clicking through.*</p>
<p>Then it occurred to me that a lot of attention is paid to three of these terms. Roughly speaking, N(k) is covered by keyword research, f(I) is a major goal of SEO, and f(conv) is in the realm of usability and graphic design.</p>
<p>Relative to the other three terms, clickthrough rate doesn&#8217;t get a lot of attention or optimization, even though gains in clickthrough rate multiply the effectiveness of the other factors. This is odd, considering that most of the factors influencing CTR are within our direct control and won&#8217;t affect the website&#8217;s usability at all.</p>
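<p>To make the multiplication concrete, here is a small Python sketch of the conversion equation using the article&#8217;s terms. The numbers are invented purely for illustration; the point is that because the terms multiply, a 50% relative gain in clickthrough rate yields a 50% gain in conversions when everything else stays the same.</p>

```python
def organic_conversions(searches, impression_rate, ctr, conversion_rate):
    """C = N(k) x f(I) x f(CTR) x f(conv), using the article's terms.

    searches        -- N(k): searches for a keyword or keyword group
    impression_rate -- f(I): fraction of those searches that show your site
    ctr             -- f(CTR): clickthrough rate from the results page
    conversion_rate -- f(conv): fraction of visitors who convert
    """
    for name, value in (("f(I)", impression_rate), ("f(CTR)", ctr), ("f(conv)", conversion_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1, got {value}")
    return searches * impression_rate * ctr * conversion_rate


# Invented numbers, purely for illustration.
baseline = organic_conversions(10_000, 0.5, 0.10, 0.02)
improved = organic_conversions(10_000, 0.5, 0.15, 0.02)  # only CTR changes
print(baseline, improved, improved / baseline)
```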
<p>So if we consider CTR as a highly leveraged but undervalued factor in converting users from organic search and one that is largely within our control, it is probably worth a column or two to take a high-level look at the various ways we can influence it.</p>
<p>The rest of this article covers only the title and snippet in search results and how they affect clickthrough rate. Next month in this column, I&#8217;ll cover many more factors.</p>
<p>The <a href="http://searchengineland.com/the-anatomy-of-a-google-search-result-12792">basic components of a search result</a> are covered in an article by Vanessa Fox, so check that out if you need a refresher or if any of the terminology is unclear to you.</p>
<h2>Title &amp; Meta Description</h2>
<p style="text-align: center;"><img class="size-full wp-image-95296 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/Drake-SERP.png" alt="" width="504" height="82" /></p>
<p>The most visible and largest components of the typical search engine result are the title and snippet. The title is generally taken from the HTML title tag of the page. The snippet can be taken from several sources, but ideally it comes from a well-written meta description tag.</p>
<p>Note that neither the title tag nor the meta description is very visible when viewing the page in a browser (especially with the number of tabs I usually have open). This gives you a lot of latitude in influencing how the page is displayed in search results, but it also gives you enough rope to hang yourself if you aren&#8217;t careful.</p>
<h2>Search Engines Overriding Titles &amp; Meta Descriptions</h2>
<p>In the example above, the snippet is pretty good. It&#8217;s descriptive, and it&#8217;s the result I ultimately clicked on to refresh my memory on the topic.</p>
<p>However, when I checked the page&#8217;s source code to see whether the snippet was pulled from the meta description, this is what I found:</p>
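<p>When auditing a page, it helps to pull out exactly what is in these two tags. Here is a minimal sketch using Python&#8217;s built-in <em>html.parser</em> module; the class and function names are my own, and a real crawler would also need to handle fetching, character encodings, and badly malformed markup.</p>

```python
from html.parser import HTMLParser


class TitleAndDescriptionParser(HTMLParser):
    """Collects the title text and the content of the meta description tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_and_description(html_text):
    """Return (title, meta description or None) for a page's HTML."""
    parser = TitleAndDescriptionParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.description
```

<p>Running this over every page of a site makes it easy to spot empty, boilerplate, or missing descriptions before the search engines decide to replace them.</p>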
<p style="text-align: center;"><img class="size-full wp-image-95297 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/bad-meta-description.png" alt="Non-ideal meta description" width="574" height="41" /></p>
<p>So while the title comes directly from the page, the snippet clearly does not come from the meta description, which is just boilerplate text left in the page template. Because this text probably appears on many pages of the site, is clearly unrelated to the content of this page, and is too short, Google generated the snippet for this result from text on the page instead.</p>
<p>Usually these generated snippets aren&#8217;t that good, which is why it is important to pay attention to the meta description of each page. Here are some of the other results for the same query, none of which gives me a good sense of the page:</p>
<p style="text-align: center;"><img class="size-full wp-image-95298 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/bad-snippets-generated-by-google.png" alt="Bad snippets generated by Google" width="527" height="259" /></p>
<p>Look at it this way: If you wouldn&#8217;t let a computer write your AdWords ads, then you shouldn&#8217;t allow a computer to write snippets for your site.</p>
<p>From the sites I&#8217;ve evaluated for clients, duplication is the main reason titles and meta descriptions are ignored by Google or Bing, so it&#8217;s important to make them unique for each page.</p>
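<p>Finding these duplicates is easy to automate once you have crawled your titles and descriptions. A sketch in Python, assuming a hypothetical <em>pages</em> dictionary that maps each URL to its title and meta description (from your own crawler or a CMS export):</p>

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Group URLs that share the same value for field ("title" or "description").

    pages maps URL -> {"title": ..., "description": ...}. Values are compared
    case-insensitively after trimming whitespace; empty values are skipped.
    """
    groups = defaultdict(list)
    for url, tags in pages.items():
        value = (tags.get(field) or "").strip().lower()
        if value:
            groups[value].append(url)
    # Keep only values used by more than one URL.
    return {value: sorted(urls) for value, urls in groups.items() if len(urls) > 1}


pages = {
    "/drake-equation": {"title": "The Drake Equation", "description": "Welcome to our site"},
    "/fermi-paradox": {"title": "The Fermi Paradox", "description": "Welcome to our site"},
    "/seti": {"title": "SETI", "description": "How radio searches for signals work"},
}
print(find_duplicates(pages, "description"))
print(find_duplicates(pages, "title"))
```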
<p>In the SMX East session about rich snippets, Jack Menzel from Google listed some additional reasons that Google might overwrite the title in a search result:</p>
<ul>
<li>The title is &#8220;unclear based on the query.&#8221; (I&#8217;m taking this to mean that important keywords are missing in the title.)</li>
<li>If the title is missing the company or site name, Google may tack it on the end.</li>
<li>If the title is &#8220;overoptimized&#8221; with keywords, Google may remove a few of them.</li>
</ul>
<p>Jack was careful to point out that Google will only modify the title when it believes doing so benefits users, but again, I think it&#8217;s important to retain as much control as possible over the way your pages are displayed in search results.</p>
<p>Another issue with duplication is the special case in which several results have an identical title and snippet. When this happens, Google shows only one of them, suppresses the rest, and displays this message at the bottom of the search results:</p>
<p><img class="size-full wp-image-95299 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/dedupe-message.png" alt="Google deduplication message" width="510" height="68" /></p>
<p>This is a depressing message because it means that pages from your site ranked for the query but won&#8217;t be shown, since Google couldn&#8217;t differentiate them from the page that is shown. (This message can also indicate that your site has pagination issues, which should be <a href="http://searchengineland.com/google-provides-new-options-for-paginated-content-92906">dealt with accordingly</a>.)</p>
<h2>Placement Of Keywords Within Titles</h2>
<p>When people are reading through the search results and deciding which one to click on, they are acting more like a monkey scanning a tree for fruit than someone sitting down with a glass of wine and a copy of Ulysses to ponder the classics.</p>
<p>This means people are scanning for the keywords that are already in their working memory (the current search), or &#8212; according to some theories &#8212; scanning for the general shapes of these keywords.</p>
<p>If you combine this observation with eye-tracking studies that show how people&#8217;s eyes move around a typical search engine results page, like <a href="http://googleblog.blogspot.com/2009/02/eye-tracking-studies-more-than-meets.html">this one</a>, <a href="http://www.usercentric.com/news/2011/01/26/eye-tracking-bing-vs-google-second-look">this one</a>, and <a href="http://ask.enquiro.com/2010/eye-tracking-google-instant/">this one</a>, then it logically follows that important keywords should go at the beginning of the title, where the monkey-scanners are more likely to see them.</p>
<p>(I have heard arguments against putting keywords on the left, but I&#8217;ll leave this discussion for people who are more interested in human psychology than I am.)</p>
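<p>If you want to check this across many pages, one simple heuristic is to measure how far into each title the target keyword starts. In this Python sketch the 20-character cutoff is an arbitrary assumption for illustration, not a figure from any of the eye-tracking studies:</p>

```python
def keyword_offset(title, keyword):
    """Character position where keyword starts in title (case-insensitive), or -1 if absent."""
    return title.lower().find(keyword.lower())


def is_front_loaded(title, keyword, max_offset=20):
    """True if keyword starts at or before position max_offset in the title.

    max_offset=20 is a hypothetical cutoff; tune it for your own titles.
    """
    offset = keyword_offset(title, keyword)
    return 0 <= offset <= max_offset


titles = [
    "Drake Equation - Wikipedia",
    "Home | Example Science Site | The Drake Equation",
]
for t in titles:
    print(t, "->", is_front_loaded(t, "drake equation"))
```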
<h2>Thoughts On Scale For Larger Sites</h2>
<p>For sites with hundreds of thousands of pages, obviously it&#8217;s not possible to write unique and meaningful titles and meta descriptions by hand.</p>
<p>It&#8217;s okay to generate these automatically from the metadata of the item(s) each page is about, as long as the result is unique and strongly encourages clickthrough.</p>
<p>Here is an example I came across recently:</p>
<p style="text-align: center;"><img class="size-full wp-image-95306 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/10/good-scalable-snippet.png" alt="" width="526" height="81" /></p>
<p>If I were looking for a home in Willow Glen, it would be hard for me <em>not</em> to click on this result. It&#8217;s clearly generated automatically from an application database but in a way that&#8217;s unique and designed to encourage clickthrough.</p>
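<p>Here is a minimal sketch of that kind of template in Python. The listing fields (address, neighborhood, beds, and so on) and the wording are hypothetical; the idea is simply that every page gets a title and description built from its own data, so no two are alike.</p>

```python
def listing_title(listing):
    """Build a unique, descriptive title from a (hypothetical) listing record."""
    return (f"{listing['address']}, {listing['neighborhood']}, {listing['city']}: "
            f"{listing['beds']} Bed, {listing['baths']} Bath Home for Sale")


def listing_description(listing):
    """Build a meta description from the same record; ':,' adds thousands separators."""
    return (f"{listing['beds']} bed, {listing['baths']} bath, {listing['sqft']:,} sq ft home "
            f"in {listing['neighborhood']}, listed at ${listing['price']:,}. "
            "View photos and details.")


listing = {
    "address": "123 Example Ave", "neighborhood": "Willow Glen", "city": "San Jose",
    "beds": 3, "baths": 2, "sqft": 1650, "price": 899000,
}
print(listing_title(listing))
print(listing_description(listing))
```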
<p>In a future article, I&#8217;ll cover other factors that can affect clickthrough rate, like URLs, breadcrumbs, structured metadata, anchors, social signals, character encoding, the phase of the moon, etc.</p>
<p>*After writing this post, I realized that this is similar to the <a href="http://books.google.com/books?id=WeUadFeW498C&amp;lpg=PA98&amp;dq=%22figure%204.4%20searcher%20persona%20workflow%22%20vanessa%20fox&amp;pg=PA98#v=onepage&amp;q&amp;f=false">searcher persona workflow</a> as described by Vanessa Fox in her book <em>Marketing In The Age Of Google</em>, so check out that excellent book to explore this concept further.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/considering-clickthrough-rate-95277/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tricks For Taming Keywords With Regular Expressions</title>
		<link>http://searchengineland.com/tricks-for-taming-keywords-with-regular-expressions-91584</link>
		<comments>http://searchengineland.com/tricks-for-taming-keywords-with-regular-expressions-91584#comments</comments>
		<pubDate>Thu, 08 Sep 2011 13:30:07 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=91584</guid>
		<description><![CDATA[So far my articles about technical SEO have focused on how to adjust a site&#8217;s configuration or architecture to make it more crawlable and indexable. In this post, I&#8217;m writing about the other end of the technical SEO process: using analytics data to analyze traffic and user behavior by keywords. When looking at keyword data, [...]]]></description>
				<content:encoded><![CDATA[<p>So far my articles about technical SEO have focused on how to adjust a site&#8217;s configuration or architecture to make it more crawlable and indexable. In this post, I&#8217;m writing about the other end of the technical SEO process: using analytics data to analyze traffic and user behavior by keywords.</p>
<p>When looking at keyword data, it&#8217;s important to group the keywords by type. Looking at individual keywords is not only inefficient, but it generally leads to information that is either misleading or, worse, can&#8217;t be acted on.</p>
<p>The most precise way to group keywords is with regular expressions. Regular expressions are strings of letters, numbers, and special characters that describe a pattern matching a specific word or group of words.</p>
<p style="text-align: center;"><img class="size-full wp-image-91588 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/09/Screen-Shot-2011-09-03-at-1.19.32-AM.png" alt="Python window with regular expression examples" width="507" height="230" /></p>
<p>Excellent tutorials for regular expressions are all over the Web, so I&#8217;m not going to include an overview here. Instead, I&#8217;ll present a few common recipes that I hope people will find useful and instructive. (Besides, it has been scientifically proven that <a href="http://www.nytimes.com/2005/12/13/science/13essa.html">people learn mainly by imitation</a>.)</p>
<p>If you&#8217;d like to see some tutorials, <a href="http://www.regular-expressions.info/quickstart.html">this</a> is an excellent one, and the Google Analytics help page for regular expressions is <a href="http://www.google.com/support/analytics/bin/answer.py?answer=55582">here</a>. SEOMoz recently posted a good overview <a href="http://www.seomoz.org/blog/an-seos-guide-to-regex">here</a>.</p>
<h2>Using Regular Expressions Within Google Analytics</h2>
<p>I&#8217;m going to focus on search keywords in Google Analytics because it has the best support for regular expressions. Other analytics packages I have worked with support most of these concepts, if not exactly the same syntax. Excel&#8217;s out-of-the-box support for matching keywords is <a href="http://support.microsoft.com/kb/214138">pretty thin</a>, but it appears to be possible to <a href="http://support.microsoft.com/kb/818802/en-us">configure it to use regular expressions</a>.</p>
<p>I didn&#8217;t want to show any data from my clients, so I asked my friends at Google to give me access to Search Engine Land&#8217;s Google Analytics account.* I&#8217;ll be using searchengineland.com data in my examples below.</p>
<p>To get to the organic keywords in the new interface, search for &#8220;organic&#8221; in the Find A Report&#8230; box:</p>
<p style="text-align: center;"><img class="size-full wp-image-91724 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/09/find-organic-box.png" alt="" width="308" height="179" /></p>
<p>Or, browse to Traffic Sources &gt; Sources &gt; Search &gt; Organic:</p>
<p style="text-align: center;"><img class="size-full wp-image-91725 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/09/browse-organic-report.png" alt="" width="199" height="556" /></p>
<h2>Branded Keywords</h2>
<p>The most important regular expression to nail down is the pattern for branded keywords. User behavior for queries involving brand terms is going to be quite different from behavior for other queries. Branded search traffic tends to have a lower bounce rate, fewer new users, and a longer time on site.</p>
<p>So metrics for a group of keywords will be much more meaningful if you can exclude (or only include) queries containing branded terms.</p>
<p>To create the branded terms regular expression, I like to bring up the organic keyword report and try out a bunch of regular expressions, iterating slightly with each try.</p>
<p>The new Google Analytics interface doesn&#8217;t accept regular expressions by default, so it&#8217;s necessary to click on the &#8220;advanced&#8221; link next to the search box and select &#8220;Matching RegExp&#8221; from the drop down:</p>
<p style="text-align: center;"><img class="size-large wp-image-91728 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/09/search-regular-expression-600x195.png" alt="" width="600" height="195" /></p>
<p>Now we are ready to start testing keywords, starting with <em>search engine land</em>.</p>
<p>This matches a lot of queries, but when I exclude that pattern (by selecting &#8220;Exclude&#8221; from the drop-down to the left of Keyword), I see that I have missed a lot of other branded keywords.</p>
<p>The next iteration is:</p>
<p style="text-align: center;"><em>search ?engine ?land</em></p>
<p>The <strong><em>?</em></strong> means &#8220;0 or 1 of the previous character.&#8221; Now, the pattern matches whether or not spaces are included. This change nets an additional 15k visits for the time period that I selected.</p>
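<p>You can try these iterations outside Google Analytics, too. Here is the same pattern in Python&#8217;s <em>re</em> module with a few made-up sample queries. I use <em>re.search</em>, which, like the filter above, matches anywhere in the query unless the pattern is anchored with ^ or $, and I pass <em>re.IGNORECASE</em> because Python is case-sensitive by default.</p>

```python
import re

# "?" makes the preceding space optional.
BRAND = re.compile(r"search ?engine ?land", re.IGNORECASE)

queries = ["search engine land", "searchengineland", "search engineland", "search engines"]
print([q for q in queries if BRAND.search(q)])
```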
<p>I notice that many people are spelling search &#8220;serach,&#8221; so the next iteration is:</p>
<p style="text-align: center;"><em>se(ar|ra)ch ?engine ?land</em></p>
<p>The parentheses/bar combination will match either option. This matches 118 more visits.</p>
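<p>A quick check of the alternation in Python (sample queries invented for illustration):</p>

```python
import re

# (ar|ra) accepts either "ar" or the transposed "ra".
BRAND = re.compile(r"se(ar|ra)ch ?engine ?land", re.IGNORECASE)

queries = ["search engine land", "serach engine land", "serachengineland", "seach engine land"]
print([q for q in queries if BRAND.search(q)])
# "seach" (a dropped letter) is still missed; each misspelling needs its own alternative.
```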
<p>Unfortunately, my pattern is matching the website address searchengineland.com, which I want to exclude because that traffic is basically direct traffic.</p>
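<p>First, I try to exclude a period at the end of the pattern with <em>search ?engine ?land[^.]</em>, but this is no good because it excludes 99% of the visits that I wanted to include: the pattern now requires some character after &#8220;land,&#8221; so the plain query [search engine land] no longer matches.</p>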
<p>(Square brackets will match any of the characters listed, but if the first character is ^ then it will match anything but those characters.)</p>
<p>What I am trying to do is match either &#8220;any character that isn&#8217;t a period&#8221; or &#8220;the end of the query.&#8221; I can express this with <em>search ?engine ?land([^.]|$)</em>.</p>
<ul>
<li>$ is a special character meaning &#8220;the end of the string.&#8221;</li>
</ul>
<p>This matches fewer visits, but I am now able to exclude queries for the website URL.</p>
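<p>The difference between the two attempts is easy to see side by side in Python (sample queries invented for illustration):</p>

```python
import re

# Needs at least one character after "land", so the bare brand query fails.
TOO_STRICT = re.compile(r"search ?engine ?land[^.]", re.IGNORECASE)
# A non-period character OR the end of the query must follow "land".
NOT_URL = re.compile(r"search ?engine ?land([^.]|$)", re.IGNORECASE)

queries = ["search engine land", "search engine land seo", "searchengineland.com", "www.searchengineland.com"]
print([q for q in queries if TOO_STRICT.search(q)])
print([q for q in queries if NOT_URL.search(q)])
```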
<p>When excluding branded queries in combination with other regular expressions, <em>se(ar|ra)ch ?engine ?land </em>is probably a better choice.</p>
<p>Now it is possible to compare the behavior of users who come to Search Engine Land from a branded versus an unbranded query. What I see is pretty typical for the sites that I work with.</p>
<p>Compared with visits from unbranded queries, visits from branded queries:</p>
<ul>
<li>Are three times more likely to be new visitors</li>
<li>Spend five times as much time on site</li>
<li>Have one-half the bounce rate</li>
<li>View about twice as many pages per visit</li>
</ul>
<p>In a pinch for tools with less sophisticated search, such as the Google Webmaster Tools query report or Excel, I would just use <em>land</em> to get a rough approximation.</p>
<p>Next, I&#8217;m curious about queries for search engines. This is easy to do with something like <em>google|yahoo|bing</em>. If people are likely to misspell a name, you don&#8217;t always need to spell out the entire word in the pattern.</p>
<p>For example, Baidu is searched for via three spellings (which I got by searching for <em>^b.*d[ou]$</em>):</p>
<p style="text-align: center;"><em>baidu, bai du, bidu</em></p>
<p>I can easily match any of those with <em>ba?i ?du</em>. So, I update my regex to:</p>
<p style="text-align: center;"> <em>google|yahoo|bing|ba?i ?du</em></p>
<p>Oops! I forgot Blekko!</p>
<p style="padding-left: 30px; text-align: center;"><em>google|yahoo|bing|ba?i ?du|blek</em></p>
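<p>One caution with short fragments like <em>bing</em>: they also match inside unrelated words. This Python sketch (sample queries invented) shows the Baidu spellings matching, along with a false positive, and how adding the <em>\b</em> word-boundary marker described next to the front of the group filters it out:</p>

```python
import re

ENGINES = re.compile(r"google|yahoo|bing|ba?i ?du|blek", re.IGNORECASE)
BOUNDED = re.compile(r"\b(google|yahoo|bing|ba?i ?du|blek)", re.IGNORECASE)

queries = ["baidu", "bai du", "bidu", "blekko", "plumbing tips", "duckduckgo"]
print([q for q in queries if ENGINES.search(q)])  # "plumbing" sneaks in via "bing"
print([q for q in queries if BOUNDED.search(q)])  # \b requires the match to start a word
```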
<p>Another useful group of searches is for stock symbols. But the problem with <em>goog</em> is that it will match both &#8220;Google&#8221; and &#8220;GOOG.&#8221;</p>
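<p>Here, it is necessary to use the very handy but somewhat obscure <em>\b</em>, which matches the boundary between a word character and a non-word character (such as a space, a colon, or the start or end of the query) without consuming any characters; more simply, it means &#8220;word break.&#8221;</p>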
<p>So, I could use <em>\b(goog|yhoo|msft|bidu)\b </em>to match a group of stock symbols.</p>
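<p>A quick Python check of the stock-symbol pattern (sample queries invented for illustration):</p>

```python
import re

STOCKS = re.compile(r"\b(goog|yhoo|msft|bidu)\b", re.IGNORECASE)

queries = ["goog stock", "google stock", "nasdaq:goog", "msft earnings"]
# "google stock" is rejected: "l" follows "goog", so there is no word break there.
print([q for q in queries if STOCKS.search(q)])
```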
<p>I would also track metrics for social networking-related queries with a regular expression like <em>google ?(\+|plus)|face ?book|twitter|social net</em> and exclude branded queries from the search.</p>
<ul>
<li>Note that <em><strong>+</strong></em> is a special character, so I had to escape it with a <em><strong>\</strong></em>.</li>
</ul>
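<p>Here is a Python sketch of that combination: match the social pattern and drop anything that also matches the branded pattern. The sample queries are invented, and the last few lines show why the escape matters: without it, Python&#8217;s <em>re</em> module refuses to compile the pattern because the &#8220;+&#8221; has nothing to repeat.</p>

```python
import re

SOCIAL = re.compile(r"google ?(\+|plus)|face ?book|twitter|social net", re.IGNORECASE)
BRAND = re.compile(r"se(ar|ra)ch ?engine ?land", re.IGNORECASE)


def social_unbranded(queries):
    """Queries that match the social pattern but not the branded one."""
    return [q for q in queries if SOCIAL.search(q) and not BRAND.search(q)]


print(social_unbranded(["google+ tips", "facebook ads", "search engine land twitter"]))

# Without the backslash, "(+" gives "+" nothing to repeat.
try:
    re.compile(r"google ?(+|plus)")
    UNESCAPED_COMPILES = True
except re.error:
    UNESCAPED_COMPILES = False
```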
<p>Of course, I would track <em>\bnemet\b</em>, which resulted in 25 visits this year, half of which bounced.</p>
<h2>Other Useful Patterns</h2>
<p>These are a few regular expression patterns that I use for every site or certain types of sites.</p>
<p><strong>Long unbranded tail</strong></p>
<p>The &#8220;long unbranded tail,&#8221; which I define as queries containing three or more terms, excluding branded terms, is always important to track. I have seen sites for which this accounts for over half of organic traffic.</p>
<p>There are several ways to write this regular expression, but <em>.+\b.+\b.+\b.+</em> is the way I do it.</p>
<ul>
<li><em><strong>.+</strong></em> means &#8220;one or more of any character&#8221; and <em>\b</em> means &#8220;word break.&#8221;</li>
</ul>
<p>The entire expression could be interpreted as &#8220;at least three word breaks inside the query string.&#8221;</p>
<p>Because the query [search engine land] makes up most of the three-word queries, excluding the branded pattern is important:</p>
<p style="text-align: center;"><img class="size-full wp-image-91745 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/09/long-tail-unbranded.png" alt="" width="622" height="92" /></p>
<p>Unbranded queries with three or more terms make up almost 70% of the organic traffic to Search Engine Land. Search features like Google Instant and autocomplete have definitely increased the average number of words per query.</p>
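<p>The same long-tail test in Python, combined with the branded exclusion (sample queries invented for illustration). One quirk worth knowing: <em>\b</em> also fires around hyphens and other punctuation, so &#8220;e-mail marketing&#8221; counts as three terms. Counting whitespace-separated words, as in the alternative function, avoids that if it matters for your data.</p>

```python
import re

LONG_TAIL = re.compile(r".+\b.+\b.+\b.+")  # at least three word breaks inside the query
BRAND = re.compile(r"se(ar|ra)ch ?engine ?land", re.IGNORECASE)


def is_long_unbranded(query):
    return bool(LONG_TAIL.search(query)) and not BRAND.search(query)


def is_long_unbranded_by_words(query):
    """Alternative: count whitespace-separated words instead of word breaks."""
    return len(query.split()) >= 3 and not BRAND.search(query)


for q in ["regular expressions for seo", "seo tips", "search engine land", "e-mail marketing"]:
    print(q, is_long_unbranded(q), is_long_unbranded_by_words(q))
```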
<h2>Queries From Google Finance</h2>
<p>The Google Finance page for a particular stock, like Yahoo, has a URL like this: <em>http://www.google.com/finance?client=ob&amp;q=NASDAQ:YHOO</em>.</p>
<p>Traffic from Google.com with &#8220;<em>q=</em>&#8221; in the URL will get treated as query traffic by Google Analytics.</p>
<p>A search using the regex <em>(nasdaq|nyse|amex):[a-z]{1,4}</em> will match these queries. <em>[a-z]</em> means &#8220;any character from a to z&#8221; and <em>{1,4}</em> means &#8220;repeated one, two, three, or four times.&#8221;</p>
<p>This doesn&#8217;t include the traffic from Google Finance for arbitrary queries, of course. And depending on what types of stocks your site covers, you may need to include more indexes like <em>ftse</em>.</p>
<p>To get a more accurate sense of traffic from Google Finance, be sure to include the referring traffic from www.google.com/finance/&#8230;</p>
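<p>If you are working with raw referrer URLs rather than the keyword report, a Python sketch like this can pull out the <em>q=</em> value and test it against the exchange pattern. I pass <em>re.IGNORECASE</em> because the symbols in these URLs are uppercase and Python matching is case-sensitive by default; the function names are my own.</p>

```python
import re
from urllib.parse import urlparse, parse_qs

# Exchange prefix, a colon, then a one- to four-letter symbol.
FINANCE = re.compile(r"(nasdaq|nyse|amex):[a-z]{1,4}", re.IGNORECASE)


def query_param(url):
    """Return the value of the q= parameter in url, or None if there isn't one."""
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None


def is_finance_query(query):
    return bool(FINANCE.search(query))


print(is_finance_query("NASDAQ:YHOO"))
print(is_finance_query("yahoo stock price"))
```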
<h2>Addresses</h2>
<p>Sometimes it isn&#8217;t possible to list out all of the possible query keywords. In that case, the best you can do is write a regular expression that captures enough of the queries to get meaningful data for trending, even if the absolute numbers aren&#8217;t so reliable.</p>
<p>For example, it&#8217;s not possible to list every possible street address. But limiting the regex to typical elements in a street address does a surprisingly good job.</p>
<p>I generally use <em>\b(road|rd|drive|dr|lane|way|ave|avenue|st|street)\b</em>, which probably matches about 80% of the queries for a specific address.</p>
<p>Excluding branded terms, or another regex like the following, would further improve accuracy:</p>
<p style="text-align: center;"> <em>sale|estate|pending</em></p>
<p>Another thing to try is putting a number in front of it like this:</p>
<p style="text-align: center;"><em>[0-9].*\b(road|rd|drive|dr|lane|way|ave|avenue|st|street)\b</em></p>
<ul>
<li>The <em><strong>.*</strong></em> means &#8220;match any number (including zero) of any character,&#8221; so there could be any number or type of characters between the number and the rest of the regex.</li>
</ul>
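<p>The address patterns above, sketched in Python with invented sample queries. Note the abbreviation is written <em>rd</em>: in a regular expression, <em>\r</em> is a carriage return, so <em>\rd</em> would never match &#8220;rd&#8221;.</p>

```python
import re

STREET = r"(road|rd|drive|dr|lane|way|ave|avenue|st|street)"
ADDRESS = re.compile(rf"\b{STREET}\b", re.IGNORECASE)
NUMBERED_ADDRESS = re.compile(rf"[0-9].*\b{STREET}\b", re.IGNORECASE)
NOT_ADDRESS = re.compile(r"sale|estate|pending", re.IGNORECASE)


def looks_like_address(query):
    return bool(ADDRESS.search(query)) and not NOT_ADDRESS.search(query)


queries = ["123 main st", "main street cafe", "homes for sale willow st", "400 park avenue", "dr pepper"]
print([q for q in queries if looks_like_address(q)])       # "dr pepper" is a false positive
print([q for q in queries if NUMBERED_ADDRESS.search(q)])  # requires a digit first

# "\rd" looks for a carriage return followed by "d".
print(re.search(r"\b\rd\b", "12 oak rd"), re.search(r"\brd\b", "12 oak rd") is not None)
```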
<p>The need to match queries containing a state abbreviation is pretty common. This regex assumes that only the two-letter abbreviations are being used and that they appear at the end of the query:</p>
<p style="text-align: center;"><em>\b(a[klrz]|c[aot]|d[ce]|fl|ga|hi|i[adln]|k[sy]|la|m[adeinost]|n[ehjmv]|n[cdy]|o[hkr]|pa|ri|s[cd]|t[nx]|ut|v[at]|w[aivy])$</em></p>
<p>It gets a few false positive matches (like &#8220;LA&#8221; meaning Los Angeles versus Louisiana or &#8220;CT&#8221; meaning court instead of Connecticut), but it brings back enough meaningful data for tracking metrics on these types of queries.</p>
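<p>A Python sketch of the state pattern, with a sanity check that the alternation accepts exactly 51 two-letter codes (the 50 states plus DC). The sample queries are invented for illustration.</p>

```python
import re
import string

# Two-letter codes for the 50 states plus DC, copied from the pattern above.
STATE_CODES = (r"a[klrz]|c[aot]|d[ce]|fl|ga|hi|i[adln]|k[sy]|la|m[adeinost]"
               r"|n[ehjmv]|n[cdy]|o[hkr]|pa|ri|s[cd]|t[nx]|ut|v[at]|w[aivy]")

STATE_AT_END = re.compile(rf"\b({STATE_CODES})$", re.IGNORECASE)
STATE_CODE = re.compile(rf"(?:{STATE_CODES})", re.IGNORECASE)

# Sanity check: count every two-letter string the alternation accepts.
codes = [a + b for a in string.ascii_lowercase for b in string.ascii_lowercase
         if STATE_CODE.fullmatch(a + b)]
print(len(codes))

for q in ["homes for sale san jose ca", "san jose ca homes", "hotels in la"]:
    print(q, "->", bool(STATE_AT_END.search(q)))  # "la" could be Los Angeles
```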
<h2>Other Resources</h2>
<p>For testing or debugging regular expressions I generally use this <a href="http://www.apple.com/downloads/dashboard/developer/regexwidget.html">handy dashboard widget</a> (for Mac) or the Python interactive shell. There are many <a href="http://www.bing.com/search?q=regular+expression+tester">regular expression testers on-line</a> and even <a href="https://chrome.google.com/webstore/search?q=regular+expressions">Chrome extensions</a> and <a href="https://addons.mozilla.org/en-US/firefox/search/?q=regular+expressions&amp;cat=all">Firefox add-ons</a>.</p>
<p>I hope this post gave you some ideas for grouping and tracking keywords. If you have interesting regular expressions that you commonly use and want to share, please feel free to include them in the comments below.</p>
<p>* This is obviously a joke. My friends would want money before giving me access to someone&#8217;s Google Analytics account. ;)</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/tricks-for-taming-keywords-with-regular-expressions-91584/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>How To Improve Crawl Efficiency With Cache Control Headers</title>
		<link>http://searchengineland.com/how-to-improve-crawl-efficiency-with-cache-control-headers-88824</link>
		<comments>http://searchengineland.com/how-to-improve-crawl-efficiency-with-cache-control-headers-88824#comments</comments>
		<pubDate>Thu, 11 Aug 2011 15:45:46 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[How To: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=88824</guid>
		<description><![CDATA[Way back at the end of the last century, I worked for a company called Inktomi. Most people remember Inktomi as a search engine, but it had several other divisions. One of these divisions (the one I worked for) sold networking software, including a proxy-cache called Traffic Server. It seems weird now, but Inktomi made [...]]]></description>
				<content:encoded><![CDATA[<p>Way back at the end of the last century, I worked for a company called Inktomi. Most people remember Inktomi as a search engine, but it had several other divisions. One of these divisions (the one I worked for) sold networking software, including a proxy-cache called Traffic Server.</p>
<p>It seems weird now, but Inktomi made more money from Traffic Server than it did from the search engine. Such were the economics of the pre-Google Internet. It was a great business until 1) bandwidth got really, really cheap and 2) almost all of the customers went out of business in late 2000/early 2001. (Most of Inktomi was acquired by Yahoo! in 2002, and <a href="http://trafficserver.apache.org/">Traffic Server</a> was released as an open source project in 2009.)</p>
<p>Because of my work with proxy caches, I&#8217;m always surprised when I do a technical review of a site and find that it has been configured not to be cached. When optimizing a website for crawling, it&#8217;s helpful to think of a search engine crawler as a web proxy cache that is trying to prefetch the website.</p>
<p>One quick note: When I talk about a &#8220;cached&#8221; page, I&#8217;m not referring to the &#8220;Cached&#8221; link in Google or Bing. I&#8217;m referring to a temporarily stored version of a page in a search engine, proxy-cache, or web browser.</p>
<p>As an example of a typical cache-unfriendly website, here are the HTTP response headers from my site, which is running my ISP&#8217;s default Apache install and WordPress more or less out of the box:</p>
<p style="text-align: center;"><a href="http://searchengineland.com/figz/wp-content/seloads/2011/08/toddnemet-com-headers1-600x261.png"><img class="size-large wp-image-88829 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/toddnemet-com-headers1-600x261.png" alt="HTTP response headers for toddnemet.com" width="600" height="261" /></a></p>
<p>The three lines circled in red are HTTP-ese for &#8220;Don&#8217;t cache this ever, under any circumstances.&#8221;</p>
<p>A little more detail about these headers:</p>
<ol>
<li><strong>Expires:</strong> indicates until when a proxy-cache or browser can consider a document &#8220;fresh&#8221; and reuse it without going back to the server. By setting this to a date decades in the past, the server is indicating that the document should never be considered fresh.</li>
<li><strong>Cache-control:</strong> is used to explicitly tell proxy-caches or browsers about the cacheability of the document. &#8220;no-store&#8221; tells them not to store the document at all, and &#8220;no-cache&#8221; tells them not to reuse a stored copy without checking with the server. &#8220;must-revalidate&#8221; means that the cache should never serve a stale copy of the document without checking with the server first. &#8220;post-check&#8221; and &#8220;pre-check&#8221; are IE-specific settings that tell IE to always retrieve the document from the server.</li>
<li><strong>Pragma:</strong> &#8220;no-cache&#8221; is only defined as an HTTP <em>request</em> directive, so its meaning in a response like this one is not specified.</li>
</ol>
<h2>Cache Control Headers &amp; Technical SEO</h2>
<p>So what do cache control headers have to do with technical SEO? They matter in two ways:</p>
<ol>
<li>They help search engines crawl sites more efficiently (because they don&#8217;t have to download the same content over and over unnecessarily).</li>
<li>They increase the page speed and improve user experience for most visitors to your site. They can even potentially improve the experience for first-time visitors.</li>
</ol>
<p>In other words, by adding a few lines to your Web server configuration to support caching, it&#8217;s possible to have more of your site crawled by search engines while also speeding up your site for users.</p>
<p>Let&#8217;s look at crawl efficiency first.</p>
<h2>Crawl Efficiency</h2>
<p>Only two pairs of cache control headers matter for search engine crawling: Last-Modified/If-Modified-Since and ETag/If-None-Match. Requests that use them are called &#8220;conditional GETs&#8221; because the response to a GET will be different depending on whether the page has changed or not.</p>
<p><a href="http://searchengineland.com/">Searchengineland.com</a> happens to support both methods, so I will be using it in the examples below.</p>
<h2>Last-Modified/If-Modified-Since</h2>
<p>This is the most common and widely supported conditional GET. It is supported by both Google&#8217;s and Bing&#8217;s crawlers (and all browsers and proxy caches that I&#8217;m aware of).</p>
<p>It works like this. The first time a document is requested, the server returns a Last-Modified: HTTP header indicating the date that the document was last modified.</p>
<p style="text-align: center;"><img class="size-full wp-image-88831 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/searchengineland-com-headers-last-modified.png" alt="HTTP response headers for searchengineland.com showing the Last-Modified header date." width="431" height="312" /></p>
<p>The next time the document is requested, Googlebot or Bingbot will add an If-Modified-Since: header to the request that contains the Last-Modified date that it received. (In the examples below, I&#8217;m using curl and the -H option to send these HTTP headers.)</p>
<p>If the document hasn&#8217;t been modified since the If-Modified-Since date, then the server will return a 304 Not Modified response code and no document. The client, whether it is Googlebot, Bingbot, or a browser, will use the version that it retrieved previously.</p>
<p style="text-align: center;"><img class="size-large wp-image-88833 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/sel-ims-304-600x170.png" alt="HTTP response headers for an If-Modified-Since request returning a 304" width="600" height="170" /></p>
<p>If the document has been modified since the If-Modified-Since date, then the server returns a 200 OK response along with the document as if it were responding to a request without an If-Modified-Since header.</p>
<p style="text-align: center;"><img class="size-large wp-image-88832 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/sel-ims-200-600x242.png" alt="HTTP response headers for searchengineland.com showing a 200 response" width="600" height="242" /></p>
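<p>To make the exchange concrete, here is a minimal server-side sketch in Python of the decision described above. The function names are hypothetical and not part of any web server or framework; the sketch only shows the comparison a server makes between the document&#8217;s modification time and the If-Modified-Since date that the crawler sends back.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def if_modified_since_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200.

    last_modified: when the document last changed (timezone-aware).
    if_modified_since: the raw If-Modified-Since request header, or None.
    """
    if if_modified_since is None:
        return 200  # first request: no date to compare, send the full document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the document
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a zone-less date as GMT
    # HTTP dates only have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) > since:
        return 200  # modified since the client's copy: send it again
    return 304  # not modified: headers only, the client reuses its copy
```

<p>The server would also send the Last-Modified header on every 200 response, formatted with something like http_date(), so that the crawler has a date to send back on its next visit.</p>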
<h2>ETag/If-None-Match</h2>
<p>If-None-Match requests work in a similar way. The first time a document is requested, an ETag: header is returned. The ETag is generally a hash of several file attributes.</p>
<p style="text-align: center;"><img class="size-full wp-image-88830 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/searchengineland-com-headers-etag.png" alt="HTTP response headers from searchengineland.com with the ETag header highlighted" width="426" height="308" /></p>
<p>The second request includes an If-None-Match: header containing that ETag value. If this value matches the ETag that the server would return for the current document, the server returns a 304 Not Modified response.</p>
<p style="text-align: center;"><img class="size-large wp-image-88835 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/sel-inm-304-600x179.png" alt="HTTP response headers from searchengineland.com showing a 304 response to an If-None-Match request" width="600" height="179" /></p>
<p>If the ETag doesn&#8217;t match, then a normal 200 OK response is returned.</p>
<p style="text-align: center;"><img class="size-large wp-image-88834 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/08/sel-inm-200-600x256.png" alt="HTTP response headers for searchengineland.com showing a 200 response to an If-None-Match request" width="600" height="256" /></p>
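<p>The ETag version of the check can be sketched the same way. This is a simplified Python illustration that assumes the ETag is computed as a hash of the page body (an assumption; as noted above, servers often build it from file attributes instead). It also handles a comma-separated list of ETags and weak W/ prefixes in a simplified way.</p>

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from the document bytes (truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def if_none_match_status(body: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 if any ETag the client sent matches the current one, else 200."""
    if if_none_match is None:
        return 200  # no validator sent: normal full response
    if if_none_match.strip() == "*":
        return 304  # '*' matches any existing version of the document
    current = make_etag(body)
    # The header may list several ETags separated by commas; weak
    # validators (W/"...") are compared by their quoted part.
    for tag in if_none_match.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        if tag == current:
            return 304  # the client already has this version
    return 200  # no match: send the document with its new ETag
```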
<p>ETag/If-None-Match is definitely supported by Bing, but it&#8217;s unclear whether Google supports it. Based on the analysis of log files that I have done, I&#8217;m pretty sure that Googlebot web requests don&#8217;t support it. (It&#8217;s possible that other Google crawlers support it, though. I&#8217;m still researching this, and I&#8217;ll post a follow-up article if/when I get more information.)</p>
<p>One common problem with ETag/If-None-Match support pops up with websites that load-balance between different back end servers. Many times, the ETag is generated from something that varies from server to server, such as the file&#8217;s inode, which means that the ETag will be different for each back end server.</p>
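<p>This greatly reduces the cacheability of load-balanced websites because the odds of requesting the same document from the same server decrease as the number of back end servers grows (with requests spread evenly across N servers, only about one request in N reaches the server that issued the ETag).</p>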
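<p>The load-balancing problem is easy to see with a toy example. This Python sketch assumes a hypothetical site where the same file has been copied to three back end servers with identical contents and modification times, and compares an ETag built from the inode, size, and modification time with one built from the content alone. The hex formatting of the metadata ETag is illustrative, not any particular server&#8217;s exact format.</p>

```python
import hashlib


def inode_etag(inode: int, size: int, mtime: int) -> str:
    """ETag built from file-system metadata, including the inode number."""
    return '"%x-%x-%x"' % (inode, size, mtime)


def content_etag(body: bytes) -> str:
    """ETag built only from the document bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


# Hypothetical deployment: one file copied to three back end servers.
# Content, size and mtime are identical; only the inode differs per server.
body = b"identical page content"
servers = {"web1": 1048577, "web2": 2097153, "web3": 3145729}  # name -> inode
MTIME = 1313077546  # same modification time on every server

inode_tags = {inode_etag(ino, len(body), MTIME) for ino in servers.values()}
content_tags = {content_etag(body) for _ in servers}

print(len(inode_tags))    # 3: one document, three different ETags
print(len(content_tags))  # 1: every server agrees
```

<p>With the metadata-based ETag, a crawler&#8217;s If-None-Match only produces a 304 when the request happens to reach the server that issued the ETag, roughly one time in three here. Building the ETag from the content, or from attributes that are the same on every server, makes all of the back end servers agree.</p>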
<p>In general, I recommend implementing Last-Modified/If-Modified-Since instead of ETag/If-None-Match because it is supported more widely and has fewer problems associated with it.</p>
<h2>When To Use These Conditional GETs</h2>
<p>Conditional GETs should be implemented on any static Web resources, including HTML pages, XML sitemaps, image files, external JavaScript files, and external CSS files.</p>
<ul>
<li>For Apache, the mod_cache module should be installed and configured. If the server still isn&#8217;t supporting conditional GETs, check for a CacheDisable line in the httpd.conf or a .htaccess file somewhere.</li>
<li>For IIS7, caching is controlled by the <a href="http://www.iis.net/ConfigReference/system.webServer/caching">&lt;caching&gt; element in the site configuration file</a>. I&#8217;m not sure how to enable it in IIS6, though it appears to be enabled by default.</li>
</ul>
<p>For dynamic, programmatically generated files, the HTTP headers associated with conditional GETs need to be sent from the page code. You need to do some back-of-the-envelope calculations on two factors to determine if this is worth it.</p>
<ol>
<li>Does it take as many resources (for example, calls to back-end databases) to determine whether the page has changed versus generating the file itself?</li>
<li>Does the page change often compared to how often the page is crawled by search engines?</li>
</ol>
<p>If the answer to both questions is yes, then it may not be worth implementing support for conditional GETs in your code for dynamic pages.</p>
<h2>Page Speed</h2>
<p>I also recommend setting expiry times for static resources that don&#8217;t change often, such as images, JavaScript files, CSS files, etc.</p>
<p>This allows browsers to store these resources and reuse them on other pages on your site without having to unnecessarily download them from the Web server.</p>
<p>Also, it is likely that these resources will get stored in a proxy cache somewhere on the Internet, where they can be served more quickly to other users, even on their first visit.</p>
<p>There are two ways to set an expiry time using HTTP cache control headers.</p>
<ol>
<li>Expires: &lt;date&gt;, which indicates the date until which a stored copy of a resource can be reused without checking with the server.</li>
<li>Cache-control: max-age=&lt;seconds&gt;, which indicates the number of seconds for which a stored copy of a resource can be reused without checking with the server.</li>
</ol>
<p>The expiry time can be set up to a maximum of one year, according to the HTTP spec. I recommend setting it to at least several months.</p>
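<p>Both headers can be derived from the same number of seconds. The following Python sketch builds a matching pair for a static resource and caps the lifetime at one year. The helper name and the example lifetime are assumptions for illustration; the date is formatted with the standard library&#8217;s email.utils.format_datetime, which produces the GMT date format that HTTP expects.</p>

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; the HTTP spec's suggested maximum


def expiry_headers(max_age: int, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build a matching Expires / Cache-Control pair for a static resource."""
    max_age = min(max_age, ONE_YEAR)  # never advertise more than one year
    now = now or datetime.now(timezone.utc)
    expires = (now + timedelta(seconds=max_age)).astimezone(timezone.utc)
    return {
        "Expires": format_datetime(expires, usegmt=True),  # absolute date, GMT
        "Cache-Control": "max-age=%d" % max_age,  # relative lifetime in seconds
    }
```

<p>For example, expiry_headers(90 * 24 * 3600) asks for roughly three months, in line with the recommendation above.</p>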
<h2>Configuring Expiry Time</h2>
<p>For Apache, this requires installing the <a href="http://httpd.apache.org/docs/current/mod/mod_expires.html">mod_expires</a> module and adding some ExpiresDefault or ExpiresByType directives; mod_expires sets both the Expires header and the Cache-control max-age directive. Adding any other Cache-control directives requires <a href="http://httpd.apache.org/docs/current/mod/mod_headers.html">mod_headers</a>.</p>
<p>IIS7 can be configured through IIS Manager or some command line tools. See <a href="http://technet.microsoft.com/en-us/library/cc770661(WS.10).aspx">this link</a> for more details.</p>
<p>For resources that are generated dynamically, these headers can be added programmatically like any other header. Just make sure that the Expires: <a href="http://www.csgnetwork.com/timerfc1123calc.html">date is in the right format</a> (RFC 1123, in GMT) or it likely will be ignored.</p>
<h2>Other Resources</h2>
<p>This article only scratches the surface of the HTTP cache control protocol, so I recommend checking out the additional caching resources below to learn more about it.</p>
<h3>Testing cache control headers</h3>
<ul>
<li><a href="http://redbot.org/">Redbot.org</a>, written by &#8220;<a href="http://www.mnot.net/">mnot</a>&#8220;, is the best cache-checking tool I am aware of. I use it all the time when assessing sites.</li>
<li>Microsoft has a <a href="http://www.microsoft.com/search/tools/">very useful tool</a> for looking at headers.</li>
</ul>
<p>I&#8217;m also a big fan of using <a href="http://curl.haxx.se/">curl</a> -I from the command line to look at headers directly.</p>
<h3>Advanced reading</h3>
<ul>
<li>Google&#8217;s <a href="http://code.google.com/speed/page-speed/docs/caching.html">page speed article</a> on leveraging caching.</li>
<li>Yahoo&#8217;s <a href="http://developer.yahoo.com/performance/rules.html">best practices article</a> for speeding up a web site contains some information about caching (click on the &#8220;Server&#8221; category).</li>
<li>Bing outlines their support for conditional GETs and includes some helpful links <a href="http://www.bing.com/community/site_blogs/b/webmaster/archive/2008/02/12/announcing-crawling-improvements-for-live-search.aspx">here</a>.</li>
<li>Mnot has an excellent, though slightly dated, <a href="http://www.mnot.net/cache_docs/">overview of caching</a> that is very useful.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/how-to-improve-crawl-efficiency-with-cache-control-headers-88824/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>4 Ideas To Improve IIS &amp; .NET For Technical SEO</title>
		<link>http://searchengineland.com/4-ideas-to-improve-iis-net-for-technical-seo-84712</link>
		<comments>http://searchengineland.com/4-ideas-to-improve-iis-net-for-technical-seo-84712#comments</comments>
		<pubDate>Thu, 14 Jul 2011 13:37:55 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=84712</guid>
		<description><![CDATA[In June 2011, I spoke at SMX Advanced about SEO issues that I commonly run into during technical SEO site evaluations. The part of my presentation that dealt with Microsoft&#8217;s Internet Information Server (IIS) generated a lot of comments and questions afterward, so this column addresses some of those questions about how to improve [...]]]></description>
				<content:encoded><![CDATA[<p>In June 2011, I spoke at SMX Advanced about SEO issues that I commonly run into during technical SEO site evaluations. The part of my presentation that dealt with Microsoft&#8217;s Internet Information Server (IIS) generated a lot of comments and questions afterward, so this column addresses some of those questions about how to improve technical SEO on the Microsoft stack.</p>
<p>First, a caveat: The majority of my experience has been with Linux- and BSD-based operating systems, starting with SunOS way back at Berkeley, so I&#8217;m definitely not an expert on deploying servers on Windows and/or .NET.</p>
<p>I&#8217;ve asked Microsoft-stack expert <a href="http://www.colincochrane.com/page/Colin-Cochrane-About.aspx">Colin Cochrane</a> to correct anything Windows-related that I have stated incorrectly. (Thank you, Colin. Your link is in the mail.) Any remaining errors in this article are definitely mine, and not his.</p>
<p>After completing technical SEO assessments on numerous sites running on IIS and .NET, I believe that it is a very scalable and production-worthy platform, but I have found that its default settings are far from optimal from a technical SEO point of view.</p>
<p>This article describes the most common issues I&#8217;ve seen. Several of these issues cause canonicalization problems, as <a href="http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925">described in more detail in this article</a> about Google&#8217;s parameter handling feature.</p>
<p>Oh, and here is a second caveat: Please be sure to test any changes on a staging server <em>before rolling them out</em> to production. I would hate for something to happen to your website because I made a typo or worded something unclearly.</p>
<h2>1.  Default Pages (Default.aspx)</h2>
<p><strong>The problem</strong></p>
<div>Directory pages are available at two URLs, one with and one without the default page. For example, these two URLs would lead to the same page:</div>
<div>
<ul>
<li>http://www.site.com/directory/</li>
<li>http://www.site.com/directory/Default.aspx</li>
</ul>
</div>
<div>In this example, the default page is Default.aspx, though it could be configured to be a different name.</div>
<p><strong>Why it is bad</strong></p>
<ul>
<li><strong>Link diffusion.</strong> Inbound links to the page could point at either of these two URLs. It would be much better to focus the inbound links on only one URL.</li>
<li><strong>Crawl inefficiency. </strong>Crawlers have to crawl two URLs to get one page for each directory on the site.</li>
</ul>
<p>The usual way to deal with duplicate URLs like these is to permanently (with a 301) redirect one URL to the other. However, in this case, it will result in an infinite redirect loop.</p>
<p><strong>The culprit</strong></p>
<div>Redirecting one URL to the other leads to a redirect loop because both of these URLs look exactly the same to the .NET application. For directory URLs, the default page is always appended before the application sees the request, so the application can&#8217;t tell whether it should redirect the URL or not.</div>
<p><strong>Fixing it</strong></p>
<p>The easiest way to fix this is to put a link rel=canonical tag on these pages pointing to whichever URL you want to be the canonical. It&#8217;s not as good as a permanent redirect, but it will work in a pinch if you don&#8217;t want to mess around with your server configuration.</p>
<p>A more permanent fix is to use a 3rd party URL rewriter, which will redirect the URL before it gets to the .NET application. Some URL rewriters I have seen used successfully on sites are <a href="http://www.iis.net/download/urlrewrite">URLRewrite</a> (for IIS7 only), <a href="http://URLRewriter.net/">URLRewriter</a>, and <a href="http://www.isapirewrite.com/">ISAPI Rewrite 2</a>.</p>
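<p>Whichever rewriter you choose, the rule boils down to the logic below. This is a Python sketch of that logic, not the configuration syntax of any of these products, and it assumes the default page is named Default.aspx. The key point is that it inspects the path exactly as the client requested it, which is why the check has to happen in the rewriter rather than in the .NET application.</p>

```python
from typing import Optional

DEFAULT_DOCUMENT = "default.aspx"  # assumed default page name, compared case-insensitively


def default_page_redirect(requested_path: str) -> Optional[str]:
    """Return the directory URL to 301 to, or None if the request is already canonical.

    requested_path is the path (plus optional query string) exactly as the
    client sent it, before the server appends the default document.
    """
    path, sep, query = requested_path.partition("?")
    directory, _, filename = path.rpartition("/")
    if filename.lower() != DEFAULT_DOCUMENT:
        return None  # not an explicit default-page request: serve as-is
    # Drop the file name but keep the trailing slash and any query string.
    return directory + "/" + (sep + query if query else "")
```

<p>The returned URL would be sent back with a 301 so that links and crawlers consolidate on the directory URL.</p>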
<h2>2.  Case Insensitive URLs</h2>
<p><strong>The problem</strong></p>
<p>The path part of the URLs served by IIS is case-insensitive. So any of these URLs will usually lead to the same page:</p>
<div>
<ul>
<li>http://www.site.com/directory/default.aspx</li>
<li>http://www.site.com/Directory/Default.ASPX</li>
<li>http://www.site.com/DIRECTORY/DeFaUlT.aSpX</li>
</ul>
</div>
<p><strong>Why it is bad</strong></p>
<ul>
<li><strong>Crawl inefficiency. </strong>Google and Bing will crawl all of the different case variations that they see in links, even though they all lead to the same page.</li>
<li><strong>Link diffusion. </strong>Inbound links could go to any of the variations of the same URL. I&#8217;ve even seen different capitalizations of URLs used in internal links within a website.</li>
<li><strong>Robots.txt problems. </strong>Because robots.txt rules are case-sensitive, if your URLs aren&#8217;t, crawlers may be accessing URLs that you thought were blocked.</li>
</ul>
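<p>Here is a quick way to see the robots.txt problem for yourself, using Python&#8217;s standard urllib.robotparser module as a stand-in for a search engine&#8217;s robots.txt parser. It is not Google&#8217;s or Bing&#8217;s implementation, but it matches Disallow paths with a case-sensitive prefix comparison, so it shows the same effect. The robots.txt rules and URLs are hypothetical.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a lowercase directory.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

blocked = parser.can_fetch("Googlebot", "http://www.site.com/private/page.aspx")
mixed_case = parser.can_fetch("Googlebot", "http://www.site.com/Private/page.aspx")
print(blocked)     # False: the path matches the Disallow prefix
print(mixed_case)  # True: /Private/ is a different string, so it is not blocked
```

<p>On a case-insensitive IIS site, the second URL serves the same page as the blocked one, so any link using that capitalization opens a path around the Disallow rule.</p>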
<p><strong>The culprit</strong></p>
<div>My guess is that it has something to do with the Windows path handling in general, which is also case-insensitive.</div>
<p><strong>Some ideas for fixing it</strong></p>
<p>Similar to the first issue, the easiest way to resolve this is to use a link rel=canonical tag that points to the URL with the correct capitalization.</p>
<p>The URL rewriters listed above are the best option for normalizing the case. They can be configured to permanently redirect a URL to the right capitalization. If you pick an easy method for canonicalizing URLs, like converting everything to lower case, it can be implemented with one general rule.</p>
<p>Here is an example rule for URLRewrite that redirects any URL containing uppercase letters to its all-lowercase version:</p>
<blockquote>
<pre>&lt;!-- Permanently redirect any URL path containing an uppercase letter to its lowercase form --&gt;
&lt;rule name="LowerCaseRule"&gt;
  &lt;match url="[A-Z]" ignoreCase="false" /&gt;
  &lt;action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" appendQueryString="true" /&gt;
&lt;/rule&gt;</pre>
</blockquote>
<p>If you implement something like this, keep in mind that some URLs may require upper case, such as the Bing authorization file <em>BingSiteAuth.xml</em>. URLs like these need to be added to the rule as exceptions.</p>
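<p>For reference, here is the same normalization expressed as a small Python function, including an exception list. This is a sketch of the logic only, assuming a hypothetical CASE_EXCEPTIONS set; it lowercases just the path and passes the query string through unchanged, mirroring the rule&#8217;s appendQueryString="true" setting.</p>

```python
import re
from typing import Optional

# Hypothetical list of paths that must keep their capitalization.
CASE_EXCEPTIONS = {"/BingSiteAuth.xml"}


def lowercase_redirect(path: str, query: str = "") -> Optional[str]:
    """Return the all-lowercase URL to 301 to, or None when no redirect is needed."""
    if path in CASE_EXCEPTIONS:
        return None  # exception: serve as-is
    if not re.search(r"[A-Z]", path):
        return None  # no uppercase letters, like the rule's [A-Z] match failing
    target = path.lower()
    # Keep the query string unchanged, like appendQueryString="true".
    return target + ("?" + query if query else "")
```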
<div>Here is a <a href="http://www.cto20.com/post/Tips-Tricks-3-URL-Rewriting-Rules-Everyone-Should-Use.aspx">post containing 10 very useful rewriting rules</a>, one of which converts URLs to lowercase.</div>
<h2>3.  Handling Page Not Found Errors &amp; Internal Server Errors</h2>
<p><strong>The problem</strong></p>
<p>In its default configuration, <a href="http://asp.net/" target="_blank">ASP.NET</a> handles errors (like page not found or internal server problems) by redirecting with a 302 temporary redirect to an error page, which usually returns a 200 response.</p>
<p><strong>Why it&#8217;s bad</strong></p>
<ul>
<li><strong>Crawl inefficiency. </strong>Because a 302 redirect is a temporary redirect, search engines will continue to check that URL often in hopes of one day getting a page at that URL instead of a redirect. And if the target page returns a 200 response, then the search engines will index the initial URL, which means your site might start ranking with URLs that lead searchers to error pages.</li>
</ul>
<p>This means that pages that are removed from the site or pages that throw an error will continue to be crawled as if they were regular pages, so the crawler spends time on these URLs instead of on actual pages with useful content.</p>
<p>And because the page-not-found page gets so much traffic and has so many URLs redirecting to it, it tends to get crawled pretty frequently, which further reduces crawl efficiency.</p>
<ul>
<li><strong>&#8220;Non-graceful&#8221; site failure.</strong> If your site starts returning an error &#8212; due to a temporary database problem, for example &#8212; large portions of your site could get de-duplicated out of the index because they are suddenly redirecting to the same URL.</li>
</ul>
<p><strong>The culprit</strong></p>
<p>This is the default behavior in <a href="http://asp.net/" target="_blank">ASP.NET</a>.</p>
<p><strong>Some ideas for fixing it</strong></p>
<p>Fortunately, this issue has a fix that is pretty straightforward and requires only a minor change to the web.config file.</p>
<p>Here is part of an example web.config file that prevents these redirects:</p>
<blockquote>
<pre>&lt;customErrors mode="RemoteOnly" defaultRedirect="GeneralErrorPage.aspx" redirectMode="ResponseRewrite"&gt;
  &lt;error statusCode="404" redirect="404ErrorPage.aspx" /&gt;
&lt;/customErrors&gt;</pre>
</blockquote>
<div>The attribute <em>redirectMode</em> needs to be set to <em>ResponseRewrite</em> instead of its default value of <em>ResponseRedirect</em>.</div>
<p><em>redirectMode</em> is not available in all versions of .NET, so you may need to update first. More detail can be found in <a href="http://msdn.microsoft.com/en-us/library/h0hfz6fc.aspx">this article</a>.</p>
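<p>After deploying the change, it is worth checking how the site answers a request for a URL that does not exist. The Python sketch below classifies the chain of responses you observed (for example with curl -I, following each Location header by hand). The function and its labels are hypothetical helpers, and the sample chains are made up.</p>

```python
from typing import List, Tuple

# Each hop is (status code, Location header or "").
Hop = Tuple[int, str]


def diagnose_missing_page(hops: List[Hop]) -> str:
    """Classify how a site answered a request for a URL that should not exist."""
    if not hops:
        raise ValueError("need at least one response")
    first_status = hops[0][0]
    final_status = hops[-1][0]
    if first_status in (301, 302, 303, 307) and final_status == 200:
        return "redirect to an error page that returns 200: fix needed"
    if final_status == 200:
        return "soft 404: error page served with a 200 status"
    if first_status in (404, 410):
        return "OK: real error status returned at the requested URL"
    return "unexpected response: check manually"
```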
<h2>4. Browser-dependent code</h2>
<p><strong>The problem</strong></p>
<p>.NET has some hooks that make it pretty easy to write code that changes a page depending on the user agent requesting it.</p>
<p><strong>Why it&#8217;s bad</strong></p>
<ul>
<li><strong>Cloaking.</strong> Pages that change based on the user agent (e.g., Googlebot or Firefox) are dangerous for a lot of reasons, but from an SEO perspective they are dangerous because they could lead to unintentional cloaking of content, which can result in a severe penalty being put on your site.</li>
</ul>
<div>By default, there is nothing user agent-dependent about the code that is served by IIS/.NET. But because the functionality is there, it is possible that browser-dependent code exists in your site.</div>
<p><strong>The culprit</strong></p>
<p>I believe this functionality dates back to the late 1990s/early 2000s, when browsers had widely different support for web standards. If you are feeling nostalgic for those days, here is an <a href="http://www.richinstyle.com/bugs/table.html">old browser compatibility chart</a> that you can look at until the feeling goes away.</p>
<p><strong>Some ideas for fixing it</strong></p>
<p>Chances are there is nothing to fix, but if you want to look at your source code for potential browser-dependent logic, here is an <a href="http://support.microsoft.com/kb/311281">article with sample code</a> that should give you an idea of what to look for.</p>
<h2>Conclusion</h2>
<p>I hope this article helps you make your IIS installation more search engine-friendly. I have spoken with some very smart Windows developers who initially swore to me that there was no fix for some of the issues in this list, so there is a pretty good chance that your development team isn&#8217;t aware of all of these issues or even that these fixes exist.</p>
<p>Of course, these are only a few of the issues that I see with IIS on a regular basis. Others include cacheability of the site, character encoding issues, and URL redirects.</p>
<p>The easiest way to pinpoint these types of issues is by looking at your server logs.</p>
<p>(Blatant Product Placement/Disclaimer: It just so happens that at Nine By Blue, where I work, I created server log analysis software for just this purpose when I got tired of looking for all of these issues manually, so if you&#8217;re interested in that product for either your IIS or Apache logs, ask me about an invite to our private alpha.)</p>
<div>I guess the real lesson of this article is that IIS and .NET are a great help to SEO job security.</div>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/4-ideas-to-improve-iis-net-for-technical-seo-84712/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>DIY SEO: How To Check On-Page Ranking Factors Using Google Docs</title>
		<link>http://searchengineland.com/diy-seo-how-to-check-on-page-ranking-factors-using-google-docs-81036</link>
		<comments>http://searchengineland.com/diy-seo-how-to-check-on-page-ranking-factors-using-google-docs-81036#comments</comments>
		<pubDate>Thu, 16 Jun 2011 15:50:22 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[Intermediate]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=81036</guid>
		<description><![CDATA[My kids and I really enjoy watching the MAKE Magazine video podcasts together. It&#8217;s one of those rare and happy things that a ten-year-old girl, an eight-year-old boy, and an adult can watch together and find interesting. Inspired by these podcasts, I thought it would be a good idea to create a do-it-yourself SEO project. [...]]]></description>
				<content:encoded><![CDATA[<p>My kids and I really enjoy watching the <a href="http://blog.makezine.com/video/" target="_blank">MAKE Magazine video podcasts</a> together. It&#8217;s one of those rare and happy things that a ten-year-old girl, an eight-year-old boy, and an adult can watch together and find interesting.</p>
<p>Inspired by these podcasts, I thought it would be a good idea to create a do-it-yourself SEO project. So today, we&#8217;ll make a Google Spreadsheet that checks a web page for various on-page factors that can affect SEO.</p>
<p><img class="size-medium wp-image-81246 alignright" style="margin: 8px;" src="http://searchengineland.com/figz/wp-content/seloads/2011/06/on-page-checking-google-docs.png" alt="" width="300" height="190" /></p>
<h2>Getting Started</h2>
<p>What you need:</p>
<ul>
<li>A Google Account for logging into Google Spreadsheets</li>
<li>A URL that you want to check.</li>
</ul>
<p>In this article, I&#8217;ll be checking <a href="http://searchengineland.com/" target="_blank">http://searchengineland.com/</a>. The spreadsheet that we will create in this article is <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0AsPnqy4kWjXkdG5oNFMwNUZiVWM0bzVfQlpNSGJJM3c&amp;hl=en_US&amp;authkey=COD_94UL" target="_blank">here</a>.</p>
<p>Once you are signed in to Google Spreadsheets, you will be able to make your own copy to work with by opening the spreadsheet and selecting <em>File -&gt; Make A Copy&#8230;</em></p>
<p>If you would rather start with a blank spreadsheet and fill it in as you go through this article, select <em>File -&gt; New -&gt; Spreadsheet</em>.</p>
<h2>How It Works</h2>
<p>Our on-page checking spreadsheet uses the importXML() function in Google Spreadsheets. This very useful function takes two arguments: the URL of a document to be parsed and an Xpath query that tells it which information to import into the spreadsheet.</p>
<p>More information about the importXML() function can be found in <a href="https://docs.google.com/support/bin/answer.py?answer=75507" target="_blank">Google&#8217;s documentation</a>.</p>
<p>Xpath is a query language that is used to match elements (better known to those of us who are more familiar with HTML as &#8220;tags,&#8221; as in &#8220;title tag&#8221; or &#8220;H1 tag&#8221;) and the attributes of these elements (for example, &#8220;alt&#8221; or &#8220;href&#8221;) in an XML document, and to tell it what information to extract.</p>
<p>For example, the Xpath query &#8220;//a[@href="index.htm"]/text()&#8221; will return the anchor text for any link pointing to the file index.htm. Don&#8217;t worry if this doesn&#8217;t make any sense yet. As you work with a few examples, it will become clearer.</p>
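<p>If you want to experiment with Xpath outside of a spreadsheet, Python&#8217;s standard xml.etree.ElementTree module understands a limited subset of it. The sketch below runs the equivalent of that query against a tiny, made-up document built in code. Note the differences: ElementTree has no text() step, so the code selects the matching a elements and reads their text, and its Xpath support is much smaller, so not every query that works in importXML() will work here.</p>

```python
import xml.etree.ElementTree as ET

# Build a tiny made-up page in code: two links, one of them to index.htm.
html = ET.Element("html")
body = ET.SubElement(html, "body")
home = ET.SubElement(body, "a", href="index.htm")
home.text = "Home page"
about = ET.SubElement(body, "a", href="about.htm")
about.text = "About us"

# ElementTree has no text() step: select the elements, then read .text.
anchors = [a.text for a in html.findall(".//a[@href='index.htm']")]
print(anchors)  # ['Home page']
```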
<p>A good resource for Xpath queries can be found <a href="http://www.w3schools.com/xpath/" target="_blank">here</a>.</p>
<h2>Testing The Basics</h2>
<p>Let&#8217;s get started. First we will do a simple query to extract the title from an HTML document. To do this, follow these steps:</p>
<ul>
<li>Enter the URL you are checking in cell A1.</li>
<li>Put &#8220;Title&#8221; in cell A2.</li>
<li>In cell B2 enter this exact text: =importXML(A1, "(//title|//TITLE)")</li>
</ul>
<p style="text-align: center;"><a href="http://searchengineland.com/figz/wp-content/seloads/2011/06/getting-started-on-page-checker.png"><img class="size-large wp-image-81248 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/06/getting-started-on-page-checker-600x113.png" alt="" width="600" height="113" /></a></p>
<p>The parts that are &#8220;//title&#8221; and &#8220;//TITLE&#8221; will match all elements (tags) that are either &#8220;title&#8221; or &#8220;TITLE.&#8221; (Xpath queries are case-sensitive by default, so we are matching all uppercase or all lowercase.) The parentheses and vertical bar &#8220;|&#8221; tell Xpath to return elements that match either of the two.</p>
<p>Once you hit return, the text should change to the title of the page you are checking. You may see &#8220;Loading&#8230;&#8221; for a few seconds while Google retrieves and parses the page.</p>
<p style="text-align: center;"><a href="http://searchengineland.com/figz/wp-content/seloads/2011/06/getting-started-title-changed.png"><img class="size-large wp-image-81249 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/06/getting-started-title-changed-600x109.png" alt="" width="600" height="109" /></a></p>
<p>If something went wrong, check the following things:</p>
<ul>
<li>Does the page at your URL have a title tag?</li>
<li>Does the URL redirect anywhere?</li>
<li>Is the title tag written as &#8220;Title&#8221; in the HTML? Remember that Xpath queries are case sensitive, so the query above will only match &#8220;title&#8221; and &#8220;TITLE.&#8221;</li>
<li>Did you type the URL correctly?</li>
</ul>
<p>It&#8217;s also possible that the HTML in the page you are checking is too badly formed to be correctly parsed by the importXML() function. In this case, either pick a new URL or <a href="http://searchengineland.com/" target="_blank">validate</a> and <a href="http://infohound.net/tidy/" target="_blank">tidy</a> the page&#8217;s HTML and try again.</p>
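<p>The case-sensitivity pitfall can be reproduced with Python&#8217;s standard xml.etree.ElementTree module. ElementTree does not support the | union operator, so this sketch emulates (//title|//TITLE) by running both queries and concatenating the results, which is close enough for a single title tag (a true Xpath union also removes duplicates and keeps document order). The pages are hypothetical and built in code.</p>

```python
import xml.etree.ElementTree as ET


def titles(root: ET.Element) -> list:
    """Emulate (//title|//TITLE): ElementTree lacks '|', so run both queries."""
    return [t.text for t in root.findall(".//title") + root.findall(".//TITLE")]


def page_with_title(tag_name: str) -> ET.Element:
    """Build a minimal page whose title element uses the given spelling."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    title = ET.SubElement(head, tag_name)
    title.text = "Search Engine Land"
    return html


print(titles(page_with_title("title")))  # ['Search Engine Land']
print(titles(page_with_title("TITLE")))  # ['Search Engine Land']
print(titles(page_with_title("Title")))  # [] because element names are case-sensitive
```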
<h2>Checking Header Tags</h2>
<p>If everything is working up to this point, we are now ready to run more queries against our pages.</p>
<p>Let&#8217;s check for the header tags <em>H1</em> and <em>H2</em>.</p>
<p>Follow these steps:</p>
<ul>
<li>Put &#8220;H1&#8221; in cell A4</li>
<li>In cell B4 enter this text: =importXML(A1, "(//h1|//H1)")</li>
<li>Put &#8220;H2&#8221; in cell A10</li>
<li>In cell B10 enter this text: =importXML(A1, "(//h2|//H2)")</li>
</ul>
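<p>For comparison, here is a minimal Python sketch that pulls out the same H1 and H2 text with the standard library&#8217;s html.parser module. The HeadingCollector class and the headings() helper are hypothetical names used only for this illustration. Unlike the Xpath queries above, html.parser lowercases tag names, so &lt;H1&gt; and &lt;h1&gt; are treated the same. It is also tolerant of HTML that isn&#8217;t well-formed XML.</p>

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of H1 and H2 tags, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = {"h1": [], "h2": []}
        self._current = None  # the heading tag we are inside, if any

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H1> arrives here as "h1".
        if tag in self.headings:
            self._current = tag
            self.headings[tag].append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (like <em>) still belongs to the heading.
        if self._current:
            self.headings[self._current][-1] += data


def headings(markup):
    parser = HeadingCollector()
    parser.feed(markup)
    parser.close()
    return {tag: [t.strip() for t in texts] for tag, texts in parser.headings.items()}


print(headings("<H1>Blue Widgets</H1><h2>Sizes</h2><h2>Prices <em>2011</em></h2>"))
# {'h1': ['Blue Widgets'], 'h2': ['Sizes', 'Prices 2011']}
```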
<p>At this point, you should see the text of the H1 and H2 tags of your page. Notice how a cell is filled out for each matching tag. It&#8217;s important to leave enough room for additional cells so that you can see all matching values.</p>
<h2>Creating Alerts &amp; Testing The Results</h2>
<p>Another useful thing we can do with Google Spreadsheets is write tests that check the output of importXML() and flag any problems or deviations from best practices.</p>
<p>In this <a href="http://www.youtube.com/watch?v=GIn5qJKU8VM" target="_blank">webmaster help video</a>, Matt Cutts says that more than one H1 is okay for some pages, but he also recommends not overdoing it. So let&#8217;s write two alerts: one to make sure there is at least one H1 tag, and another to alert us if there is more than one. Follow these steps:</p>
<ul>
<li>In cell C4 enter this text: =IF(ISERROR(B4),"No H1 tag found!","OK")</li>
<li>In cell C5 enter this text: =IF(COUNTA(importXML(A1,"(//H1|//h1)"))&gt;1,"Multiple H1 tags found!","OK")</li>
</ul>
<p>The ISERROR() function checks for any error in a cell, including &#8220;#N/A,&#8221; which is the result of an Xpath query that doesn&#8217;t match anything. (The similarly named ISERR() function ignores #N/A, so it would not catch a missing H1.)</p>
<p>The COUNTA() function counts the number of non-empty values in a range or array, such as the array returned by importXML(). Wrapping importXML() in COUNTA() is a simple way to get the number of matches for a particular Xpath query.</p>
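<p>The logic of the two alert cells is simple enough to express in a few lines of Python. This sketch assumes you already have the list of H1 texts for a page, for example from an HTML parser. h1_alerts() is a hypothetical helper that returns the same messages as cells C4 and C5.</p>

```python
def h1_alerts(h1_texts):
    """Return (C4-style, C5-style) alert messages for a page's H1 texts."""
    # C4: at least one H1 must exist.
    missing = "No H1 tag found!" if not h1_texts else "OK"
    # C5: flag more than one H1.
    multiple = "Multiple H1 tags found!" if len(h1_texts) > 1 else "OK"
    return missing, multiple


print(h1_alerts([]))                     # ('No H1 tag found!', 'OK')
print(h1_alerts(["Blue Widgets"]))       # ('OK', 'OK')
print(h1_alerts(["Widgets", "Sizes"]))   # ('OK', 'Multiple H1 tags found!')
```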
<p>If you want to make the alerts stand out more, use conditional formatting in column C to turn the alerts red if they don&#8217;t pass.</p>
<p>To do this, select column C, go to <em>Format &gt; Conditional formatting&#8230;</em> and set the text to red when the text contains an exclamation point.</p>
<p style="text-align: center;"><a href="http://searchengineland.com/figz/wp-content/seloads/2011/06/Screen-Shot-2011-06-10-at-1.18.38-AM.png"><img class="size-full wp-image-81044 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/06/Screen-Shot-2011-06-10-at-1.18.38-AM.png" alt="Conditional formatting" width="474" height="188" /></a></p>
<h2>Extracting Attributes</h2>
<p>Xpath queries are also useful for extracting the value of attributes within a tag, which means that we can check the usual SEO-related meta tags.</p>
<p>For example, let&#8217;s look for the meta robots and link canonical tags on the page. Follow these steps:</p>
<ul>
<li>Put &#8220;Robots meta&#8221; in cell A30</li>
<li>In cell B30 enter this text: =importXML(A1, "//meta[@name='robots']/@content")</li>
<li>Put &#8220;Link canonical&#8221; in cell A31</li>
<li>In cell B31 enter this text: =importXML(A1, "//link[@rel='canonical']/@href")</li>
</ul>
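<p>Python&#8217;s standard-library ElementTree module understands the same [@attribute='value'] predicate syntax, but it does not support a trailing /@content step. This sketch therefore reads the attribute off each matched element instead. The sample PAGE and the attr_values() helper are made up for illustration. The last query shows the same case-sensitivity issue that affects importXML(): an uppercase META tag is not matched by //meta.</p>

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page head used only for illustration.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow"/>
<link rel="canonical" href="http://www.yoursite.com/category/"/>
<META NAME="Description" CONTENT="Uppercase markup"/>
</head><body/></html>"""

root = ET.fromstring(PAGE)


def attr_values(root, path, attribute):
    """Return `attribute` from every element matching the XPath-style `path`."""
    # ElementTree can't select attributes directly, so read them here.
    return [el.get(attribute) for el in root.findall(path)]


print(attr_values(root, ".//meta[@name='robots']", "content"))  # ['noindex, follow']
print(attr_values(root, ".//link[@rel='canonical']", "href"))   # ['http://www.yoursite.com/category/']
# Case-sensitive: the uppercase META tag is not found.
print(attr_values(root, ".//meta[@name='description']", "content"))  # []
```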
<p>The &#8220;[@foo="bar"]&#8220; syntax that we have added is a way of restricting the matching tags to only elements containing that attribute-value pair. The /@content and /@href in each Xpath query returns the values for those attributes.</p>
<p>Note that attribute and value matching is also case-sensitive. So if any of the elements, attributes, or values being matched contain an uppercase letter, our Xpath query won&#8217;t match them. You may need to adjust the Xpath queries to match the style of HTML that your CMS outputs.</p>
<p>You should now see the meta robots directives and link canonical values for the page you are checking. If you see &#8220;#N/A&#8221; in the cell after hitting return, then either the page doesn&#8217;t have these tags, you typed the Xpath query incorrectly, or there is a case-sensitivity problem.</p>
<h2>Checking Links &amp; Anchor Text</h2>
<p>Let&#8217;s finish with some queries that count the number of links on the page and list the anchor text and URL of each link.</p>
<p>Because pages usually have many links, let&#8217;s do this on a new tab so we will have enough room for the output. Follow these steps:</p>
<ul>
<li>Go to Insert &gt; New Sheet to create a new tab for the spreadsheet.</li>
<li>In cell A1 enter the URL you are checking</li>
<li>In cell A2 enter the following: =COUNTA(importXML(A1,"//a")) &amp; " links"</li>
<li>In cell A3 enter the following: =importXML(A1,"(//a/text()|//a/img/@alt)")</li>
<li>In cell B3 enter the following: =importXML(A1,"//a/@href")</li>
</ul>
<p>Remember that the parentheses and vertical bar (&#8220;|&#8221;) in the Xpath query for cell A3 return elements that match either of the queries on each side of the bar. So in this example, we are returning the text of each link, or the alt text of an image inside the link.</p>
<p>The ampersand (&#8220;&amp;&#8221;) in the formula for cell A2 joins the link count and the word &#8220;links&#8221; into one string.</p>
<p style="text-align: center;"><a href="http://searchengineland.com/figz/wp-content/seloads/2011/06/anchor-text-and-links.png"><img class="size-large wp-image-81253 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/06/anchor-text-and-links-600x220.png" alt="" width="600" height="220" /></a></p>
<p>If everything was entered correctly, you should see the number of links on the page in cell A2 with a list of all anchor text and image alt text listed below that. In column B, you should see a list of all the links on the page.</p>
<p>Ideally, the list of anchor text and the list of links will line up row by row. But some links may have no anchor text or alt text and will be skipped in column A, while a link whose text is split into several pieces may produce more than one row. If the two columns don&#8217;t line up, it is very likely that some links lack plain anchor text.</p>
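<p>One way to avoid the alignment problem is to collect each link&#8217;s text and URL together as a pair. Here is a minimal Python sketch using the standard library&#8217;s html.parser module. LinkCollector, links(), and the sample markup are hypothetical. It also picks up text wrapped in another element, such as &lt;b&gt;, which //a/text() misses because that text isn&#8217;t a direct text child of the link.</p>

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect [anchor text, href] pairs so text and URL stay aligned."""

    def __init__(self):
        super().__init__()
        self.links = []    # one [text, href] entry per <a> tag
        self._open = None  # the entry for the link we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._open = ["", attrs.get("href")]
            self.links.append(self._open)
        elif tag == "img" and self._open is not None:
            # Count image alt text as anchor text, like //a/img/@alt.
            self._open[0] += attrs.get("alt") or ""

    def handle_endtag(self, tag):
        if tag == "a":
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[0] += data


def links(markup):
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return [(text.strip(), href) for text, href in parser.links]


sample = ('<a href="/">Home</a> <a href="/cart"><img src="cart.png" alt="Cart"></a> '
          '<a href="/help"><b>Help</b> center</a> <a href="/blank"></a>')
pairs = links(sample)
print(f"{len(pairs)} links")
for text, href in pairs:
    print(repr(text), href)
```

<p>A link with no text or alt text still gets its own row with an empty string, so the text and URL lists always have the same length.</p>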
<h2>Extra Credit</h2>
<p>If you want to continue exploring the use of Google Spreadsheets to check on page factors, I created another spreadsheet with more examples <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0AsPnqy4kWjXkdDNiSWRoa19EVlUxREZXcHhzbUxaQkE&amp;hl=en_US&amp;authkey=COOOjKgD">here</a>.</p>
<p>This spreadsheet contains a few more advanced examples that can check things like:</p>
<ul>
<li>The meta description tag</li>
<li>The <a href="http://www.google.com/safebrowsing/diagnostic?site=google.com">Safe Browsing diagnostic page</a> for a domain</li>
<li>Whether or not the page is in Google&#8217;s index</li>
<li>Images and their alt text</li>
<li>Images that don&#8217;t contain alt text</li>
</ul>
<p>The Xpath queries are in the &#8220;Queries&#8221; tab, and you can double-click on the cells to see the underlying formulas.</p>
<p>Make a copy for yourself and start exploring. Feel free to share any interesting Xpath queries or formulas you come up with in the comments. Happy hacking!</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/diy-seo-how-to-check-on-page-ranking-factors-using-google-docs-81036/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Search Engines Don’t Like You? Don’t Jump To Conclusions</title>
		<link>http://searchengineland.com/search-engines-don%e2%80%99t-like-you-don%e2%80%99t-jump-to-conclusions-76868</link>
		<comments>http://searchengineland.com/search-engines-don%e2%80%99t-like-you-don%e2%80%99t-jump-to-conclusions-76868#comments</comments>
		<pubDate>Thu, 19 May 2011 17:15:21 +0000</pubDate>
		<dc:creator>Todd Nemet</dc:creator>
				<category><![CDATA[All Things SEO]]></category>
		<category><![CDATA[Channel: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=76868</guid>
		<description><![CDATA[One of the most frustrating things about technical problems with a site is that the ways they show up in search engines are usually unexpected or subtle. What looks like a penalty can actually be a problem introduced with a new version or new feature of a website. Because the true causes of problems like [...]]]></description>
				<content:encoded><![CDATA[<p>One of the most frustrating things about technical problems with a site is that the ways they show up in search engines are usually unexpected or subtle. What looks like a penalty can actually be a problem introduced with a new version or new feature of a website.</p>
<p style="text-align: center;"><a rel="attachment wp-att-77667" href="http://searchengineland.com/search-engines-don%e2%80%99t-like-you-don%e2%80%99t-jump-to-conclusions-76868/google-does-not-like-your-site"><img class="size-full wp-image-77667 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/05/google-does-not-like-your-site.png" alt="Google doesn't like your site" width="439" height="78" /></a></p>
<p>Because the true causes of problems like these are usually not at all obvious, they can lead to hypotheses that border on the paranoid (&#8220;Google doesn’t like my site&#8221;) or to wild speculation (&#8220;I was put in the sandbox and then hit with Panda. I call it the Pandbox.&#8221;).</p>
<p>Since Google isn’t alive and doesn’t have emotions (yet), we can safely set aside (for now) any search engine anthropomorphizing and focus on finding root causes that may be lurking in the site’s technical infrastructure.</p>
<h2><strong>Symptoms: Fewer Pages In The Index, Drop In Long Tail Traffic</strong></h2>
<p>The main causes of site coverage problems include duplicate content, allowing pages with no SEO value to be crawled, and network problems.</p>
<p>Duplicate content occurs when you can get to a page through multiple URLs.</p>
<p>Sometimes this is caused by having an entire copy of a site available on another subdomain, like <a href="http://www1.yoursite.com/">http://www1.yoursite.com/</a>, or on an IP address, like <a href="http://192.168.1.1/">http://192.168.1.1/</a>.</p>
<p>Duplicate content can also happen at the page level, when a page is available at multiple URLs like this:</p>
<ul>
<li><a href="http://www.yoursite.com/category">http://www.yoursite.com/category</a></li>
<li><a href="http://www.yoursite.com/category/">http://www.yoursite.com/category/</a></li>
<li><a href="http://www.yoursite.com/category/default.aspx">http://www.yoursite.com/category/default.aspx</a></li>
<li><a href="http://www.yoursite.com/category/Default.ASPX">http://www.yoursite.com/category/Default.ASPX</a></li>
<li><a href="http://www.yoursite.com/category/default.aspx?referral_id=1">http://www.yoursite.com/category/default.aspx?referral_id=1</a></li>
</ul>
<p>Both types of duplicate content reduce the number of pages in the index because search engines waste their time crawling multiple copies of a website or a page.</p>
<p>Search engines throw away these extra copies because there is no point in including redundant pages in the index. This means that crawl time that could have gone to other pages on your site is spent fetching extra copies of pages that won’t be used anyway.</p>
<p>In the example above, where every page can be reached at five different URLs, search engines might have to crawl the site at least five times to reach each page once.</p>
<p>If you have a duplicate site, use 301 redirects to permanently send visitors and search engine crawlers to the main site.</p>
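<p>The redirect itself is normally configured in your web server or load balancer, but the mapping it has to perform is easy to sketch. In this Python example, MAIN_HOST and DUPLICATE_HOSTS are assumptions based on the example hosts above. redirect_for() is a hypothetical helper that keeps the path and query string, so each duplicate page points to its counterpart on the main site rather than to the homepage.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the main site, and hosts serving full copies of it.
MAIN_HOST = "www.yoursite.com"
DUPLICATE_HOSTS = {"www1.yoursite.com", "192.168.1.1"}


def redirect_for(url):
    """Return (301, location) for a duplicate host, or (200, None) to serve normally."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host in DUPLICATE_HOSTS:
        # Keep path and query so each page maps to its twin on the main site.
        return 301, urlunsplit((parts.scheme, MAIN_HOST, parts.path, parts.query, ""))
    return 200, None


print(redirect_for("http://www1.yoursite.com/category/?page=2"))
print(redirect_for("http://192.168.1.1/about/"))
print(redirect_for("http://www.yoursite.com/about/"))
```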
<p>Fixing duplicate content at the page level is a bit trickier.</p>
<p>Select one canonical URL from each set of potential duplicate URLs and make sure that each duplicate URL<a href="http://searchengineland.com/search-illustrated-the-power-of-301-redirects-11653"> permanently redirects to the canonical one</a>. If this isn’t possible – for example, due to tracking parameters like <em>referral_id=1</em> above – use a <a href="http://searchengineland.com/canonical-tag-16537">link rel=canonical tag that points to the canonical URL</a> and<a href="http://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925"> configure Bing and Google webmaster tools to ignore the appropriate parameters</a>.</p>
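<p>To make the redirect-versus-canonical decision concrete, here is a Python sketch that collapses the five example URLs above into one canonical URL. The rules it encodes are assumptions for this example, not universal. It treats paths as case-insensitive, as on a typical IIS site. It assumes default.aspx is the default document, that directory-style URLs like these end with a slash, and that referral_id is the only tracking parameter. The hypothetical fix_for() helper recommends a 301 where a variant can simply be redirected, and a link rel=canonical tag where a tracking parameter is involved.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Site-specific assumptions (hypothetical; adjust for your CMS):
TRACKING_PARAMS = {"referral_id"}     # parameters that never change the content
DEFAULT_DOCUMENTS = {"default.aspx"}  # documents served for a bare directory


def canonical_url(url):
    """Collapse duplicate variants of a directory-style URL into one form."""
    parts = urlsplit(url)
    # Assumes the server treats paths case-insensitively.
    segments = [s for s in parts.path.lower().split("/") if s]
    if segments and segments[-1] in DEFAULT_DOCUMENTS:
        segments.pop()
    path = "/" + "".join(s + "/" for s in segments)  # canonical form ends with "/"
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                       if k not in TRACKING_PARAMS])
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, query, ""))


def fix_for(url):
    """Prefer a 301; use rel=canonical when a tracking parameter is present."""
    target = canonical_url(url)
    if target == url:
        return "ok", url
    params = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    if params & TRACKING_PARAMS:
        return "link rel=canonical", target
    return "301", target


variants = [
    "http://www.yoursite.com/category",
    "http://www.yoursite.com/category/",
    "http://www.yoursite.com/category/default.aspx",
    "http://www.yoursite.com/category/Default.ASPX",
    "http://www.yoursite.com/category/default.aspx?referral_id=1",
]
for u in variants:
    print(u, "->", fix_for(u))
```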
<h2>Diagnosing Crawl Inefficiencies</h2>
<p>Allowing pages with no value to be crawled means that the search engines are spending valuable resources crawling things like API calls, log files, or pages with an infinite number of combinations like a web calendar.</p>
<p>Similar to duplicate content, <a href="http://www.webpronews.com/seo-checklist-with-vanessa-fox-2009-06">crawl inefficiency</a> means that search engines are crawling useless pages, at the expense of pages that you would like crawled.</p>
<p>These zero-value pages aren’t going to lead to any conversions, even if search engines do index them or rank them for anything.</p>
<p>To fix these types of problems, use the robots.txt file to exclude these types of pages. Be sure to test any changes to your robots.txt file in Google Webmaster Tools before pushing them live.</p>
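<p>As a rough local sanity check before that test, Python’s urllib.robotparser module can tell you whether a given URL would be blocked. The robots.txt rules below are hypothetical stand-ins for the kinds of zero-value pages described above, and so are the /api/, /logs/, and /calendar/ paths. This parser only does simple prefix matching and may not interpret rules exactly as Google or Bing do, so it doesn’t replace testing in Webmaster Tools.</p>

```python
from urllib import robotparser

# Hypothetical robots.txt blocking API calls, log files, and an endless calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /logs/
Disallow: /calendar/
"""


def build_parser(text):
    parser = robotparser.RobotFileParser()
    parser.parse(text.splitlines())  # parse rules from a string, no network fetch
    return parser


rules = build_parser(ROBOTS_TXT)
for url in ["http://www.yoursite.com/category/",
            "http://www.yoursite.com/api/v1/products",
            "http://www.yoursite.com/calendar/2011/12/01"]:
    print(url, "allowed" if rules.can_fetch("Googlebot", url) else "blocked")
```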
<p>Networking problems can be very elusive. Most of the networking problems I have seen involve either load balancing or DNS.</p>
<p>Load balancing is used on larger sites to spread web requests among a number of backend servers. Sometimes it is misconfigured so that most crawler requests go to a single backend server, which eventually slows to a crawl.</p>
<p>DNS problems can make a website unnecessarily slow for first-time visitors or, in extreme cases, intermittently unavailable.</p>
<p>You can easily check your DNS configuration with an online tool like IntoDNS. Checking the load balancers or other aspects of the backend network is not so easy, so it’s probably best to ask a network engineer about any recent changes to the infrastructure.</p>
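<p>The misconfiguration itself isn’t specified here, but one plausible mechanism is easy to simulate: &#8220;sticky&#8221; routing based on the client’s IP address. A crawler may send many of its requests from a small number of IP addresses, so IP affinity can pin them all to one server. This Python sketch is purely illustrative. The backend names, the single crawler IP (taken from a documentation address range), and the request count are all made up, and real load balancers use their own hashing schemes.</p>

```python
import zlib
from collections import Counter
from itertools import cycle

BACKENDS = ["app1", "app2", "app3", "app4"]  # hypothetical backend pool


def ip_affinity(client_ip):
    """Sticky routing: the same client IP always lands on the same backend."""
    return BACKENDS[zlib.crc32(client_ip.encode()) % len(BACKENDS)]


def round_robin():
    """Return a picker that ignores the client and rotates through backends."""
    rotation = cycle(BACKENDS)
    return lambda client_ip: next(rotation)


# Hypothetical crawler traffic from a single IP address.
crawler_requests = ["198.51.100.7"] * 1000

sticky = Counter(ip_affinity(ip) for ip in crawler_requests)
rotated = Counter(map(round_robin(), crawler_requests))
print("IP affinity:", dict(sticky))   # all 1000 requests on one backend
print("Round robin:", dict(rotated))  # 250 requests per backend
```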
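<p>If you just want a quick look at how long name resolution takes from your own machine, a few lines of Python will time a lookup with socket.getaddrinfo(). This is only a crude, single-location check and no substitute for a tool like IntoDNS. time_lookup() is a hypothetical helper. Repeated runs will usually be answered from a resolver cache, so the first lookup is the most telling.</p>

```python
import socket
import time


def time_lookup(hostname):
    """Resolve `hostname`; return (sorted unique addresses, elapsed milliseconds)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, None)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each result's sockaddr starts with the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed_ms


addresses, ms = time_lookup("localhost")  # replace with your own hostname
print(f"{addresses} resolved in {ms:.1f} ms")
```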
<h2><strong>Symptoms: Wrong Pages Ranking, Decline In Ranking</strong></h2>
<p>These symptoms are usually caused by duplicate copies of important pages or by search engines not being able to understand the linking structure of your site.</p>
<p style="text-align: center;"><img class="size-full wp-image-77666 aligncenter" src="http://searchengineland.com/figz/wp-content/seloads/2011/05/bing-does-not-like-your-site-either.png" alt="" width="466" height="59" /></p>
<p>Duplicate content can have a negative effect on ranking because inbound links to a particular page – a very important signal for search engines – are spread out among different URLs. As a result, the search engine is only aware of the number of inbound links for the one copy of the page that it decides to keep.</p>
<p>Make sure that all of the intended inbound links count towards the page by fixing these duplicate URLs as described above.</p>
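<p>Some made-up numbers show why this matters. In this Python sketch, the link counts for the five duplicate URLs from the earlier example are hypothetical. Before the fix, the page can at best be credited with the links pointing at its strongest single URL. After redirects or canonical tags, and assuming the search engine consolidates all of them, it gets the total.</p>

```python
# Hypothetical inbound-link counts for one page, split across duplicate URLs.
inbound_links = {
    "http://www.yoursite.com/category": 12,
    "http://www.yoursite.com/category/": 30,
    "http://www.yoursite.com/category/default.aspx": 5,
    "http://www.yoursite.com/category/Default.ASPX": 2,
    "http://www.yoursite.com/category/default.aspx?referral_id=1": 8,
}

# Best case before the fix: only the strongest single copy counts.
strongest = max(inbound_links, key=inbound_links.get)
# After consolidation: every variant's links point at one URL.
consolidated = sum(inbound_links.values())

print(f"Before: {inbound_links[strongest]} links to {strongest}")  # 30
print(f"After: {consolidated} links")                              # 57
```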
<p>Another important signal for search engines is how a page is linked within a site. For example, a page with a link from the homepage will be considered a more important page than a page that is orphaned on the site with no links.</p>
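<p>A crude way to see this structure for yourself is to measure click depth: the number of links a crawler has to follow from the homepage to reach each page. This Python sketch runs a breadth-first search over a hypothetical link graph (SITE_LINKS). click_depth() is an illustrative helper, not how any search engine actually weighs pages, but it does surface orphaned pages that nothing links to.</p>

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
SITE_LINKS = {
    "/": ["/category/", "/about/"],
    "/category/": ["/category/widget/"],
    "/about/": [],
    "/category/widget/": [],
    "/old-promo/": ["/"],  # links out, but nothing links to it: an orphan
}


def click_depth(links, home="/"):
    """Breadth-first search: clicks needed from the homepage to each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


depth = click_depth(SITE_LINKS)
orphans = sorted(set(SITE_LINKS) - set(depth))
print(depth)    # {'/': 0, '/category/': 1, '/about/': 1, '/category/widget/': 2}
print(orphans)  # ['/old-promo/']
```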
<p>Coding site navigation elements in Flash, Silverlight, or JavaScript can make it impossible for search engines to extract these links. As a result, they are missing key information about what pages on a site are the most important.</p>
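<p>The difference is easy to demonstrate with a parser that only reads the HTML, like a crawler that doesn’t run scripts. In this Python sketch the navigation markup (NAV) is hypothetical. Two links are ordinary &lt;a href&gt; elements, and one is built by a JavaScript onclick handler, so only the first two are found. Navigation drawn entirely in Flash or Silverlight wouldn’t appear in the HTML at all.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation: two real links and one built by JavaScript.
NAV = """<ul>
  <li><a href="/category/">Category</a></li>
  <li><a href="/about/">About</a></li>
  <li><span onclick="location.href='/deals/'">Deals</span></li>
</ul>"""


def crawlable_links(markup):
    """Return hrefs that can be read from the markup without running any script."""
    return [a.get("href") for a in ET.fromstring(markup).findall(".//a[@href]")]


print(crawlable_links(NAV))  # ['/category/', '/about/']; '/deals/' is invisible
```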
<h2>Investigate Before You Make Assumptions</h2>
<p>This is not a complete list of root causes for indexing issues and traffic loss, but it does contain the most common issues that I have seen with sites that I have been asked to review.</p>
<p>Other causes of similar symptoms are page speed, cache unfriendliness, internationalization issues, server misconfigurations, and security vulnerabilities. Each one is worthy of an article in itself.</p>
<p>I hope this provides some additional ideas of where to hunt down causes of particularly vexing problems with the way your site is performing in search.</p>
<p>Fortunately, it is much easier to redirect a duplicate copy of your site or fix a DNS misconfiguration than it is to influence Google’s or Bing’s algorithms.</p>
<p>While search engines definitely penalize some sites and it is possible for a site to get caught up in algorithm changes, make sure you have thoroughly reviewed your technical architecture before jumping to any conclusions about what search engines don’t &#8220;like&#8221; about it.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/search-engines-don%e2%80%99t-like-you-don%e2%80%99t-jump-to-conclusions-76868/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
