<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>searchengineland.com &#187; Stephan Spencer</title>
	<atom:link href="http://searchengineland.com/author/stephan-spencer/feed" rel="self" type="application/rss+xml" />
	<link>http://searchengineland.com</link>
	<description>Search Engine Land: Must Read News About Search Marketing &#38; Search Engines</description>
	<lastBuildDate>Mon, 09 Nov 2009 22:18:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Science Of Scoring Your SEO</title>
		<link>http://searchengineland.com/the-science-of-scoring-your-seo-28769</link>
		<comments>http://searchengineland.com/the-science-of-scoring-your-seo-28769#comments</comments>
		<pubDate>Thu, 29 Oct 2009 17:10:42 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=28769</guid>
		<description><![CDATA[SEO is an art. (Hence, the name of my and my co-authors&#8217; brand new book, The Art of SEO). Crafting copy that sells, as well as ranks, is an art.  So is link baiting. But SEO is also a science. Crafting rewrite rules, robots.txt directives, and so forth is pretty geeky stuff. The science [...]]]></description>
			<content:encoded><![CDATA[<p>SEO is an art. (Hence, the name of my and my co-authors&#8217; brand new book, <em><a href="http://www.amazon.com/Art-SEO-Theory-Practice/dp/0596518862">The Art of SEO</a></em>). Crafting copy that sells, as well as ranks, is an art.  So is link baiting. But SEO is also a science. Crafting rewrite rules, robots.txt directives, and so forth is pretty geeky stuff. The science side of SEO is where I spend most of my time.</p>
<p>Another dichotomy is that SEO is both subjective and objective. The point at which a title tag, URL, or headline is &#8220;good enough&#8221; and thus moving on to the next task is warranted &#8212; that is certainly subjective. Also consider what might constitute the <em>most</em> optimal URL structure: does it end in / (slash) or a file extension like .html? Again, subjective.</p>
<p>In my view, SEO is for the most part cut-and-dried; it&#8217;s objective. That&#8217;s because it can all be boiled down to an algorithm, and in fact, it already has. The algorithm I speak of, of course, is Google&#8217;s (or Yahoo&#8217;s, or Bing&#8217;s). The SEO practitioner&#8217;s challenge is to reverse-engineer that algorithm to the best of their ability. But it shouldn&#8217;t stop there. Why not write your own algorithm &#8212; an approximation of the search engine&#8217;s own algorithm, one that teases out the various signals and accurately assesses the quality, relevance and importance of these signals without human intervention/assistance?</p>
<p>Running algorithmic analysis on a site-by-site and a page-by-page basis will then allow you to ascertain a site&#8217;s SEO health, and more importantly, the subsequent actions required in this never-ending process known as optimization. That is data-driven decision-making, my friends, and it will be a key driver in the next stage in the evolution of SEO.</p>
<p>To be effective, SEO scoring has to get granular. Knowing you scored an 89 out of 100, or a B+, overall with your SEO may be reassuring, but no next steps follow from that knowledge. The same is true even if you individually score each of the major SEO areas of focus. In my <a href="http://www.practicalecommerce.com/member/3-Stephan-Spencer/articles">SEO Report Card</a> column for Practical Ecommerce, I (arbitrarily) chose the following areas of focus: Home Page Content, Inbound Links and PageRank, Indexation, Internal, Hierarchical Linking Structure, HTML Templates and CSS, Secondary Page Content, Keyword Choices, Title Tags, and URLs. I don&#8217;t claim that these are the best &#8220;buckets&#8221;. Nonetheless, scoring such broad areas is still not actionable, really.</p>
<p>Score the title tags, internal anchor text, keyword prominence, H1s, meta descriptions and so forth separately, and on a page-by-page basis, and now you&#8217;re talking!</p>
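<p>As a rough illustration of page-by-page, element-by-element scoring, here is a minimal Python sketch. The checks, the 65-character title limit and the names <code>PageElements</code> and <code>score_page</code> are assumptions made for this example; they are not taken from any search engine or commercial tool.</p>

```python
from html.parser import HTMLParser


class PageElements(HTMLParser):
    """Collects the title text, H1 count, and meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1_count = 0
        self.meta_description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def score_page(html, keyword):
    """Score each element separately (1 = pass, 0 = fail).

    The individual checks and the 65-character limit are hypothetical.
    """
    page = PageElements()
    page.feed(html)
    kw = keyword.lower()
    return {
        "title_has_keyword": int(kw in page.title.lower()),
        "title_length_ok": int(0 < len(page.title.strip()) <= 65),
        "single_h1": int(page.h1_count == 1),
        "meta_description_present": int(bool(page.meta_description.strip())),
    }
```

<p>Because each element gets its own score, a failing entry such as <code>single_h1</code> points directly at the fix, which an overall grade cannot do.</p>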
<p>SEO effectiveness can be deconstructed into its many components. It can be benchmarked against competitors. Inferences can be made, priorities can be set, content can be massaged, link juice can be directed. Consequently, the SEO practitioner relies less on their gut and more on the data to drive their actions.</p>
<p>One enterprise-level SEO scoring technology that supports such a data-driven approach to SEO is Covario&#8217;s <a href="http://www.covario.com/products_organic_search_insight.shtml">Organic Search Insight</a> &#8212; which gets so granular that components of navigation, URLs and so forth are assessed (as can be seen in the screenshot below). Craig MacDonald, VP of Marketing and Product Management, told me &#8220;The impact analysis can also be statistically modeled, based on gathering data across many sites over time in order to ascertain the relationship between changes in factors and the impact of those relative factors on the various search engines – i.e., the science can be rigorously applied.&#8221; From that, specific recommendations are automatically made and prioritized.</p>
<p><a href="http://www.flickr.com/photos/23148333@N06/4055074288/"><img src="http://farm4.static.flickr.com/3073/4055074288_5ae384b6f4.jpg" alt="Covario Organic Search Insight screenshot" width="500" height="384" /></a></p>
<p class="wp-caption-text">(click to view full size)</p>
<p>SEO is a moving target, one that is heavily dependent on algorithm shifts, site changes/updates, the competitive landscape in which one operates, etc. As such, you must continuously monitor and evaluate, ideally with an automated tool. In fact, such a tool is mandatory if you have a large site and you want your SEO activities to be scalable. With this monitoring in place, a page element (like a meta description) that goes AWOL can be flagged and the issue addressed (e.g. internal resources deployed) much faster than would be otherwise possible. Even better if the webmaster can be flashed warnings prior to making site modifications that will be detrimental to SEO.</p>
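<p>The monitoring idea can be sketched in a few lines of Python: compare the elements found in the previous crawl with those found in the current one, and flag anything that has disappeared. The snapshot shape (URL mapped to a dictionary of element names and text) and the function name <code>find_missing_elements</code> are assumptions for illustration only.</p>

```python
def find_missing_elements(previous, current):
    """Compare two crawl snapshots and list elements that have disappeared.

    Each snapshot maps URL -> {element name: text}; this shape is an assumption.
    """
    alerts = []
    for url, old_elements in previous.items():
        new_elements = current.get(url)
        if new_elements is None:
            # The whole page dropped out of the latest crawl.
            alerts.append((url, "page missing from crawl"))
            continue
        for name, old_value in old_elements.items():
            # Flag only elements that had content before and are now empty or absent.
            if old_value and not new_elements.get(name):
                alerts.append((url, name + " went missing"))
    return alerts
```

<p>Run on a schedule, a check like this would surface a vanished meta description as an alert rather than leaving it to be noticed weeks later.</p>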
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/the-science-of-scoring-your-seo-28769/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>10 Last-Minute SEO Tips For Holiday Shopping Season</title>
		<link>http://searchengineland.com/10-last-minute-seo-tips-for-holiday-shopping-season-26861</link>
		<comments>http://searchengineland.com/10-last-minute-seo-tips-for-holiday-shopping-season-26861#comments</comments>
		<pubDate>Thu, 01 Oct 2009 15:57:54 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=26861</guid>
		<description><![CDATA[The holiday shopping season is coming up fast. Don&#8217;t worry, there&#8217;s still time to do some quick-turnaround SEO that can have an impact on your natural search traffic (and resulting revenue!) in time for Black Friday and CyberMonday.
For many online retailers, November and December are the busiest months of the year. Of course, this is [...]]]></description>
			<content:encoded><![CDATA[<p>The holiday shopping season is coming up fast. Don&#8217;t worry, there&#8217;s still time to do some quick-turnaround SEO that can have an impact on your natural search traffic (and resulting revenue!) in time for <a href="http://www.blackfriday.com/black-friday-2009">Black Friday</a> and <a href="http://www.cybermonday.com">CyberMonday</a>.</p>
<p>For many online retailers, November and December are the busiest months of the year. Of course, this is one of the most nerve-racking times of the year as well, especially when a disproportionate amount of one&#8217;s business hinges on such a short time span.</p>
<p>As you gear up for the holiday season, search marketing underpins your overall online marketing. Don&#8217;t neglect &#8220;search&#8221;; invest in it. Building on <a href="http://searchengineland.com/five-search-marketing-tips-for-the-holidays-26463">these holiday SEM tips</a> from Niraj Shah, here are a few more SEO-specific tips to help you on your merry way&#8230;</p>
<p><strong>Escape the code freeze</strong></p>
<p>A common IT practice among mid to large size online merchants is to institute a &#8220;code freeze,&#8221; or site lockdown, during the holiday shopping season to minimize the potential for catastrophic errors and downtime. If your organization institutes a code freeze, you may not be able to make changes to your site for months. This means your on-page SEO effectively stagnates for an entire quarter. It also means you must race against the clock to implement SEO initiatives of any significance; and if you don&#8217;t make it in time, you must wait until the new year.</p>
<p>One way around this stress-inducing time crunch is to implement an <a href="http://www.gravitystream.com">SEO proxy platform</a>. Such a system allows you to implement optimizations via the proxy throughout the holidays, quickly and easily, without impacting your native site.</p>
<p>Perhaps your code freeze still allows you to add static landing pages during the holiday season. That&#8217;s better than nothing, but it can take several weeks for new pages to make it into the search engines &#8212; and during the holidays, time is of the essence. So, the sooner you can add links to these new pages, the better.</p>
<p><strong>Audit time</strong></p>
<p>If you haven&#8217;t already, now is the time to do a quick audit of your site. If a code freeze is or will soon be in place, you won&#8217;t be able to make sweeping changes, but hopefully there&#8217;s still the opportunity to fix anything that isn&#8217;t working right. Especially if you just underwent any major changes in the last few months, this is a critical time to find anything that slipped through the cracks.</p>
<p>If you aren&#8217;t under a regimented code freeze, it may not be a bad idea to implement a self-imposed one as this can be a dangerous time to make major changes to your site.</p>
<p>Double check your contact form, live chat or other mechanisms that are in place for customer contact.</p>
<p>Review the last couple months of log files or your site analytics, looking for any 404 errors for missing files, moved or removed pages, broken links on your site, or missing graphics.</p>
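<p>If your server writes logs in the Common or Combined Log Format (an assumption; adjust the pattern for other formats), a short Python sketch like the following can tally which paths returned 404 errors. The name <code>count_404s</code> is made up for this example.</p>

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line.
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Return a Counter of requested paths that returned a 404 status."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            misses[match.group("path")] += 1
    return misses
```

<p>Feeding it the lines of a log file and calling <code>most_common()</code> on the result puts the most frequently requested missing pages and graphics at the top of the list.</p>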
<p>Make note of your most active pages. While you probably won&#8217;t be making any major changes to these pages, they might be good targets for including links to other key site pages to route holiday traffic.</p>
<p><strong>Reorganize your internal links</strong></p>
<p>Your customers probably have different buying habits during the holiday season than they do at any other time of the year. Therefore, it&#8217;s common sense that you should modify your internal linking structure to reflect seasonality. Don’t trash your existing site’s navigation, simply augment it with additional links containing keyword-rich anchor text, to create shortcuts that pass PageRank to your popular holiday categories and products. For example, if all of your holiday ornaments are three clicks away from your home page, create a text link on your home page that reads &#8220;Christmas ornaments&#8221; or &#8220;holiday ornaments.&#8221; Don&#8217;t rely on links on your site map page or on footer links to achieve this; such links are less than ideal.</p>
<p>Since hundreds of thousands of people search for phrases that include &#8220;gifts,&#8221; you would do well to create a Gift Ideas page for your specific industry/market, then optimize it and place it one click away from your homepage to maximize its PageRank and give it the best opportunity to rank well.</p>
<p>Don’t go overboard in your internal linking. Keep in mind that Google advises you to keep the number of links on a page to fewer than 100.</p>
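<p>A quick way to check a page against that guideline is to count its links. The Python sketch below uses only the standard library; the helper names <code>count_links</code> and <code>at_or_over_link_limit</code> are hypothetical, and the limit of 100 simply mirrors the figure mentioned above.</p>

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects the href of every anchor tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def count_links(html):
    """Return the number of links (anchors with an href) in the page."""
    parser = LinkCounter()
    parser.feed(html)
    return len(parser.links)


def at_or_over_link_limit(html, limit=100):
    """True when the page does not have fewer than `limit` links."""
    return count_links(html) >= limit
```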
<p><strong>Merry meta descriptions</strong></p>
<p>Are you promoting a holiday sale or specific items on your site? If so, don&#8217;t forget to polish your meta descriptions so that searchers will recognize the keywords they&#8217;re looking for. Last-minute holiday shoppers will be attracted to descriptions that speak to them, so remember to mention seasonal search phrases to encourage them to click through to your site.</p>
<p>Including calls-to-action and/or value propositions into these meta descriptions will help ensure these searchers react favorably and click on your listings.</p>
<p>Be sure to make a list of the pages you revised so you can change the meta descriptions back after your New Year&#8217;s Day sale.</p>
<p><strong>Pareto principle of link building</strong></p>
<p>The <a href="http://en.wikipedia.org/wiki/Pareto_principle">Pareto Principle</a>, also known as the 80/20 rule, says that 80 percent of the value/effects come from 20 percent of the causes. Arguably that concept could be applied to link building: 80% of your link authority (PageRank) comes from 20% of your back links. So your job is to focus on building more of those &#8220;vital few&#8221; links that deliver the bulk of your link authority.</p>
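<p>To see whether your own link profile is anywhere near 80/20, you can compute the share of total value contributed by the top fifth of your links. The sketch below assumes you already have some per-link value estimate; search engines do not publish per-link authority, so the numbers fed in are whatever proxy you choose, and <code>top_share</code> is a name invented for this example.</p>

```python
import math


def top_share(link_values, fraction=0.2):
    """Share of total value contributed by the top `fraction` of links."""
    if not link_values:
        return 0.0
    ranked = sorted(link_values, reverse=True)
    # Always count at least one link, and round the cutoff up.
    top_n = max(1, math.ceil(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0
```

<p>A result close to 0.8 would suggest that the &#8220;vital few&#8221; pattern holds for your site; a much lower figure would suggest a flatter profile.</p>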
<p>Now is NOT the time to start some long, drawn-out link building initiatives to build these high-value links. There just aren&#8217;t enough weeks left in the 4th Quarter to do proper planning and execution for a complex and involved campaign, such as a music video creation contest. Focus on the &#8220;quick wins&#8221; &#8211; things like <a href="http://searchengineland.com/the-social-media-underground-22030">socially-seeded link bait articles</a> hosted on your site, or single links that by themselves will have a measurable impact, acquired from sites where you have a relationship or some other &#8220;in&#8221;.</p>
<p>Have you been meaning to submit a guest article to a respected online publication that would love to publish your &#8220;thought leadership&#8221; and agrees to link to you from the byline/bio? No time like the present for that! One link from a trusted high PageRank authoritative source like that could boost your rankings in just weeks.</p>
<p>Also, use your influence with business partners and bloggers you know who already link to you, and try to get them to revise the anchor text of their links to you when the anchor text is less than ideal (e.g. &#8220;click here&#8221; or &#8220;visit site&#8221;). Again, focus on your most valuable links.</p>
<p><strong>&#8220;Free&#8221; is a strong attractor</strong></p>
<p>With giveaways like &#8220;free gift wrap&#8221; and &#8220;free shipping&#8221; you&#8217;ll attract holiday shoppers because you&#8217;re providing them with real value. Take advantage of this fact by incorporating powerful messaging (e.g. &#8220;free gift wrap,&#8221; &#8220;free shipping&#8221;) into the title tags, body copy, and meta descriptions which will filter into the snippets of your search listings.</p>
<p>Even though it may not be free, another way to &#8220;give&#8221; during the holiday season is to offer gift certificates for the last-minute shopper. Feature gift certificates prominently on your site and cross-sell them on your “gifts” and “gift ideas” pages to achieve maximum visibility. Start optimizing for gift certificate related search terms through featuring your gift certificates immediately; don&#8217;t wait until the holiday season kicks into full swing.</p>
<p><strong>Blog to attract customers and links</strong></p>
<p>Hopefully you already have a blog. If not, then you&#8217;ve already found your New Year&#8217;s resolution. (What a relief to have that out of the way, eh!)</p>
<p>It should go without saying: make sure that your blog isn&#8217;t just another sales pitch. Your blog should be about connecting and communicating with your readers. Share some holiday stories, maybe your favorite recipes, or offer helpful packing and shipping tips. Any product mentions should be done carefully and subtly and in moderation. Blog with conviction and/or humor and/or personality. Offer real value. Be transparent, authentic.</p>
<p>If you don’t have time to blog yourself or with internal resources, you could try recruiting passionate customers as blog authors and run a group blog.</p>
<p>Map out your blogging for the season just like you map out your sales and advertising calendar. Plan posts now and start working on them for publishing later so that you can keep on top of them during the busy holiday season. Submit pre-written, post-dated blog content into your blog platform (e.g. WordPress) so you can maintain an active publishing schedule &#8211; even if you&#8217;re pressed for time and blogging seems daunting (if not impossible). That way, when you are inspired and free to generate a flurry of blog posts, all of these posts won&#8217;t be clumped together.</p>
<p>Always keep writing. Make sure posts are published regularly and frequently so that you have very few &#8220;gaps&#8221;.</p>
<p><strong>Participate in the blogosphere</strong></p>
<p>If all you do to engage with bloggers is to blog yourself, you&#8217;re really missing the mark. You should be spending as much time commenting on the blogs of important bloggers in your industry/segment/market as you spend writing for your own blog. That will help get you on the radar screen of these influencers. You can also generate positive buzz in the blogosphere by sending free product samples or review copies to these bloggers with “no strings attached” — it&#8217;s a bad idea to try to buy bloggers off by sending them useless kitsch. Remember that bloggers can wreak havoc on reputations, so tread carefully.</p>
<p><strong>Get social</strong></p>
<p>It is not too late to get out there, create viral content, and build your network of friends. You&#8217;ll need to work fast though, whether it&#8217;s on YouTube, Facebook, Delicious, Flickr, Digg, or elsewhere. It can be as easy as publishing a killer list of gift suggestions and asking a power user friend to &#8220;seed&#8221; it into social sites like StumbleUpon and <a href="http://kirtsy.com">Kirtsy</a>.</p>
<p>And remember, the #2 search engine isn&#8217;t Yahoo, it&#8217;s YouTube. If you&#8217;ve been waiting for your film debut, here&#8217;s your chance: produce a light-hearted, or humorous, or helpful video and post it to YouTube. It could be just the thing for that extra boost. If you have products that require complicated assembly, some short instructional videos would almost certainly be well-received; who knows, maybe they could become unexpected holiday hits.</p>
<p><strong>Connecting offline and online</strong></p>
<p>If you live and breathe SEO, offline may be the furthest thing from your mind. Of course, there are other angles to the offline world. Hopefully you have the fundamentals covered, like making sure that your web address appears in all your mailings, advertisements, and anywhere else you may be marketing offline. If you have brick-and-mortar locations, it doesn&#8217;t hurt to remind people that you also have a website.</p>
<p>Don&#8217;t forget that many of the traditional offline entities now have online presences themselves. TV news, radio stations, and newspaper reporters are often looking for interesting holiday stories, from the &#8220;most unusual&#8221; gift ideas to how to entertain for the holidays. Come up with an interesting story idea and you may not only get valuable airtime or print mention, but a link from their site to yours.</p>
<p>Separate out, into &#8220;buckets,&#8221; those purchases that happened offline (e.g. phone orders) but that resulted from online marketing (i.e. were generated from natural search, from paid search, from print, etc.). You could even go more granular, beyond the referral source, and associate actual keywords (search terms) with these referral sources.</p>
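<p>The bucketing described above might look something like this in Python. The order fields (<code>source</code>, <code>keyword</code>, <code>revenue</code>) are hypothetical; substitute whatever your order system actually records.</p>

```python
from collections import defaultdict


def bucket_orders(orders):
    """Sum revenue per (referral source, keyword) bucket.

    Each order is a dict; the field names used here are assumptions.
    """
    buckets = defaultdict(float)
    for order in orders:
        key = (order["source"], order.get("keyword") or "(none)")
        buckets[key] += order["revenue"]
    return dict(buckets)
```

<p>Keying on the source alone gives the coarse buckets; adding the keyword to the key gives the more granular view.</p>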
<p>With some cleverness, creativity and a bit of &#8220;elbow grease&#8221;, this holiday season could be your most successful yet, recession or no recession. And there&#8217;s still time, if you act now.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/10-last-minute-seo-tips-for-holiday-shopping-season-26861/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How To Choose Content Management Systems For SEO</title>
		<link>http://searchengineland.com/how-to-choose-content-management-systems-for-seo-24945</link>
		<comments>http://searchengineland.com/how-to-choose-content-management-systems-for-seo-24945#comments</comments>
		<pubDate>Thu, 03 Sep 2009 10:55:17 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=24945</guid>
		<description><![CDATA[Nowadays, a great many websites are powered by a content management system (CMS) along with a back-end database. And for good reason. It&#8217;s too unwieldy to code HTML on a page-by-page basis, as you expand your content offerings to the thousands or tens of thousands of pages (and beyond). Content management systems to the rescue! [...]]]></description>
			<content:encoded><![CDATA[<p>Nowadays, a great many websites are powered by a content management system (CMS) along with a back-end database. And for good reason. It&#8217;s too unwieldy to code HTML on a page-by-page basis, as you expand your content offerings to the thousands or tens of thousands of pages (and beyond). Content management systems to the rescue! But there can be downsides too.</p>
<p>My biggest gripe with the content management systems of today is their lack of SEO features. And I&#8217;m not talking about meta keywords, which are a complete waste of time.</p>
<p>I&#8217;m patiently waiting for the day when a CMS-based site can rival static HTML sites in <a href="http://searchengineland.com/library/seo">SEO</a>. No bones about it, hand-coded sites offer complete, granular control over each page, and every single tag contained within. That&#8217;s real flexibility. Too bad they don&#8217;t scale. Therefore, the SEO practitioner is going to need a CMS that will at least be cooperative.</p>
<p>Which SEO features should you be shopping for in a CMS? Glad you asked. Here&#8217;s my wish list of features, broken down into critical, important, desirable and optional&#8230;</p>
<p><strong>Critical CMS features</strong></p>
<ul>
<li>URLs free of tracking parameters and session IDs  &#8212; Sticking session or tracking information such as the user&#8217;s clickpath into the URL is deadly for SEO. It usually leads to incomplete indexation and duplicate content issues.</li>
<li>H1 tags  &#8212; No H1 tags on a given page is not desirable. Too many H1 tags on the page is not desirable. Low-value content (such as the publication date) marked up as an H1 is not desirable. The article title is typically the best content to have wrapped in an H1.</li>
<li>Customizable URL structure  &#8212; If the default URL structure of the CMS doesn&#8217;t suit your needs, you should be able to change it. For example, if you don&#8217;t want /archives/ in the URLs of all your archived articles, you should be able to remove it. Or if you want to reference the article name instead of the article&#8217;s database ID in the URL, you should be able to do it.</li>
<li>301 redirects to canonical URL  &#8212; Duplicate content is the bane of the existence of many a dynamic website owner. Automatic handling of this by the CMS through the use of 301 redirects is a must.</li>
</ul>
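<p>Two of the critical features above, clean URLs and 301 redirects to the canonical URL, can be sketched together in Python. The parameter names in <code>TRACKING_PARAMS</code> are assumptions, as are the function names; a real CMS would use whatever parameters it actually appends.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute whatever your CMS actually appends.
TRACKING_PARAMS = {"sessionid", "sid", "phpsessid", "clickpath", "ref"}


def canonical_url(url):
    """Strip session/tracking parameters and lowercase the host name."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def redirect_for(url):
    """Return (301, canonical URL) when the requested URL is not canonical."""
    target = canonical_url(url)
    return (301, target) if target != url else None
```

<p>A request handler could call <code>redirect_for</code> before rendering a page and issue the redirect whenever it returns a target, so that only one version of each URL is ever served.</p>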
<p><strong>Important CMS features</strong></p>
<ul>
<li>Static-looking URLs  &#8212; The most palatable URLs to spiders are the ones that look like they lead to static pages, i.e. no query strings.</li>
<li>Keywords in URLs  &#8212; Keywords in your URLs can help your rankings. It would be a shame to miss out on the opportunity this presents, if your CMS doesn&#8217;t support keyword-rich URLs (e.g. only article IDs in the URL).</li>
<li>RSS feeds  &#8212; RSS feeds are essential if you want to reach bloggers; email newsletters won&#8217;t cut it for the hip, Web 2.0 crowd. Hopefully this feature also comes integrated with Feedburner, for improved visibility on RSS feed consumption by your subscribers.</li>
<li>Pings  &#8212; This lets blog and feed search engines like Google Blog Search know you have published new content so they can come and grab your latest RSS feed.</li>
<li>Tagging and tag clouds  &#8212; This Web 2.0 feature is powerful for SEO, thanks in large part to the keyword-rich text links. This is your opportunity to rejig your internal linking structure and how you flow PageRank without having to completely gut your taxonomy/ontology.</li>
<li>Individually customizable title tags and H1 tags &#8212; Each title tag should be decoupled from the post/article/product title. Same goes for H1 tags. That way anchor text can be varied from H1&#8217;s which can, in turn, be varied from the title tag. Thus, you can work in additional keywords (synonyms etc.) into the H1, and even more into the title tag &#8212; without spamming of course!</li>
<li>Multi-level categorization structure  &#8212; It&#8217;s awfully limiting to your site structure and internal hierarchical linking structure to have a CMS that doesn&#8217;t allow you to nest subcategories into categories, sub-subcategories into subcategories, and so on.</li>
<li>Canonical tags &#8212; Although I don&#8217;t trust Google to always reliably obey this new tag, it is definitely worthwhile having it available as an option if the need arises (hopefully that need won&#8217;t arise if you have 301&#8217;s in all the right places).</li>
</ul>
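<p>Static-looking, keyword-rich URLs are usually produced by turning the category and article title into a &#8220;slug&#8221;. Here is a minimal Python sketch; the names <code>slugify</code> and <code>article_url</code> and the trailing-slash URL pattern are choices made for this example, not a requirement.</p>

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated, ASCII-only slug."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def article_url(category, title):
    """Build a query-string-free URL from the category and the title."""
    return "/" + slugify(category) + "/" + slugify(title) + "/"
```

<p>The resulting URL carries the article&#8217;s keywords instead of a database ID and contains no query string.</p>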
<p><strong>Desirable CMS features</strong></p>
<ul>
<li>Paraphrasable excerpts  &#8212; Duplicate content issues are exacerbated on dynamic sites such as blogs when the same content is displayed on permalink pages, category pages, archives-by-date pages, tag pages, and the home page. Crafting unique content for the excerpt and having that content display on all locations except for the permalink page will help strengthen your permalink page as unique content.</li>
<li>Breadcrumb navigation  &#8212;  It reinforces the hierarchical nature of your internal linking structure using text links which are hopefully keyword-rich.</li>
<li>Flexible rules for automatically generating title tags  &#8212; If the title tag always has to start with your site name followed by a colon followed by your article title, you&#8217;re sunk &#8212; at least as far as your SEO is concerned. You should be able to revise the &#8220;recipes&#8221; used to generate the title tags across your site to make them more optimal for search.</li>
<li>Page-specific meta descriptions  &#8212; A cardinal sin of dynamic websites is using the same meta description across all the pages. This can be a contributor to duplicate content issues.</li>
<li>Meta noindex for low-value pages  &#8212; Even if you nofollow links to these pages, other people may still link to these and you run the risk of ranking those pages above some of your more valuable content.</li>
<li>Keyword-rich intro copy on category-level pages and tag pages  &#8212; Keyword-rich introductory copy helps set a stable keyword theme/focus for the page, rather than relying on the latest article, product, or blog post to be the most prominent text on the page.</li>
<li>Granular control over nofollows on links &#8212; If your site allows the posting of user-generated content through &#8220;comments,&#8221; your site will be a spam-magnet if you don&#8217;t nofollow the links posted by commenters. Heck, you&#8217;ll probably be a spam magnet anyways, it&#8217;ll just be worse for you without the nofollows. Additionally, regardless of your stance on PageRank sculpting and its value for SEO, you should be able to selectively decide when and when not to pass PageRank to an internal page within your site.</li>
<li>Customizable anchor text on navigational links  &#8212; &#8220;Contact&#8221;, &#8220;About Us&#8221;, &#8220;Read More&#8221;, &#8220;Full Article&#8221; etc. all make for lousy anchor text &#8212; at least from an SEO standpoint. Hopefully your CMS allows you to improve such links to make the anchor text more keyword-rich.</li>
<li>Mass Edit, or Bulk Upload (or both) &#8212; It&#8217;s not efficient to go to each page&#8217;s Edit screen. Instead, mass modify the titles, H1&#8217;s, filenames, and perhaps even meta descriptions, within Excel or a &#8220;mass edit&#8221; web interface (like the one provided by my <a href="http://www.netconcepts.com/seo-title-tag-plugin/">SEO Title Tag</a> plugin for WordPress).</li>
<li>Declared search term &#8212; When you decide on a page&#8217;s primary keyword focus, you should be able to tuck away that crucial bit of information somewhere where it will be safe from the prying eyes of competitors. That means it should not be parked anywhere in the HTML &#8212; including the meta keywords tag &#8212; since all a resourceful competitor would need to do is &#8220;View Page Source&#8221; within their web browser.  There should be a field in the database, displayed and accessible to your editors/administrators within the admin interface of your CMS.</li>
<li>Auto 301 redirect previous versions of URLs &#8212; Imagine updating a permalink or product page URL (e.g. &#8220;post slug&#8221;) multiple times. Each previous version of a URL could lead the search engines to discover duplicate pages if you&#8217;re not careful.  Why worry about these old URLs and whether they will stop working or will create duplicate content; let the CMS &#8220;worry&#8221; about this instead and seamlessly 301 previous iterations to the latest version.</li>
<li>Google Product Search feed &#8212; If your CMS is powering an online catalog site, then this feature is for you. It can be a real timesaver. And if you are an online retailer not submitting your products into Google Base, heed this warning: neglect Google Product Search (formerly Froogle) at your peril!</li>
</ul>
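<p>The &#8220;auto 301 redirect previous versions of URLs&#8221; feature can be sketched as a small history table that remembers every retired URL and always points it at the page&#8217;s latest URL. The class below is a hypothetical in-memory illustration in Python; a real CMS would keep this in its database.</p>

```python
class SlugHistory:
    """Tracks old URLs per page so each one can 301 to the latest version."""

    def __init__(self):
        self._current = {}    # page id -> current URL
        self._redirects = {}  # retired URL -> page id

    def set_url(self, page_id, url):
        """Record a new URL for the page and retire the previous one."""
        old = self._current.get(page_id)
        if old and old != url:
            self._redirects[old] = page_id
        # A URL that is live again must no longer redirect anywhere.
        self._redirects.pop(url, None)
        self._current[page_id] = url

    def resolve(self, url):
        """Return (301, latest URL) for a retired URL, else None."""
        page_id = self._redirects.get(url)
        if page_id is None:
            return None
        return (301, self._current[page_id])
```

<p>Storing the page ID rather than the next URL means every old version redirects straight to the latest one in a single hop, however many times the slug has changed.</p>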
<p><strong>Optional CMS features</strong></p>
<ul>
<li>XML Sitemaps generator  &#8212; An XML sitemap can be submitted to the major engines to improve indexation, but it&#8217;s usually unnecessary if you have a search engine friendly CMS; the engines will usually do a good job crawling and discovering your site&#8217;s URLs on their own. Google will use your Sitemaps file as a canonicalization signal, but hopefully you don&#8217;t need it since your CMS isn&#8217;t generating duplicate pages.</li>
<li>XHTML validation  &#8212; When entering your content, it is desirable to have the CMS automatically check for malformed HTML, as search engines may end up &#8220;seeing&#8221; a page differently from how it renders on the screen and consider navigation to be part of the content or vice versa.</li>
<li>Pingbacks, Trackbacks, Comments and Anti-spam mechanisms  &#8212; The problem with comments/trackbacks/pingbacks is that they are vectors for spam, so if you have one (comments/trackbacks/pingbacks), you will have the other (spam). Therefore, effective spam prevention (e.g. Akismet, Defensio, Mollom) is a must.</li>
</ul>
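<p>For readers curious what the XML Sitemaps generator mentioned in the list above boils down to, here is a rough Python sketch using only the standard library. The <code>build_sitemap</code> function name and the example.com URLs are assumptions made for illustration; the namespace is the one published at sitemaps.org.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML for an iterable of (loc, lastmod) pairs.

    lastmod is optional; pass None to leave it out for a URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real CMS would pull these from its database.
print(build_sitemap([
    ("http://www.example.com/", "2009-11-09"),
    ("http://www.example.com/about", None),
]))
```

<p>A CMS would list only canonical URLs here, which is why the file can double as a canonicalization hint.</p>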
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/how-to-choose-content-management-systems-for-seo-24945/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Link Economics 101: A Prerequisite For Advanced SEO</title>
		<link>http://searchengineland.com/link-economics-101-a-prerequisite-for-advanced-seo-23588</link>
		<comments>http://searchengineland.com/link-economics-101-a-prerequisite-for-advanced-seo-23588#comments</comments>
		<pubDate>Thu, 06 Aug 2009 10:55:52 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=23588</guid>
		<description><![CDATA[Links long ago became the currency of the Web, thanks in no small part to Google and its PageRank algorithm. Like anything of value, link authority is bought, sold, leased, bartered, brokered, swindled and stolen. The fact that links are valuable is widely accepted. Yet when pressed, can any SEO or link builder really [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Flink-economics-101-a-prerequisite-for-advanced-seo-23588"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Flink-economics-101-a-prerequisite-for-advanced-seo-23588" height="61" width="51" /></a></div><p>Links long ago became the currency of the Web, thanks in no small part to Google and its PageRank algorithm. Like anything of value, link authority is bought, sold, leased, bartered, brokered, swindled and stolen. The fact that links are valuable is widely accepted. Yet when pressed, can any SEO or link builder really say what that value equates to in real dollars? Of course, attributing sales revenue to direct click-through traffic (from the linking page) is a straightforward exercise (albeit sometimes an incomplete one, if/when the referring URL data is scrubbed). But what about the SEO impact of that particular link on the linked-to page? Or even harder to measure: the &#8220;rising tide that lifts all boats&#8221; effect that the link has across the site by boosting nearby pages / sitewide PageRank / domain authority? Is that value quantifiable?</p>
<p>Many times, we&#8217;re asked by prospects to substantiate our claim that link building (when done well) is a high ROI generating activity, and clients also ask us to track and report on the impact of our link building efforts. You can track and tally your link conquests easily enough with advanced link building tools like <a href="http://squid.searchreturn.com">SQUID</a>, <a href="http://raven-seo-tools.com">Raven</a> and <a href="http://www.buzzstream.com">BuzzStream</a>. It&#8217;s quite another matter to assign a specific dollar value to the links &#8212; at least one with any accuracy.</p>
<p>For example, in my last column, I described the <a href="http://searchengineland.com/the-social-media-underground-22030">process for generating link bait and seeding it into social media</a> that we practice at Netconcepts. Let&#8217;s say you are paying a retainer of $20k per month for this service (ideation, research, writing, seeding, competitive intelligence, phone &amp; email reach-outs), and let&#8217;s say that hundreds, or perhaps thousands, of quality links per month result from the link builder&#8217;s activities. What&#8217;s the ROI on that $20k spend? Now, to complicate matters, let&#8217;s say the link builder works to acquire specific high-value links one-at-a-time, in a &#8220;brute force&#8221; fashion using hand-crafted emails and phone calls. Which links account for which lift in rankings? Even ascertaining which links were discounted due to spam signals is difficult to impossible, at least in any scalable way.</p>
<p>Consider this specific link building campaign. Netconcepts dreamed up and orchestrated a &#8220;<a href="http://www.overnightprints.com/businesscards-for-life-contest.shtml">Free Business Cards for Life</a>&#8221; contest for our client <a href="http://www.overnightprints.com">Overnight Prints</a> that involved the self-made millionaire, Internet celebrity and Technorati Top 100 blogger <a href="http://www.shoemoney.com">Jeremy Schoemaker</a> (aka &#8220;Shoemoney&#8221;). The contest was to design Jeremy&#8217;s business card. The winning entry, chosen by Jeremy out of over 400 submissions, was <a href="http://www.shoemoney.com/2009/07/25/congratulatons-chiwun-smith/">amazing</a>. Our client, Overnight Prints, received some great links, with some great anchor text, from the initiative. Don&#8217;t ask me to quantify the value of these links though, because I can&#8217;t. But I just &#8220;know&#8221; that this campaign will have an impact.</p>
<p>It&#8217;s an inexact science. Comparing organic rankings and search traffic across the client&#8217;s keyword portfolio, pre- and post-campaign (by at least 90 days), is a good starting point, but is by no means sufficient for the ROI-focused marketer. Both the revenue line and the cost line must be pinned down, somehow. Yet the link builder would likely counter with the argument that revenue is too many steps removed from what they actually have control or influence over, that growth in the quantity and quality of the backlinks is more appropriate in gauging success and justifying their salary &#8212; assuming that the fruit borne from their efforts can be isolated from other marketing activities (e.g. email marketing, direct mail), and from naturally occurring link growth.</p>
<p>All this makes selling a client or boss on the value of link building a tricky proposition. And it dissuades that decision-maker from allocating budget or authorizing the project. Nobody wants to simply &#8220;take your word for it&#8221; and whip out their checkbook.</p>
<p>Despite the aforementioned, the linking ecosystem continues to thrive. The suppliers to the ecosystem &#8212; from the expert individual practitioner (such as <a href="http://www.ericward.com">Eric Ward</a>) to a reputable link broker (one of <a href="http://www.conductor.com">Conductor&#8217;s</a> lines of business) &#8212; are inundated with work. Plenty of disreputable firms also feed at the trough &#8212; whether they be link spammers out of India polluting the blogosphere with their useless blog comments and filling webmasters&#8217; inboxes with annoying link requests, or the link spam software providers that aid and abet them.</p>
<p>Given the attribution challenges, it can be tempting to scrimp on the vendor, consultant, or service hired for this all-important job. Remember though that you always get what you pay for. The wrong choice may ultimately do more harm than good. With no independent clearinghouse to vet link building services, one of the most important steps you can take is to do your homework on your potential vendor &#8212; reference checks, backlink analysis on them and their clients, and the like. Evaluating the business practices, business model, effectiveness, and business viability &#8212; this is all a must. Countless link building services have come and gone; you want to be sure you&#8217;re backing the right horse. Eric Ward advises that each link target requires an approach tailored to the unique nature of the target and the requester (features, content, audience, etc.). Eric warns: &#8220;cookie cutter packaged link building services will never put you in the best light.&#8221;</p>
<p>Furthermore, the &#8220;wing it&#8221; approach to link building is no longer viable, if it ever was.  What&#8217;s required is the application of the scientific method in your link campaigns: isolating control groups (both keywords and pages) that can be juxtaposed with the experimental group in order to gauge effectiveness of various tactics and campaigns. Competitive SEO now requires the examination of large amounts of backlink data from multiple competing sites to tease out what is meaningful in the inbound link graphs.</p>
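<p>One simple way to put the control-group idea into numbers is a difference-in-differences comparison: measure how much the experimental pages changed, then subtract how much the untouched control pages changed over the same period. The Python sketch below is my own illustration of that technique, not a description of any vendor&#8217;s tool; the function names and traffic figures are made up.</p>

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def incremental_lift(test_before, test_after, control_before, control_after):
    """Difference-in-differences estimate of lift.

    Each argument is a list of per-page (or per-keyword) monthly search
    referrals. The control group's change approximates what would have
    happened anyway (seasonality, natural link growth), so subtracting
    it leaves the change attributable to the campaign.
    """
    test_change = average(test_after) - average(test_before)
    control_change = average(control_after) - average(control_before)
    return test_change - control_change


# Hypothetical numbers: pages that received links vs. pages left alone.
lift = incremental_lift(
    test_before=[100, 200], test_after=[150, 250],
    control_before=[100, 100], control_after=[110, 110],
)
print(lift)
```

<p>The estimate is only as good as the match between the two groups, so control pages should resemble the experimental ones in topic, age and existing link profile.</p>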
<p>This takes a lot of brain power (and math), but this is where we&#8217;re heading with the next generation of auditing tools.  SEO technology solutions (e.g. Conductor, <a href="http://www.covario.com">Covario</a>, <a href="http://www.enquisite.com">Enquisite</a>, <a href="http://www.gravitystream.com">GravityStream</a>, and others) promise us unprecedented ability to track, monitor, and assess on-page and off-page elements for you and perhaps even your competitors, all in a single place. Automating the process of analyzing and quantifying the depth and quality of inbound links is essential to achieving any kind of scale.</p>
<p>Per my recent <a href="http://searchengineland.com/can-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771">Pay-for-Performance SEO</a> article, I make the case that all SEO activities (including link building) can be valued and ROI calculated in aggregate (such as through the <a href="http://www.enquisite.com/products/campaign/">Enquisite Campaign</a> product). But let&#8217;s go back to the original premise for this article: what about on a link-by-link basis? I want to be able to say with some certainty that a particular link is worth $X and another link is worth $Y. Indeed, within Campaign (and probably other solutions too), links can have different values based on the keywords they are connected to. Each keyword has a different click (Campaign provides a market-based organic CPC for each keyword) and transaction value, and so does the link.</p>
<p>Technologies such as the aforementioned will usher in a new age where acquired links are tied to particular keywords, assigned a cost-per-click and/or conversion value, pegged as incremental over the baseline, attributed the referral traffic from direct clicks on the links as well as from the search traffic derived from the rankings lift, and so on. This, I anticipate, will do wonders for legitimizing link building to the direct marketers of the world.</p>
<p>A problem arises, however, when link builders are held responsible for missing ROI targets (regardless of how reasonable those ROI targets might be). This would be unfortunate, because link builders are not in control of the outcome &#8212; the link placement, URL target, anchor text, not to mention whether the link request is granted in the first place. In fact, the higher the credibility of the linking site (and thus the greater the trust, authority and importance in the eyes of the search engines), the less influence the link builder tends to have. Eric Ward describes such high caliber site owners as &#8220;passionately picky about what does and does not get on their pages.&#8221; And he would know; he&#8217;s built links for over 1000 sites over 15 years.</p>
<p>So by all means, predict, goal-set, track, analyze and cost-justify your link building initiatives. Eric just asks that you don&#8217;t hold your link builders accountable for editorial decisions made on sites they do not control.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/link-economics-101-a-prerequisite-for-advanced-seo-23588/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Social Media Underground</title>
		<link>http://searchengineland.com/the-social-media-underground-22030</link>
		<comments>http://searchengineland.com/the-social-media-underground-22030#comments</comments>
		<pubDate>Thu, 09 Jul 2009 11:15:14 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=22030</guid>
		<description><![CDATA[Building links is a struggle we SEOs all face. Of the three pillars of SEO (content, architecture, and links), it&#8217;s the &#8220;link authority&#8221; pillar that is usually the weakest. Looking at sites individually, formulating your approach, sending personalized emails, picking up the phone to speak to the webmasters &#8211; it&#8217;s a lot of hard slog. [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fthe-social-media-underground-22030"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fthe-social-media-underground-22030" height="61" width="51" /></a></div><p>Building links is a struggle we SEOs all face. Of the three pillars of SEO (content, architecture, and links), it&#8217;s the &#8220;link authority&#8221; pillar that is usually the weakest. Looking at sites individually, formulating your approach, sending personalized emails, picking up the phone to speak to the webmasters &#8211; it&#8217;s a lot of hard slog. If only it weren&#8217;t so darned difficult and time-consuming to acquire high quality, relevant links! Yet without such links, you won&#8217;t be able to earn the trust, authority and importance required to rank, and your optimization efforts will fall short.</p>
<p>There is another approach &#8211; a &#8220;secret formula&#8221; if you will, employed by the SEO elite. One that&#8217;s scalable, efficient and high-impact.</p>
<p>It starts with “link bait”. Yes, that overused industry term that refers to viral content that is irresistible to link to. But link bait by itself isn&#8217;t sufficient. You need to &#8220;seed&#8221; this link bait into social media (such as Digg and StumbleUpon) using power user accounts within those communities. In other words, you need to be (or be able to call in a favor with) a social media insider who has wide-reaching influence within that site&#8217;s social community. Without a bevy of friends, followers and fans, it&#8217;s much harder to reach &#8220;escape velocity&#8221; quickly enough. That&#8217;s because the algorithms for &#8220;what&#8217;s new and hot&#8221; within social news/bookmarking sites take into account the time span within which the positive votes are acquired. A thousand Diggs over a year is a very different thing from a thousand Diggs over 24 hours.</p>
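<p>To see why the time span matters, consider a toy &#8220;hotness&#8221; measure that only counts votes cast inside a recent window. This Python sketch is purely illustrative: the real ranking algorithms of Digg and similar sites are not public, and the <code>votes_in_window</code> function and the 24-hour window are my assumptions.</p>

```python
MINUTES_PER_DAY = 24 * 60


def votes_in_window(vote_minutes, now_minute, window_minutes=MINUTES_PER_DAY):
    """Count votes cast during the window that ends at now_minute.

    vote_minutes holds one timestamp per vote, in minutes since the
    story was submitted. Older votes simply stop counting.
    """
    cutoff = now_minute - window_minutes
    return sum(
        1 for minute in vote_minutes
        if minute >= cutoff and now_minute >= minute
    )


# A thousand votes in under a day, measured at the 24-hour mark.
fast_story = list(range(0, 1000))
# A thousand votes spread evenly across roughly a year.
slow_story = list(range(0, 525000, 525))

print(votes_in_window(fast_story, now_minute=MINUTES_PER_DAY))
print(votes_in_window(slow_story, now_minute=365 * MINUTES_PER_DAY))
```

<p>Both stories have the same lifetime total, but under a windowed measure like this one the slow story barely registers at any given moment, which is the intuition behind seeding through accounts that can attract votes quickly.</p>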
<p>The &#8220;secret formula&#8221; really is a formula &#8211; or should I say, formulaic. Think of it as an assembly line process. Viral ideas are generated. The chosen ideas are researched and written up as articles (or produced as videos), then published to your website. The articles are then seeded into appropriate social sites by influentials within those sites&#8217; communities. The point of all this isn&#8217;t to reach your target market directly; these users probably won&#8217;t buy anything from you. It&#8217;s to reach that small percentage of the stampede to your site that are journalists or bloggers &#8211; who will write about and link to your viral content.</p>
<p>Let&#8217;s take a look at this process in greater depth.</p>
<p><strong>Ideation:</strong> It all starts with a great idea and this step is key: your content/angle must be more than just clever to go viral. Gather a team of your most creative and knowledgeable SEOs and marketers (and/or your outside SEO firm) to brainstorm possible ideas for link bait that will likely resonate within social media. Develop a list of ideas; you can start with a handful and save the others for future use.</p>
<p><strong>Selection and prioritization:</strong> Once a solid list has been developed, it&#8217;s time to prioritize and select the top several (e.g. three or four) ideas to develop into full-blown articles/blog posts. Be forewarned: you will have to stretch outside of your comfort zone; the edgier (or geekier, or low-brow) the article, the more it will resonate with the 16-year-old alpha geeks on Digg.</p>
<p><strong>Content creation:</strong> Flesh out the chosen ideas into full-blown articles/blog posts starting with the research. For example, a topic of &#8220;Top 100 Beers from Around the World&#8221; will likely require many pages of information to be collected &#8212; from each beer&#8217;s history, to photos of the bottle/label, to nutrition information.</p>
<p>Make sure to craft a killer headline using this formula from social media marketer <a href="http://muhammadsaleem.com/">Muhammad Saleem</a>: number + adjective + key phrase. e.g. “13 Most Chilling Haunted Hotels” or “16 Incredibly Unconventional Hotel Rooms.” It’s a catchy title that will reel people in. You may also/instead wish to develop video or other visuals to help support your idea.</p>
<p>You really need a &#8220;hook&#8221; to turn an article idea into something that will have &#8220;legs&#8221; in the social sphere. For example, consider a contest idea of &#8220;win free business cards for life.&#8221; Pretty ho-hum, right?</p>
<p>At Netconcepts, we developed a contest for our client Overnightprints.com. What was the angle/spin we added to the free business cards for life idea? Simply this: design Shoemoney&#8217;s (Jeremy Schoemaker&#8217;s) business card, Jeremy will serve as the judge and the winning entry (as determined by Jeremy) will get &#8220;free business cards for life.&#8221;</p>
<p>The cost of the contest for our client was negligible &#8211; the fine print of the contest stated the winnings were a maximum of 1,000 business cards per year for up to 20 years. Jeremy blogged on his Technorati top 100 blog, shoemoney.com, about the contest; he even posted a video about it to YouTube. The link exposure this contest garnered was priceless. Contests prove you don’t have to have a big budget &#8212; just a big idea.</p>
<p><strong>Website Prep:</strong> Just like you&#8217;d prep for surgery, you&#8217;d prep your site for a social push. <a href="http://www.brentcsutoras.com">Brent Csutoras</a> describes the process required here as getting your site &#8220;social media ready.&#8221; For example, if your site is a blog, it would be advisable to distance your site from blogs. Blogs are old and tired and not popular with Digg users anymore. So switch from a bloggy theme to a magazine style theme, and remove the bloggy references, such as date-based archives links and permalinks verbiage.</p>
<p>Also, make sure your site can handle the anticipated traffic. If your server buckles under the load and the site goes dark, your submission doesn&#8217;t just get pulled from Digg&#8217;s home page, it gets removed from the site altogether, i.e. completely obliterated. With Digg, hitting the front page will generate a traffic spike that will quickly dissipate and then disappear almost completely after 24 to 48 hours.</p>
<p><strong>Publishing:</strong> You will need a place to host your linkbait; it really should be on your site if at all possible. Using a third party website to host the article will result in subpar performance; so hopefully you can tolerate the article living on your website. Remember you don&#8217;t have to link to the article from your navigation or sitemap or from anywhere else on your site. Also, you need to be cognizant of the time and day you publish. Publishing on Saturday night would not be a good idea.</p>
<p><strong>Social Media Seeding:</strong> Immediately after you have published the viral content, it then needs to be submitted to the appropriate social media site(s). These may include social news sites such as Digg or social bookmarking sites such as del.icio.us.</p>
<p>As already alluded to, it&#8217;s important that the submitter be a “power user” within these social networks to maximize the chances of hitting the front page or &#8220;popular&#8221; page, thus driving the most visibility, traffic, and (eventually) links. A Digg submission from the great unwashed just won&#8217;t get the same traction from the social media community. A power user&#8217;s street cred is worth a lot, so be ready to pay for it &#8212; in terms of cash or favors.</p>
<p>You can&#8217;t be a power user across all the pertinent social media &#8212; it requires simply too much work. Power users spend all day and all night at their computer monitoring various oddball news RSS feeds and other social sites for content that will resonate with their targeted social site&#8217;s users. Once they find something, they quickly submit the URL along with a killer title and description &#8211; before anyone else can. This is how users move up in the pecking order within these social media. The more stories they can get to the front page, the higher their status.</p>
<p>Nearly everything that a top-ranked Digg user touches turns to gold; it&#8217;s not atypical for 80+% of their submissions to hit the Digg.com front page. As you can see, the power user is the key to this equation. If you happen to know one or be one, then you are in luck. If not, you might want to consider hiring a top-notch SEO and social media agency. Also, have a look at the <a href="http://socialblade.com/digg/topusers.html">Top 100 list of Diggers</a> for the most popular power users; note, however, that real names are rarely attached.</p>
<p>Make friends within these groups and call in favors as you need them. In return, be prepared to do the same. <a href="http://www.gregboser.com">Greg Boser</a> says for every hour you spend working on a client in social, you need to spend six hours doing favors in return for those in your elite network with whom you called in a favor. It’s all about reciprocating. The more you vote and submit for others, the more you can ask for in return.</p>
<p>Now let’s talk results; an ideal result is your link bait hitting the front page of Digg and accumulating a thousand-plus links over the following 3 to 6 months. Why does it take so long when a Digg spike is so immediate? Because bloggers don&#8217;t blog about your link bait right away. They may keep it in the hopper and not get to it for weeks or even months.</p>
<p>Then, because the blogosphere is one big echo chamber, that blogger&#8217;s readership will include other bloggers who will eventually blog about and link to the article too. What would be the yield of really &#8220;hitting it out of the park&#8221; in terms of this sort of link building? Potentially over 5,000 links with perhaps a hundred of those being PageRank 7 and a half dozen being PageRank 8 (hypothetically). That&#8217;s a solid investment.</p>
<p>Link baiting is a critical tactic to have in your SEO “toolkit” because the efforts of just one article hitting the home page of Digg can do wonders for your link authority and thus your Google rankings. Put the time and effort into developing your most linkworthy ideas and go out and negotiate with your power user friends. This combination will ensure your success.</p>
<p>Hope you got some value out of me &#8220;spilling the beans&#8221; on this secret formula for generating linkbait, acquiring links and measuring success. Many marketers think of social media as a plaything; it would rarely be the tool of choice to generate serious ROI. Sure, social media IS fun, but it’s wrong to think that it can&#8217;t pay the bills. Leverage these social media outlets properly, and your link opportunities are endless.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/the-social-media-underground-22030/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Can The SEO Industry Switch To A Pay-for-Performance Pricing Model?</title>
		<link>http://searchengineland.com/can-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771</link>
		<comments>http://searchengineland.com/can-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771#comments</comments>
		<pubDate>Thu, 11 Jun 2009 11:00:09 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[SEM Industry: General]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=20771</guid>
		<description><![CDATA[The currency of the SEO industry is, and has always been, &#8220;dollars for hours&#8221; &#8212; in other words, the traditional consulting model that is in use at accounting and law firms around the globe. Just think of what would be possible if it were to change to something more tied to the value being delivered!
SEOs [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fcan-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fcan-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771" height="61" width="51" /></a></div><p>The currency of the SEO industry is, and has always been, &#8220;dollars for hours&#8221; &#8212; in other words, the traditional consulting model that is in use at accounting and law firms around the globe. Just think of what would be possible if it were to change to something more tied to the value being delivered!</p>
<p>SEOs aspire to &#8220;knock it out of the park&#8221; for their clients. But they can&#8217;t always do so because fixed-rate contracts constrain them, limiting the resources they can feasibly (profitably) apply to any one campaign or client.  For clients, it&#8217;s common for their SEO budgets to be nowhere near that of  their AdWords budgets. And how could you blame them? Compared to PPC, it&#8217;s so much harder to isolate the impact of previous SEO activities, or to prove in advance the business case for an upcoming SEO initiative.</p>
<p>However, this becomes a self-fulfilling prophecy; it&#8217;s amazing how difficult it is to &#8220;move the needle&#8221; when you have insufficient resources. If only SEO could work like paid search and other direct marketing channels, i.e. like a veritable cash machine where you put $1 in and it spits $10 out&#8230;! Keep putting one dollar bills in, keep getting ten dollar bills out. You&#8217;d want to do that all day; certainly the CFO would. Problem is, it&#8217;s difficult to demonstrate unambiguously and reliably such a cash machine scenario with SEO. Until that changes, SEO will continue to be underfunded.</p>
<p>How do you shift the opinions of those who hold the purse strings &#8212; from SEO being an unpredictable &#8220;black art&#8221; to it being an accountable marketing channel? Simple: a pay-for-performance pricing model similar to that of paid search. This can only be possible when there is clear, comprehensive and objective tracking and measurement of SEO efforts and results. What I&#8217;m talking about is &#8220;search analytics&#8221; (including predictive analytics). This is different from &#8220;web analytics,&#8221; which wasn&#8217;t built by SEOs, gives short shrift to SEO, and really only tells you what you&#8217;ve already been getting rather than what you <em>could</em> be getting. Industrial-strength search analytics isn&#8217;t something we, as an industry, have historically had at our disposal. Until we do, the SEO agencies won&#8217;t be able to truly capitalize on the immense value created for their clients.</p>
<p>At Netconcepts, we have started down this road of accountable marketing.  The way we &#8220;cracked the code&#8221; of pay-for-performance was by developing a scalable, automated natural search technology solution (<a href="http://www.gravitystream.com">GravityStream</a>) and pricing it on a cost-per-click basis. Nonetheless, we still have flat-fee pricing for our SEO audits and monthly retainers for our ongoing SEO consulting, just like everyone else. Whether you&#8217;re client side or agency side, the Holy Grail is a pricing model based on the value delivered. It&#8217;s a win-win for both sides: for the vendor there is upside potential when they over-deliver (although ceilings or caps on that upside potential quickly evaporate much of the incentive), for the client much of the risk has been removed (i.e. only pay for actual clicks or other actions).</p>
<p>Today, this model is fraught with problems. An SEO might be reluctant to enter into such a performance-based arrangement because the actual implementation of his or her recommendations falls to the client (and, as such, is outside their control). The SEO is at the mercy of the client and its IT Department, and the implementation (or lack thereof) directly impacts revenue opportunity. In this scenario, the vendor is betting his/her paycheck on whether IT &#8220;gets it&#8221; and considers these modifications mission-critical. That is a bet with very bad odds.</p>
<p>So as an SEO consultant, how could you possibly adopt a performance-based pricing model without a crystal ball? By employing the newly launched (predictive) search analytics platform <a href="http://www.enquisite.com/products/campaign/">Enquisite Campaign</a>. Let&#8217;s see how&#8230;</p>
<p>With this application, the first step is to create “opportunities” to help you determine whether a given SEO campaign is worth going after. You can estimate the number of potential referrals available, the revenue impact of those referrals, the amount of work required, and the resulting ROI. All these variables may be customized to the unique situation of a given vendor or client.</p>
<p>The potential &#8212; in terms of referrals, dollar value, and ROI &#8212; is calculated by Enquisite Campaign for all the keywords in your defined campaign list. In addition &#8211; here’s the crystal ball part &#8211; the application exposes keywords not on your original list that are either currently driving traffic or not currently driving traffic but related. Importantly, this is done without having any data on conversion rate, bounce rate, time on site, or pages viewed &#8212; data points which you normally need to determine the potential for keywords currently driving traffic. The number of referrals is estimated for keywords that are off your radar, based on keyword popularity data from Google and Yahoo (via their APIs) and click-through rate (CTR). If there is traffic already coming in to your site for a keyword, those referrals are pulled out and shown separately in the reporting so that you can clearly see the upside potential versus baseline traffic.</p>
<p>Here is where the customized variables come into effect. Client revenue from new transactions (i.e. selling stuff) is estimated on that potential traffic by multiplying the incremental search referrals, the conversion rate (as defined by you in a configuration screen) and the average order value (AOV, also defined by you). A value for a non-monetary action, such as a white paper download, can also be defined, along with the revenue potential for you. For example, if you&#8217;re IBM, you might value a white paper download as worth $200 based on how likely a white paper downloader is to turn into a six or seven figure consulting contract, some months down the line. Once the campaign is live, Enquisite Campaign then tracks all the incremental referrals, actions, and transactional conversions based on the real traffic and sales data from your site.</p>
<p>Currently, pay-for-performance is also tricky because when you&#8217;re trying to estimate ROI on a particular keyword, you need to have a reasonable sense of what the cost of optimizing will be in man-hours.  How do you do this when you don&#8217;t know the particular page to be optimized and what needs to be done to optimize it? As a vendor, you somehow need to know the &#8220;cost&#8221; (effort required) to acquire the associated revenue. The answer lies in estimating the “optimization difficulty” for each keyword.</p>
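<p>The revenue arithmetic described above is easy to sanity-check by hand. The Python sketch below mirrors the multiplication of incremental referrals, conversion rate and AOV; it is not Enquisite&#8217;s code, the function names are hypothetical, and the <code>action_rate</code> parameter (the share of visitors who complete a non-monetary action) is my assumption about how such a value would be applied.</p>

```python
def estimated_revenue(incremental_referrals, conversion_rate, average_order_value):
    """Projected transaction revenue from incremental search referrals."""
    return incremental_referrals * conversion_rate * average_order_value


def estimated_action_value(incremental_referrals, action_rate, value_per_action):
    """Projected value of non-monetary actions, e.g. white paper downloads.

    action_rate is an assumed input: the fraction of referrals that
    complete the action.
    """
    return incremental_referrals * action_rate * value_per_action


# Hypothetical campaign: 1,000 extra referrals a month, 2% of visitors
# buy at a $150 average order, and 5% download a white paper valued at
# $200 (the figure used in the IBM example above).
print(estimated_revenue(1000, 0.02, 150))
print(estimated_action_value(1000, 0.05, 200))
```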
<p>Turns out there is a solid correlation between paid search competitiveness and organic optimization difficulty. So, by taking paid search data points, like the number of pages competing for the term, the number of search queries daily on Google and Yahoo, the number of bidders on paid search, the bids, and the CTR, Enquisite Campaign comes up with a reasonable approximation for optimization difficulty. Further, it&#8217;s completely independent of the tactics you&#8217;ll be employing or your particular circumstances &#8212; whether your challenges lie with the CMS, the IT department, or whatever.</p>
<p>Once the SEO nominates a budget in hours for the campaign, Enquisite&#8217;s application then allocates the hours across the identified keywords. If, for example, you specify that you want to perform a maximum of 20 hours of SEO work in a month, Enquisite will divvy those 20 hours up across the keywords, assigning more time to the keywords that have the greatest upside for the least amount of effort. Now the ROI for each keyword can be calculated. In Campaign, the ROI is displayed as a percentage and calculated as the revenue potential divided by the cost of the work, where the cost is the effective hourly rate times the estimated number of hours.</p>
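<p>As a rough sketch of that ROI calculation (again in Python, with a function name and sample numbers that are purely my own assumptions), the percentage works out like this:</p>

```python
def roi_percent(revenue_potential, effective_hourly_rate, estimated_hours):
    """ROI as a percentage: revenue potential / (hourly rate x hours) x 100."""
    cost = effective_hourly_rate * estimated_hours
    if cost == 0:
        raise ValueError("cost must be non-zero to compute an ROI")
    return revenue_potential / cost * 100


if __name__ == "__main__":
    # Hypothetical keyword: $3,000 revenue potential, $150/hour, 4 hours of work.
    print(roi_percent(3000, 150, 4))
```

<p>Note that this follows the description above literally (revenue over cost); it is not the net-gain-over-cost formula some finance folks use.</p>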
<div class="wp-caption alignnone" style="width: 510px"><a href="http://www.flickr.com/photos/23148333@N06/3614566399/sizes/o/"><img alt="click to view full size" src="http://farm4.static.flickr.com/3619/3614566399_eda7c518a4.jpg?v=0" title="Enquisite Campaign Potential" width="500" height="258" /></a><p class="wp-caption-text">click to view full size</p></div>
<p>Once you know the ROI potential for any campaign, you can decide on what is the most appropriate performance-based pricing model to pitch to the client &#8212; whether it&#8217;s search referrals, sales, actions, or even website referrals.</p>
<p>Armed with the above intelligence, SEO practitioners are able to make a case internally for performance-based pricing. And by removing risk for the prospect, the barriers to saying &#8220;yes&#8221; are removed. In a competitive bid situation, a performance-based pitch (presumably informed by data and analysis from Enquisite Campaign or similar) will make you stand out from the crowd because you have some serious &#8220;skin in the game.&#8221; Essentially, vendor and client incentives are aligned. If you drive incremental action, you participate in the value that is created.  The client only pays for incremental upside, all clearly linked to objective data and results. </p>
<p>If you&#8217;ve read the book Freakonomics &#8212; one of my favorite business books of all time &#8212; you&#8217;ll know the extreme importance of such alignment. As with any competitive advantage, eventually others in the industry will take notice and decide to adopt (copy) the model. &#8220;<a href="http://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish">Embrace and extend</a>,&#8221; as Microsoft likes to say!</p>
<p>So there&#8217;s a window of opportunity here not to miss. First movers will be the biggest winners, as they will establish themselves as the true performers who are willing to bet on themselves.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/can-the-seo-industry-switch-to-a-pay-for-performance-pricing-model-20771/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What?! A Search-Hostile Site That Still Ranks Well</title>
		<link>http://searchengineland.com/what-a-search-hostile-site-that-still-ranks-well-18680</link>
		<comments>http://searchengineland.com/what-a-search-hostile-site-that-still-ranks-well-18680#comments</comments>
		<pubDate>Thu, 14 May 2009 12:00:02 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[SEO: General]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=18680</guid>
		<description><![CDATA[What follows is a rant, which is something I rarely, if ever, do. It&#8217;s done in the spirit of fun, so don&#8217;t take it too seriously. Enjoy!
I feel like the grandpa who laments in a crotchety voice to his grandkids: &#8220;Nobody ever writes letters anymore! They just sit on their computers and their cell phones [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fwhat-a-search-hostile-site-that-still-ranks-well-18680"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fwhat-a-search-hostile-site-that-still-ranks-well-18680" height="61" width="51" /></a></div><p>What follows is a rant, which is something I rarely, if ever, do. It&#8217;s done in the spirit of fun, so don&#8217;t take it too seriously. Enjoy!</p>
<p>I feel like the grandpa who laments in a crotchety voice to his grandkids: &#8220;Nobody ever writes letters anymore! They just sit on their computers and their cell phones all damn day!&#8221; But instead I&#8217;m saying: &#8220;Nobody ever blogs anymore! They just tweet and re-tweet!&#8221; For example, <a href="http://twitter.com/dannysullivan/statuses/1652978053">this tweet</a> by @dannysullivan could have been a fantastic blog post. Instead, it&#8217;s 129 characters that merely hint at the story:</p>
<blockquote><p><a href="http://twitter.com/dannysullivan">@dannysullivan</a>: seriously, pinkberry with locations in 2 of 50 states ranks 14 for yogurt? <a href="http://bit.ly/dHOYe">http://bit.ly/dHOYe</a> well <a href="http://twitter.com/mattcutts">@mattcutts</a> does love them :)</p></blockquote>
<p>Last week I had the pleasure of sharing the stage with Danny on a panel at the <a href="http://www.emetrics.org/">eMetrics Summit</a>. The topic, unsurprisingly, was SEO, but targeted to web analytics geeks (a number of whom were SEO newbies). Danny kicked off the session with a quick SEO 101 where he expanded on his gem of a tweet above about <a href="http://www.pinkberry.com">Pinkberry.com</a>. Pinkberry is a frozen yogurt brand that I was unaware of until the session. And what a brilliant example it was. Pinkberry.com is a case study in how NOT to build a website. I think they hired the Anti-SEO to ensure they wouldn&#8217;t rank for anything other than their brand name. </p>
<p>There was really silly stuff going on. Basic, basic on-page SEO was completely mucked up. Take, for example, the page titles. Danny showed the audience <a href="http://www.google.com/search?q=site:pinkberry.com&amp;num=100">site: results in Google for Pinkberry.com</a> and the results were, well, disturbing to say the least&mdash;at least for anyone with an SEO bone in his/her body! Sure enough, every title tag was the same across the site. But wait, it gets better! The titles were all one word long: &#8220;Pinkberry&reg;&#8221;. Luckily, the major engines don&#8217;t trip up on circle R and TM symbols, even when they are raw, unescaped characters, or I&#8217;d be complaining about that too! (Nonetheless, I dislike such symbols in title tags. If you must use them in titles or elsewhere or you get yelled at by your legal department, then please &#8220;escape&#8221; them, e.g. &amp;reg;&mdash;it&#8217;s just good HTML etiquette.)</p>
<p>Let&#8217;s move on to what is on the home page, that most important of pages from an SEO perspective. It&#8217;s a circa late 90&#8217;s &#8220;splash page&#8221;. With, you guessed it, zero textual content. <a href="http://google.com/search?q=cache:www.pinkberry.com&amp;strip=1">This</a> is what the home page looks like from a spider&#8217;s perspective. Pretty sad. Well, to be more technically correct, <a href="http://www.seobrowser.com/index.php?user_agent=1&amp;address=http%3A%2F%2Fwww.pinkberry.com&amp;action=Parse+URL">this</a> is what it sees: there&#8217;s a single image with no alt attribute and a filename that is of no help whatsoever.</p>
<p>Moving on past the content-less splash page, you end up on a page where the mouseover navigation relies on JavaScript, which of course the spiders don&#8217;t support. Not only were the mouseover nav items inaccessible, but the main buttons (the ones available without hovering) stopped working. At least the ones that had mouseover effects attached to them. This included their &#8220;Products&#8221;, &#8220;About&#8221;, &#8220;Contact&#8221;, and &#8220;Groupie Corner&#8221;. Oh, and again, no textual content to be found. But hey, at least they had defined some meta keywords, so clearly someone at Pinkberry is at the wheel driving their SEO &#8220;strategery&#8221; (*grin*).</p>
<p>I think the only thing the Anti-SEO didn&#8217;t do was take any textual navigation or content elements that may have been remaining in spider-accessible formats/locations and wrap a Flash movie around all of them. And perhaps add frames for good measure, complete <a href="http://www.prnewswire.co.uk/">with hidden links in the frameset</a> pointing back to His site.</p>
<p>Yet somehow, despite themselves (as Danny notes in the tweet above) Pinkberry ranks on page 2 in Google for &#8220;yogurt!&#8221; Huh? Or as the younger generation like to say: &#8220;WTF??&#8221;</p>
<p>Matt Cutts, care to comment? Is this a result of your hand editing since you&#8217;re such a <a href="http://twitter.com/mattcutts/statuses/1675980981">raving</a> <a href="http://www.dullest.com/blog/best-yogurt-in-silicon-valley/">fan</a>? Or, put another way (by the more politically incorrect SEOs out there like <a href="http://www.davidnaylor.co.uk/seomoz-handjob-or-not.html">DaveN</a>), a &#8220;hand job&#8221;?</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/what-a-search-hostile-site-that-still-ranks-well-18680/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A Deeper Look At Robots.txt</title>
		<link>http://searchengineland.com/a-deeper-look-at-robotstxt-17573</link>
		<comments>http://searchengineland.com/a-deeper-look-at-robotstxt-17573#comments</comments>
		<pubDate>Thu, 16 Apr 2009 12:00:26 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[How To: SEO]]></category>
		<category><![CDATA[SEO: Blocking Spiders]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=17573</guid>
		<description><![CDATA[The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, and thus it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Fa-deeper-look-at-robotstxt-17573"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Fa-deeper-look-at-robotstxt-17573" height="61" width="51" /></a></div><p>The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, and thus it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and the latter can show up in the search results (albeit with less information since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt can accept a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular filetypes (well, file extensions to be more exact)? That a robots.txt disallowed page can’t be accessed by the spiders, so they can’t read and obey a meta robots tag contained within the page?</p>
<p>A robots.txt file provides critical information for search engine spiders that crawl the web. Before these bots (does anyone say the full word “robots” anymore?) access pages of a site, they check to see if a robots.txt file exists. Doing so makes crawling the web more efficient, because the robots.txt file keeps the bots from accessing certain pages that should not be indexed by the search engines.</p>
<p>Having a robots.txt file is a best practice. Even just for the simple reason that some metrics programs will interpret the 404 response to the request for a missing robots.txt file as an error, which could result in erroneous performance reporting. But what goes in that robots.txt file? That’s the crux of it.</p>
<p>Both robots.txt and robots meta tags rely on cooperation from the robots, and are by no means guaranteed to work for every bot. If you need stronger protection from unscrupulous robots and other agents, you should use alternative methods such as password protection. Too many times I’ve seen webmasters naively place sensitive URLs such as administrative areas in robots.txt. You better believe robots.txt is one of the hacker’s first ports of call—to see where they should break into.</p>
<p>Robots.txt works well for:</p>
<ul>
<li>Barring crawlers from non-public parts of your website</li>
<li>Barring search engines from trying to index scripts, utilities, or other types of code</li>
<li>Avoiding the indexation of duplicate content on a website, such as “print” versions of html pages</li>
<li>Auto-discovery of XML Sitemaps</li>
</ul>
<p>At the risk of being Captain Obvious, the robots.txt file must reside in the root of the domain and must be named &#8220;robots.txt&#8221; (all lowercase). A robots.txt file located in a subdirectory isn&#8217;t valid, as bots only check for this file in the root of the domain.</p>
<p>Creating a robots.txt file is easy. You can create a robots.txt file in any text editor. It should be an ASCII-encoded text file, not an HTML file.</p>
<p><strong>Robots.txt syntax</strong></p>
<ul>
<li>User-Agent: the robot the following rule applies to (e.g. &#8220;Googlebot,&#8221; etc.)</li>
<li>Disallow: the pages you want to block the bots from accessing (as many disallow lines as needed)</li>
<li>Noindex: the pages you want a search engine to block AND not index (or de-index if previously indexed). Unofficially supported by Google; unsupported by Yahoo and Live Search.</li>
<li>Each User-Agent/Disallow group should be separated by a blank line; however no blank lines should exist within a group (between the User-agent line and the last Disallow).</li>
<li>The hash symbol (#) may be used for comments within a robots.txt file, where everything after # on that line will be ignored. May be used either for whole lines or end of lines.</li>
<li>Directories and filenames are case-sensitive: “private”, “Private”, and “PRIVATE” are all uniquely different to search engines.</li>
</ul>
<p>Let’s look at an example robots.txt file. The example below includes:</p>
<ul>
<li>The robot called “Googlebot” has nothing disallowed and may go anywhere.</li>
<li>The entire site is closed off to the robot called “msnbot”.</li>
<li>All other robots should not visit the /tmp/ directory, or any directory or file whose path begins with /logs (e.g. /logs/ or /logs.php), as explained in the comments.</li>
</ul>
<p><code>User-agent: Googlebot<br />
Disallow:
</code></p>
<p><code>User-agent: msnbot<br />
Disallow: /
</code></p>
<p><code># Block all robots from tmp and logs directories<br />
User-agent: *<br />
Disallow: /tmp/<br />
Disallow: /logs # for directories and files called logs
</code></p>
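<p>If you want to sanity-check a file like this before uploading it, one option is Python&#8217;s standard-library <code>urllib.robotparser</code>. The sketch below feeds it the example file and asks what each bot may fetch. Bear in mind this parser does simple prefix matching and is not what any search engine actually runs, so treat its answers as a first approximation; the bot names and paths tested are just illustrations.</p>

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all robots from tmp and logs directories
User-agent: *
Disallow: /tmp/
Disallow: /logs # for directories and files called logs
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "/tmp/report.html"),  # Googlebot has its own group
        ("msnbot", "/index.html"),          # whole site closed to msnbot
        ("Slurp", "/tmp/report.html"),      # falls under the * group
        ("Slurp", "/logs.php"),             # prefix match on /logs
        ("Slurp", "/tmp.htm"),              # not inside the /tmp/ directory
    ]
    for agent, path in checks:
        print(agent, path, parser.can_fetch(agent, path))
```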
<p><strong>What should be listed on the User-Agent line?</strong> A user-agent is the name of a specific search engine robot. You can set an entry to apply to a specific bot (by listing the name) or you can set it to apply to all bots (by listing an asterisk, which acts as a wildcard). An entry that applies to all bots looks like this:</p>
<p><code>User-Agent: *</code></p>
<p>Major robots include: Googlebot (Google), Slurp (Yahoo!), msnbot (MSN), and TEOMA (Ask).</p>
<p>Bear in mind that a block of directives specified for the user-agent of Googlebot will be obeyed by Googlebot; but Googlebot will NOT ALSO obey the directives for the user-agent of * (all bots).</p>
<p><strong>What should be listed on the Disallow line?</strong> The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).</p>
<p>Examples:</p>
<ul>
<li>To block the entire site: <code>Disallow: /</code></li>
<li>To block a directory and everything in it: <code>Disallow: /private_directory/</code></li>
<li>To block a page: <code>Disallow: /private_file.html</code></li>
<li>To block a page and/or a directory named private: <code>Disallow: /private</code></li>
</ul>
<p>If you serve content via both http and https, you’ll need a separate robots.txt file for each of these protocols. For example, to allow robots to index all http pages but no https pages, you’d use the robots.txt files as follows, for your http protocol:</p>
<p><code>User-agent: *<br />
Disallow: </code></p>
<p>And for the https protocol:</p>
<p><code>User-agent: *<br />
Disallow: /</code></p>
<p>Bots check for the robots.txt file each time they come to a website. The rules in the robots.txt file will be in effect immediately once it is uploaded to the site’s root and the bot comes to the site. How often it is accessed depends on how frequently the bots spider the site, which in turn is based on popularity, authority, and how frequently content is updated. Some sites may be crawled several times a day while others may only be crawled a few times a week. Google Webmaster Central provides a way to see when Googlebot last accessed the robots.txt file.</p>
<p>I’d recommend using the robots.txt analysis tool in <a href="http://www.google.com/webmasters/">Google Webmaster Central</a> to check specific URLs to see if your robots.txt file allows or blocks them, see if Googlebot had trouble parsing any lines in your robots.txt file, and test changes to your robots.txt file.</p>
<p><strong>Some advanced techniques</strong></p>
<p>The major search engines have begun working together to advance the functionality of the robots.txt file. As alluded to above, some functions that provide finer control over crawling have been adopted by some, but not necessarily all, of the major engines. Because support may be limited, do exercise caution in their use.</p>
<p><strong>Crawl delay:</strong> Some websites may experience high amounts of traffic and would like to slow search engine spiders down to allow for more server resources to meet the demands of regular traffic. Crawl delay is a special directive recognized by Yahoo, Live Search, and Ask that instructs a crawler on the number of seconds to wait between crawling pages:</p>
<p><code>User-agent: msnbot<br />
Crawl-delay: 5</code></p>
<p><strong>Pattern matching:</strong> At this time, pattern matching appears to be usable by the three majors: Google, Yahoo, and Live Search. The value of pattern matching is considerable. Let’s look first at the most basic of pattern matching, using the asterisk wildcard character. To block access to all subdirectories that begin with &#8220;private&#8221;:</p>
<p><code>User-agent: Googlebot<br />
Disallow: /private*/</code></p>
<p>You can match the end of the string using the dollar sign ($). For example, to block URLs that end with .asp:</p>
<p><code>User-agent: Googlebot<br />
Disallow: /*.asp$</code></p>
<p>Unlike the more advanced pattern matching found in regular expressions in Perl and elsewhere, the question mark does not have special powers. So, to block access to all URLs that include a question mark (?), simply use the question mark (no need to &#8220;escape&#8221; it or precede it with a backslash):</p>
<p><code>User-agent: *<br />
Disallow: /*?*</code></p>
<p>To block robots from crawling all files of a specific file type (for example, .gif):</p>
<p><code>User-agent: *<br />
Disallow: /*.gif$</code></p>
<p>Here&#8217;s a more complicated example. Let’s say your site uses the query string part of the URLs (what follows the “?”) solely for session IDs, and you want to exclude all URLs that contain the dynamic parameter to ensure the bots don’t crawl duplicate pages. But you may want to include any URLs that end with a &#8220;?&#8221;. Here’s how you’d accomplish that:</p>
<p><code>User-agent: Slurp<br />
Disallow: /*? 		# block any URL that includes a ?<br />
Allow: /*?$ 		# allow any URL that ends in a ?</code></p>
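<p>To see what these patterns will and won&#8217;t catch, it can help to translate them into ordinary regular expressions. The Python sketch below is one way to do that, assuming only the two special characters described above (<code>*</code> as a wildcard and a trailing <code>$</code> as an end anchor). The helper names are mine, and each engine&#8217;s real matcher may differ in the details.</p>

```python
import re


def robots_pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else, including '?', is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """True if the URL path (plus query string) is covered by the pattern."""
    return robots_pattern_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    samples = [
        ("/private*/", "/private_stuff/page.html"),
        ("/*.asp$", "/shop/item.asp"),
        ("/*.asp$", "/shop/item.asp?id=1"),
        ("/*?", "/page?sid=123"),
        ("/*?$", "/page?"),
        ("/*?$", "/page?sid=123"),
    ]
    for pattern, path in samples:
        print(pattern, path, matches(pattern, path))
```

<p>Notice that a URL such as <code>/page?</code> is covered by both the Disallow and the Allow pattern in the Slurp example; the intent there is for the more specific Allow line to win, but how an overlap is resolved is up to each engine.</p>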
<p><strong>Allow directive:</strong> At this time, the Allow directive appears to only be supported by Google, Yahoo, and Ask. Just as it sounds, it works the opposite of the Disallow directive and provides the ability to specifically call out directories or pages that may be crawled. This may be beneficial after large sections or the entire site has been disallowed.</p>
<p>To allow Googlebot into only the &#8220;google&#8221; directory:</p>
<p><code>User-agent: Googlebot<br />
Disallow: /<br />
Allow: /google/</code></p>
<p><strong>Noindex directive:</strong> As mentioned above, this directive offers benefits in eliminating snippetless title-less listings from the search results, but it’s limited to Google. Its syntax exactly mirrors Disallow. In the words of <a href="http://www.mattcutts.com/blog/google-noindex-behavior/">Matt Cutts</a>:</p>
<blockquote><p>&#8220;Google allows a NOINDEX directive in robots.txt and it will completely remove all matching site URLs from Google. (That behavior could change based on this policy discussion, of course, which is why we haven’t talked about it much.)&#8221;</p></blockquote>
<p><strong>Sitemap:</strong> An XML sitemap file can tell search engines about all the pages on your site and, optionally, provide information about those pages, such as which are most important and how often they change. A Sitemap line in robots.txt acts as an auto-discovery mechanism for the spider to find the XML sitemap file. You can tell Google and other search engines about your Sitemap by adding the following line to your robots.txt file:</p>
<p><code>Sitemap: sitemap_location</code></p>
<p>The sitemap_location should be the complete URL to the Sitemap, such as: http://www.example.com/sitemap.xml. This directive is independent of the user-agent line, so it doesn’t matter where you place it in your file. All major search engines support the Auto-Discovery Sitemap protocol, including Google, Yahoo, Live Search, and Ask.</p>
<p>While auto-discovery provides a way to inform search engines about the sitemap.xml file, it’s also worthwhile verifying and submitting sitemaps directly to the search engines through each of their webmaster consoles (Google Webmaster Central, Yahoo Site Explorer, Live Search Webmaster Center).</p>
<p><strong>More about Google’s bots</strong></p>
<p>Google uses several different bots (user-agents). The bot for web search is Googlebot. Google&#8217;s other bots follow rules you set up for Googlebot, but you can set up additional rules for these specific bots as well. Blocking Googlebot blocks all bots that begin with &#8220;Googlebot&#8221;.</p>
<p>Here’s a list of Google robots:</p>
<ul>
<li>Googlebot: crawls pages from web index and news index</li>
<li>Googlebot-Mobile: crawls pages for mobile index</li>
<li>Googlebot-Image: crawls pages for image index</li>
<li>Mediapartners-Google: crawls pages to determine AdSense content, only crawls sites that show AdSense ads</li>
<li>Adsbot-Google: crawls to measure AdWords landing page quality, only crawls sites that use Google AdWords to advertise</li>
</ul>
<p>You can block Googlebot entirely by using:</p>
<p><code>User-agent: Googlebot<br />
Disallow: /</code></p>
<p>You can allow Googlebot, but block access to all other bots:</p>
<p><code>User-agent: *<br />
Disallow: /</code></p>
<p><code>User-agent: Googlebot<br />
Disallow:</code></p>
<p><strong>Issues with robots.txt</strong></p>
<p>Pages you block by using robots.txt disallows may still be in Google&#8217;s index and appear in the search results &#8212; especially if other sites link to them. Granted, a high ranking is pretty unlikely since Google can’t “see” the page content; it has very little to go on other than the anchor text of inbound and internal links, and the URL (and the ODP title and description if in ODP/DMOZ.) As a result, the URL of the page and, potentially, other publicly available information can appear in search results. However, no content from your pages will be crawled, indexed or displayed.</p>
<p>To entirely prevent a page from being added to a search engine’s index even if other sites link to it, use a &#8220;noindex&#8221; robots meta tag and ensure that the page is not disallowed in robots.txt. When spiders crawl the page, they will recognize the &#8220;noindex&#8221; meta tag and drop the URL from the index.</p>
<p><strong>Robots.txt and robots meta tag conflicts</strong></p>
<p>If the robots.txt file and robots meta tag instructions for a page conflict, bots follow the most restrictive. More specifically:</p>
<ul>
<li>If you block a page with robots.txt, bots will never crawl the page and will never read any robots meta tags on the page.</li>
<li>If you allow a page with robots.txt but block it from being indexed using a robots meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.</li>
</ul>
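<p>The interaction between the two mechanisms boils down to a small decision table. Here is a minimal Python sketch of it; the function name and the wording of the outcomes are my own shorthand for the cases described above.</p>

```python
def indexing_outcome(disallowed_in_robots_txt, has_noindex_meta):
    """Summarize what a well-behaved bot does with a page.

    A robots.txt disallow is checked first, so a blocked page is never
    fetched and any meta tag on it goes unread.
    """
    if disallowed_in_robots_txt:
        return "not crawled; meta tag never read; URL may still be listed"
    if has_noindex_meta:
        return "crawled; meta tag read; not indexed"
    return "crawled; indexed"


if __name__ == "__main__":
    for disallowed in (True, False):
        for noindex in (True, False):
            print(disallowed, noindex, indexing_outcome(disallowed, noindex))
```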
<p>While the main job of a robots.txt file is to keep content on a site from being crawled and indexed, including one is recommended even if you have nothing to block, as many robotic processes look for it and offering one can only expedite their procedures. Together, robots.txt and robots meta tags give you the flexibility to express complex access policies relatively easily:</p>
<ul>
<li>Removing an entire website or part of a website.</li>
<li>Avoiding indexation of images in Google Image Search and other image engines.</li>
<li>Avoiding indexation of duplicate content on a site.</li>
<li>Removing individual pages on a site using a robots Meta tag.</li>
<li>Removing cached copies and snippets using a robots Meta tag.</li>
</ul>
<p>Both robots.txt and robots meta tags rely on cooperation from the robots, and are by no means guaranteed to work for every robot. If you need stronger protection from robots and other agents, you should use alternative methods such as password protection.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/a-deeper-look-at-robotstxt-17573/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>URL Rewrites and Redirects: The Gory Details (Part 2 of 2)</title>
		<link>http://searchengineland.com/url-rewrites-and-redirects-part2-16575</link>
		<comments>http://searchengineland.com/url-rewrites-and-redirects-part2-16575#comments</comments>
		<pubDate>Thu, 12 Mar 2009 15:00:29 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[How To: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16575</guid>
		<description><![CDATA[Welcome back from Part 1, where I discussed in detail how to implement URL rewriting with Apache&#8217;s mod_rewrite module &#8212; complete with example rewrite rules, the more common regular expressions and how to use them. If you recall, I was just starting to get into rewrite rules for 301 redirects using the [R=301] flag. (Incidentally, [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;"><a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fsearchengineland.com%2Furl-rewrites-and-redirects-part2-16575"><img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fsearchengineland.com%2Furl-rewrites-and-redirects-part2-16575" height="61" width="51" /></a></div><p>Welcome back from <a href="http://searchengineland.com/url-rewrites-and-redirects-part1-16574">Part 1</a>, where I discussed in detail how to implement URL rewriting with Apache&#8217;s mod_rewrite module &#8212; complete with example rewrite rules, the more common regular expressions and how to use them. If you recall, I was just starting to get into rewrite rules for 301 redirects using the [R=301] flag. (Incidentally, I much prefer using the RewriteRule directive for setting up my 301 redirects rather than Redirect, RedirectPermanent, or RedirectMatch; more on that later.)</p>
<p>There&#8217;s another handy directive I often use in conjunction with RewriteRule, called RewriteCond. You would use RewriteCond if you&#8217;re trying to match on something in the query string, the domain name, or other things not present between the domain name and the question mark in the URL (which is what RewriteRule looks at). Note that neither RewriteRule nor RewriteCond can access what is in the anchor part of a URL, i.e. whatever follows a #, because that is used internally by the browser and is not sent to the server as part of the request. The following RewriteCond example looks for a negated match on the host name (that is, any host name other than the canonical one) before it will allow the rewrite rule that follows to be executed:</p>
<p><code>RewriteCond %{HTTP_HOST}  !^www\.example\.com$ [NC]<br />
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]</code></p>
<p>Let&#8217;s deconstruct what&#8217;s happening here. For any host name other than www.example.com, a 301 redirect is issued to the equivalent canonical URL on the www subdomain. The [NC] flag makes the rewrite condition case-insensitive. Where is the [QSA] flag so that the query string is preserved, you might ask? It&#8217;s not needed when redirecting; it&#8217;s implied.</p>
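<p>If it helps to see the same decision outside of Apache, here is a minimal Python sketch of what the condition and rule do together. It is only a model of the logic, not how mod_rewrite is implemented; the function name is made up, and I&#8217;m assuming the path arrives without its leading slash, as it does for rules in a per-directory context.</p>

```python
import re

# Same test as the RewriteCond pattern, with [NC] modeled as IGNORECASE.
CANONICAL_HOST = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def canonical_redirect(host, path, query=""):
    """Return the 301 target for a request, or None if no redirect is needed."""
    if CANONICAL_HOST.match(host):
        return None  # condition fails (host is canonical), rule is skipped
    target = "http://www.example.com/" + path
    if query:
        target += "?" + query  # query string carried over by default
    return target


if __name__ == "__main__":
    print(canonical_redirect("example.com", "products/widget.html"))
    print(canonical_redirect("WWW.Example.com", "products/widget.html"))
    print(canonical_redirect("exampel.com", "search", "q=yogurt"))
```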
<p>If you don&#8217;t want a query string retained on a rewrite rule with a redirect, put a question mark at the end of the destination URL in the rule. Like so:</p>
<p><code>RewriteCond %{HTTP_HOST}  !^www\.example\.com$ [NC]<br />
RewriteRule ^(.*)$ http://www.example.com/$1? [L,R=301]</code></p>
<p>Note the exclamation point at the beginning of the regular expression. That is interpreted as &#8220;NOT&#8221; by the rewrite engine.</p>
<p>Why didn&#8217;t I use ^example\.com$ instead? Consider:</p>
<p><code>RewriteCond %{HTTP_HOST}  ^example\.com$ [NC]<br />
RewriteRule ^(.*)$ http://www.example.com/$1? [L,R=301]</code></p>
<p>Because that wouldn&#8217;t have matched on typo domains such as exampel.com that the DNS server and virtual host would be set to respond to (assuming that misspelling was a domain you registered and owned).</p>
<p>Under what circumstances might we want to omit the query string from the redirected URL, as was done in the last two examples? When a session ID or a tracking parameter (like &#8220;source=banner_ad1&#8221;) needs to be dropped. Retaining a tracking parameter after the redirect is not only unnecessary (because the original URL with the source code appended would have been recorded in your access log files as it was being accessed), it&#8217;s undesirable from a canonicalization standpoint. What if you wanted to drop the tracking parameter from the redirected URL, but retain the other parameters in the query string? Here&#8217;s how you&#8217;d do it, for static URLs:</p>
<p><code>RewriteCond %{QUERY_STRING} ^source=[a-z0-9_]*$<br />
RewriteRule ^(.*)$ /$1? [L,R=301]</code></p>
<p>and for dynamic URLs:</p>
<p><code>RewriteCond %{QUERY_STRING} ^(.+)&amp;source=[a-z0-9_]+(&amp;?.*)$<br />
RewriteRule ^(.*)$ /$1?%1%2 [L,R=301]</code></p>
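<p>Regular expressions like these are easy to get subtly wrong, so I like to try them against sample query strings before touching the server. Below is a small Python sketch that mimics the two conditions, where <code>%1</code> and <code>%2</code> correspond to the two captured groups. One assumption to flag: I&#8217;ve included an underscore in the character class so that a value like banner_ad1 is matched. The function name and sample values are hypothetical.</p>

```python
import re

# Query string consists of nothing but the tracking parameter.
ONLY_SOURCE = re.compile(r"^source=[a-z0-9_]*$")
# Tracking parameter appears after at least one other parameter.
SOURCE_AMONG_OTHERS = re.compile(r"^(.+)&source=[a-z0-9_]+(&?.*)$")


def strip_source(query):
    """Return the query string with source=... removed, or None if no rule fires."""
    if ONLY_SOURCE.match(query):
        return ""
    m = SOURCE_AMONG_OTHERS.match(query)
    if m:
        return m.group(1) + m.group(2)  # the %1%2 of the rewrite rule
    return None


if __name__ == "__main__":
    for q in ("source=banner_ad1",
              "id=5&source=banner_ad1",
              "id=5&source=banner_ad1&color=red",
              "source=banner_ad1&id=5",
              "id=5"):
        print(repr(q), "->", repr(strip_source(q)))
```

<p>One thing this exercise surfaces: when the tracking parameter comes first and is followed by other parameters, neither pattern fires, so you would need an additional condition to cover that ordering.</p>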
<p>Need to do some fancy stuff with cookies before redirecting the user? Invoke a script that cookies the user then 301s them to the canonical URL:</p>
<p><code>RewriteCond %{QUERY_STRING} ^source=([a-z0-9_]*)$<br />
RewriteRule ^(.*)$ /cookiefirst.php?source=%1&amp;dest=$1 [L]</code></p>
<p>Note the lack of a [R=301] flag above. That&#8217;s on purpose. No need to expose this script to the user. Use a rewrite and let the script itself send the 301 after it has done its work.</p>
<p>Other canonicalization issues worth correcting with rewrite rules and the [R=301] flag include when the engines index: 1) online catalog pages under HTTPS URLs, and 2) URLs missing a trailing slash that should be there. First the HTTPS fix:</p>
<p><code># redirect online catalog pages in the /catalog/ directory if HTTPS<br />
RewriteCond %{HTTPS} on<br />
RewriteRule ^catalog/(.*) http://www.example.com/catalog/$1 [L,R=301]</code></p>
<p>Note that if your secure server is separate from your main server, you can skip the RewriteCond line above.</p>
<p>Now to append the trailing slash:</p>
<p><code>RewriteRule ^(.*[^/])$ /$1/ [L,R=301]</code></p>
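<p>As written, the rule above would also tack a slash onto URLs that point to real files (e.g. /about.html would become /about.html/). One way to guard against that, sketched here on the assumption that the rule lives in .htaccess (where REQUEST_FILENAME has already been mapped to a path on disk), is to skip anything that exists as a file:</p>
<p><code># only add the slash when the request is not for an existing file<br />
RewriteCond %{REQUEST_FILENAME} !-f<br />
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]</code></p>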
<p>WordPress handles missing trailing slashes by default. Yay WordPress!</p>
<p>Speaking of WordPress, did you know that when you update the &#8220;post slug&#8221; on a published post (i.e. revise the URL), WordPress will automatically 301 redirect all requests for the previous URL to the new URL? In fact, if you modify the post slug multiple times, all previous iterations will be responded to with a 301! And there won&#8217;t be a series of 301s daisy chained together; there is just one redirect issued to the latest iteration. Thus, you can employ a continuous improvement approach to your URL optimization, employing the &#8220;thin slicing&#8221; methodology I described in <a href="http://searchengineland.com/optimizing-large-scale-web-sites-16204">a recent column</a> and my <a href="http://www.netconcepts.com/seo-title-tag-plugin/">SEO Title Tag plugin</a> to mass edit all your permalink post URLs and let WordPress handle the 301s automagically. It&#8217;s a beautiful thing.</p>
<p>After completing a URL rewriting project to migrate from dynamic URLs to static, you&#8217;ll want to phase out the dynamic URLs not just by replacing all occurrences of the legacy URLs on your site, but also by 301 redirecting the legacy dynamic URLs to their static equivalents. That way, any inbound links pointing to the retired URLs will end up leading both spiders and humans to the correct new URL &#8212; thus ensuring the new URLs are the ones that are indexed, blogged about, linked to, and bookmarked. Generally, here&#8217;s how you&#8217;d accomplish that:</p>
<p><code>RewriteCond %{QUERY_STRING} id=([0-9]+)<br />
RewriteRule ^get_product\.php$ /products/%1.html? [L,R=301]</code></p>
<p>However, you&#8217;ll get an infinite loop of recursive redirects if you&#8217;re not careful. One quick-and-dirty way to avoid that situation is by adding a nonsense parameter to the destination URL for the rewrite and ensuring this nonsense parameter isn&#8217;t present before doing the redirect. Specifically:</p>
<p><code>RewriteCond %{QUERY_STRING} id=([0-9]+)<br />
RewriteCond %{QUERY_STRING} !blah=blah<br />
RewriteRule ^get_product\.php$ /products/%1.html? [L,R=301]</code></p>
<p><code>RewriteRule ^products/([0-9]+)\.html$ /get_product.php?id=$1&amp;blah=blah [L]</code></p>
<p>Notice above that I used two RewriteCond lines, stacked on top of each other. All rewrite conditions listed together in the same block will be &#8220;ANDed&#8221; together. If you wanted the conditions to be &#8220;ORed&#8221;, it would require the use of the [OR] flag.</p>
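<p>To illustrate the [OR] flag, here&#8217;s a small sketch that redirects requests arriving on either the non-www hostname or the exampel.com typo domain mentioned earlier. The hostnames are just the ones used for illustration in this article; substitute your own:</p>
<p><code># redirect if the host is example.com OR exampel.com<br />
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]<br />
RewriteCond %{HTTP_HOST} ^exampel\.com$ [NC]<br />
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]</code></p>
<p>Note that the [OR] flag goes on every condition except the last one in the group.</p>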
<p>Enough about redirects. Let&#8217;s move on to lookup tables and RewriteMap, a directive that functions within your server config file (not .htaccess). Let&#8217;s say you&#8217;d like to rewrite URLs that contain ID numbers to URLs that contain keywords. A laudable goal. Now let&#8217;s say you don&#8217;t have the lookup table in your database. You could reference a flat file &#8212; in text, or in DBM format (which is faster) &#8212; containing your mappings of ID numbers to keywords using RewriteMap, then base your RewriteRule on data found in that flat file. Here&#8217;s a hypothetical lookup table:</p>
<p><code>canon-g10-digital-camera /get_product.php?id=1001&amp;blah=blah<br />
128-gig-ipod-classic /get_product.php?id=1002&amp;blah=blah</code></p>
<p>And here&#8217;s what the corresponding RewriteMap and RewriteRule directives might look like:</p>
<p><code>RewriteMap prodmap txt:/home/someusername/prodmap.txt<br />
RewriteRule ^/products/(.+)\.html$ ${prodmap:$1} [L]</code></p>
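<p>If the key isn&#8217;t found in the map, the lookup returns an empty string. You can supply a fallback after a pipe character inside the map reference. In this sketch, /product_not_found.html is a hypothetical page that you would need to create yourself:</p>
<p><code>RewriteMap prodmap txt:/home/someusername/prodmap.txt<br />
# fall back to a catch-all page when the keyword is not in the map<br />
RewriteRule ^/products/(.+)\.html$ ${prodmap:$1|/product_not_found.html} [L]</code></p>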
<p>Conversely, you&#8217;ll want to 301 all of the legacy ID-containing URLs to the new keyword-containing ones. Like so:</p>
<p><code>RewriteMap prodmap2 txt:/home/someusername/prodmap2.txt<br />
RewriteCond %{QUERY_STRING} id=([0-9]+)<br />
RewriteCond %{QUERY_STRING} !blah=blah<br />
RewriteRule ^/?get_product\.php$ ${prodmap2:%1}? [L,R=301]</code></p>
<p>The corresponding lookup table for the above (&#8220;prodmap2.txt&#8221;) would look something like:</p>
<p><code>1001 /products/canon-g10-digital-camera.html<br />
1002 /products/128-gig-ipod-classic.html</code></p>
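<p>For the faster DBM variant mentioned earlier, the text file first has to be converted. Apache 2.2 ships with a utility for this, httxt2dbm, which takes the source file with -i and the output file with -o (for example, httxt2dbm -i prodmap2.txt -o prodmap2.map). Assuming that output file name, which is only an illustration, the RewriteMap line would then change to something like:</p>
<p><code># same lookup as before, but read from a DBM file instead of plain text<br />
RewriteMap prodmap2 dbm:/home/someusername/prodmap2.map</code></p>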
<p>You aren&#8217;t restricted to text or DBM files. You could alternatively install a script that looks up what the rewrite rule had captured into memory (between the parentheses) and then delivers back to the rewriting engine the corresponding destination. Here&#8217;s the slightly modified RewriteMap for such an instance:</p>
<p><code>RewriteMap prodmap prg:/home/someusername/mapscript.pl<br />
RewriteRule ^/products/(.+)\.html$ ${prodmap:$1} [L]</code></p>
<p>On to a different problem. Let&#8217;s say you wanted to rewrite to a URL located on another server. You can do it with the [P] flag, which requires mod_proxy to be enabled. The &#8220;P&#8221; stands for &#8220;proxy&#8221;. For example, you could proxy the Google home page on your server (sans images) with the following rewrite rule:</p>
<p><code>RewriteRule  /google\.html$  http://www.google.com/ [P,L]</code></p>
<p>Without the [P] flag, the rewrite rule above would behave like a redirect.</p>
<p>You might be wondering how to accomplish all this wizardry if your server is running Microsoft IIS Server instead of Apache. As I mentioned in Part 1, the rules don&#8217;t differ greatly between mod_rewrite and ISAPI_Rewrite. For instance, instead of initializing things with &#8220;RewriteEngine on&#8221;, you would specify &#8220;[ISAPI_Rewrite]&#8221; on the first line of the httpd.ini file. Instead of [R=301], you would use [RP] to issue a 301. Instead of [NC] for case insensitivity, you would use [I]. And so on. The easiest way to convey this is through some illustrative examples:</p>
<p><code>#Capitalization and IIS' case insensitivity with regard to URLs<br />
RewriteRule (.*) http://www.example.com$1 [I,RP,L]<br />
<br />
#Non-www and typo domains<br />
RewriteCond Host: (?!www\.example\.com)<br />
RewriteRule (.*) http://www.example.com$1 [I,RP,L]<br />
<br />
#Drop the "default"<br />
RewriteRule (.*)/default.htm $1/ [I,RP,L]<br />
<br />
#Add trailing slash if it's missing<br />
RewriteCond Host: (.*)<br />
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,RP,L]</code></p>
<p>At the start of this article, I promised we&#8217;d revisit my reasoning as to why rewrite rules are my preferred method of redirecting. It&#8217;s simply because RewriteRule is so darned powerful and flexible in comparison to Redirect, RedirectPermanent and RedirectMatch. Even though RedirectMatch supports regular expressions, it doesn&#8217;t offer nearly as comprehensive a feature set as RewriteRule and RewriteCond. However, you may be on a web host or server that doesn&#8217;t have mod_rewrite installed/enabled. If that&#8217;s the case, it may be helpful to see a few examples of these alternative directives, which can be used in either .htaccess or httpd.conf:</p>
<p><code># 301 an individual URL<br />
Redirect 301 /old_url.htm http://www.example.com/new_url.htm<br />
<br />
# 301 the contents of a directory<br />
Redirect 301 /old_dir/ http://www.example.com/new_dir/<br />
<br />
# 301 an entire domain (keep the trailing slash so the rest of the path is appended correctly)<br />
Redirect 301 / http://www.example.com/<br />
<br />
# drop the index.html off the end of subdirectories and 301<br />
RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/</code></p>
<p>That&#8217;s all I&#8217;ve got &#8212; for now. I promise in my next article I won&#8217;t geek out so much. If you stayed with me this whole time, you deserve a cookie. Hit me up at the next conference and I&#8217;ll swipe one for you out of the speaker lounge.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/url-rewrites-and-redirects-part2-16575/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>URL Rewrites &amp; Redirects: The Gory Details (Part 1 of 2)</title>
		<link>http://searchengineland.com/url-rewrites-and-redirects-part1-16574</link>
		<comments>http://searchengineland.com/url-rewrites-and-redirects-part1-16574#comments</comments>
		<pubDate>Thu, 19 Feb 2009 10:00:03 +0000</pubDate>
		<dc:creator>Stephan Spencer</dc:creator>
				<category><![CDATA[100% Organic]]></category>
		<category><![CDATA[How To: SEO]]></category>

		<guid isPermaLink="false">http://searchengineland.com/?p=16574</guid>
		<description><![CDATA[If you&#8217;re dealing with a large complex website, rewriting your URLs from dynamic to static and placing all the necessary 301 redirects in place is &#8211; as programmers would say, nontrivial. The devil is in the details. Granted, the redirecting piece may not be quite as onerous anymore thanks to the advent of the search [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re dealing with a large, complex website, rewriting your URLs from dynamic to static and putting all the necessary 301 redirects in place is, as programmers would say, nontrivial. The devil is in the details. Granted, the redirecting piece may not be quite as onerous anymore thanks to the advent of the search engines&#8217; new <a href="http://searchengineland.com/canonical-tag-16537">canonical tag</a>, but you still have to worry about the user experience for users coming in to the obsolete URLs (through bookmarks, through old links, etc.). Therefore, ideally you&#8217;ll still want the 301s in place. Regular expressions and mod_rewrite (or alternatively, ISAPI_Rewrite for IIS) to the rescue!</p>
<p>Last week, when I presented on the 301 Redirect panel at SMX West, I gave folks a look &#8220;under the hood&#8221; at accomplishing redirects using rewrite rules and regular expressions, complete with the code necessary to pull it off. It wasn&#8217;t for the faint of heart. (My PowerPoint can be downloaded <a href="http://www.netconcepts.com/learn/301-redirect.ppt">here</a>.)</p>
<p>Regular expressions are so complex there are entire books dedicated to the topic (such as the excellent <a href="http://oreilly.com/catalog/9780596528126/">Mastering Regular Expressions, 3rd Ed.</a> by Jeffrey E. F. Friedl and published by my long-time favorite book publisher, O&#8217;Reilly). Before we delve into the use of regular expressions in rewrite rules, however, let&#8217;s step back and look at the URL rewriting process.</p>
<p><strong>The three types of URL rewrites</strong></p>
<p>Search engine sub-optimal URLs can be rewritten using one of three approaches. The first, using a &#8220;URL rewriting&#8221; server module/plugin such as <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html">mod_rewrite</a> for Apache or <a href="http://www.isapirewrite.com">ISAPI_Rewrite</a> for Microsoft IIS Server, is the most popular. The second, for when you can&#8217;t use a URL rewriting module on your server, is to recode your scripts to extract variables out of the &#8220;path_info&#8221; part of the URL instead of the &#8220;query_string&#8221;. An example of this might look like http://www.example.com/index.php/category/widgets.</p>
<p>With either approach, you&#8217;d want to replace all occurrences of your old URLs in links on your site with your new search-friendly URLs. Additionally, you may wish to 301 redirect the old URLs to the new ones, but this is apparently optional with the advent of the canonical tag. The third approach would be to use a proxy server based solution (e.g. <a href="http://www.netconcepts.com/services/search-marketing/gravitystream/">GravityStream</a>) that eliminates the need to recode your site or re-architect your CMS/e-commerce platform. This can be useful when IT department involvement with SEO projects must be minimized, for whatever reason.</p>
<p>Let&#8217;s assume you&#8217;re going with the first approach: utilizing a rewriting module. If you are running Apache as your web server, you would place &#8220;rules&#8221; within your .htaccess file or your Apache configuration file (e.g. httpd.conf or the site-specific config file in the sites_conf directory). Similarly, if you are running IIS Server, you&#8217;d use an ISAPI plugin such as ISAPI_Rewrite and place rules in an httpd.ini config file. Note that rules can differ slightly on ISAPI_Rewrite compared to mod_rewrite. For Apache and mod_rewrite, your .htaccess would start off with:</p>
<p><code>RewriteEngine on<br />
RewriteBase /</code></p>
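<p>Note that you should omit the second line above if adding the rewrites to your server config file, since RewriteBase applies only in per-directory context (.htaccess or &lt;Directory&gt; blocks). In .htaccess, the rewrite engine strips the leading slash from the URL path before matching, which is why the rules below begin with &#8220;^&#8221; rather than &#8220;^/&#8221;. RewriteBase sets the URL prefix that gets put back onto relative substitutions.</p>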
<p>After this come the rewrite rules. Let&#8217;s say we wanted to have requests for product page URLs of the format http://www.example.com/products/123 to display the content found at http://www.example.com/get_product.php?id=123, without the URL changing in the Location bar of the user&#8217;s browser and without you having to recode the get_product.php script. (Of course this doesn&#8217;t replace all occurrences of dynamic URLs within the links contained on all the site pages; that&#8217;s a separate issue.) Accomplishing this can be done with a single rewrite rule, like so:</p>
<p><code>RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]</code></p>
<p>In the above example, ^ signifies the start of the URL following the domain, $ signifies the end of the URL, [0-9] signifies a digit and the + immediately following it means one or more occurrences of a digit. Similarly, the ? immediately following the / means zero or one occurrences of a slash character. The () puts whatever is wrapped within it into memory. You can then access what&#8217;s been stored in memory with $1 (i.e. what is in the first set of parentheses). Not surprisingly, if you included a second set of parentheses in the rule, you&#8217;d access that with $2. And so on. The [L] flag saves on server processing by telling the rewrite engine to stop if it matched on that rule. Otherwise all the remaining rules will be run as well.</p>
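<p>The same rule looks slightly different depending on where it lives, because the rewrite engine sees the URL path without its leading slash in .htaccess but with it in the main server config. A side-by-side sketch of the two variants (use one or the other, not both):</p>
<p><code># in .htaccess (per-directory context): no leading slash in the pattern<br />
RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]<br />
<br />
# in httpd.conf (server context): the pattern sees the leading slash<br />
RewriteRule ^/products/([0-9]+)/?$ /get_product.php?id=$1 [L]</code></p>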
<p>Sound complicated? You ain&#8217;t seen nothin&#8217; yet! Here&#8217;s a slightly more complex example, where requests for http://www.example.com/4/123.htm would be rewritten to serve the content found at http://www.example.com/webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&amp;catalogId=10001&amp;langId=-1&amp;categoryID=4&amp;productID=123:</p>
<p><code>RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&amp;catalogId=10001&amp;langId=-1&amp;categoryID=$1&amp;productID=$2 [QSA,L] </code></p>
<p>The [^/] signifies any character other than a slash. That&#8217;s because, within square brackets, ^ is interpreted as &#8220;not&#8221;. The [QSA] flag above is for when you don&#8217;t want the query string dropped (like when you want a tracking parameter preserved).</p>
<p>To write good rewrite rules you will need to become a master of &#8220;pattern matching&#8221; (which is simply another way to describe the use of regular expressions). Let&#8217;s look at some of the most important special characters and how they are interpreted by the rewrite engine:</p>
<blockquote><p>* means 0 or more of the immediately preceding character<br />
+ means 1 or more of the immediately preceding character<br />
? means 0 or 1 occurrence of the immediately preceding character<br />
^ means the beginning of the string<br />
$ means the end of it<br />
. means any character (i.e. wildcard)<br />
\ &#8220;escapes&#8221; the character that follows, e.g. \. means dot<br />
[ ] is for character ranges, e.g. [A-Za-z] for any lower or upper case letter<br />
^ inside [] brackets means &#8220;not&#8221;, e.g. [^/] means not slash</p></blockquote>
<p>It&#8217;s incredibly easy to make errors in regular expressions. Some of the common gotchas that lead to unintentional sub-string matches include:</p>
<ul>
<li>using .* when you should be using .+ since .* can match on nothing</li>
<li>not &#8220;escaping&#8221; with a backslash special characters that you don&#8217;t want interpreted, like when you specify . instead of \. and you really meant a dot rather than any character. (thus default.htm would match on defaultshtm)</li>
<li>omitting ^ or $ on the assumption that the start or end is implied (thus default\.htm would match on mydefault.html whereas ^default\.htm$ would only match on default.htm)</li>
<li>using &#8220;greedy&#8221; expressions that will match on all occurrences rather than stopping at the first occurrence.</li>
</ul>
<p>What do I mean by &#8220;greedy&#8221;? The easiest way to explain it is to show you an example. Let me illustrate:</p>
<p><code>RewriteRule ^(.*)/?index\.html$ /$1/ [L,R=301]</code></p>
<p>will redirect requests for http://www.example.com/blah/index.html to http://www.example.com/blah//. Probably not what was intended. Why did this happen? Because .* will capture the slash character within it before the /? gets to see it. Thankfully, there&#8217;s an easy fix. Simply use a negated character class such as [^/] or the non-greedy .*? instead of .* to do your matching. For example, use ^(.*?)/? instead of ^(.*)/? or [^/]+/[^/]+ instead of .*/.*</p>
<p>So, to correct the above rule you could use the following:</p>
<p><code>RewriteRule ^(.*?)/?index\.html$ /$1/ [L,R=301]</code></p>
<p>Why wouldn&#8217;t you use the following?</p>
<p><code>RewriteRule ^([^/]*)/?index\.html$ /$1/ [L,R=301]</code></p>
<p>Because it would only match on URLs with one directory. URLs containing multiple subdirectories such as http://www.example.com/store/cheese/swiss/wheel/index.html would not match.</p>
<p>And why wouldn&#8217;t you use the following?</p>
<p><code>RewriteRule ^(.*)index\.html$ $1/ [L,R=301]</code></p>
<p>Because it would match on http://www.example.com/myindex.html as well (since it&#8217;s not specified that the character immediately preceding index must be a slash).</p>
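<p>If you&#8217;d rather require that slash explicitly, one possible tightening (a sketch, not the only way to do it) is to make the whole directory portion, slash included, optional as a unit:</p>
<p><code># the captured group is either empty or ends in a slash<br />
RewriteRule ^(.*/)?index\.html$ /$1 [L,R=301]</code></p>
<p>With this pattern, blah/index.html redirects to /blah/, index.html at the root redirects to /, and myindex.html is left alone because index must either start the path or follow a slash.</p>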
<p>Does your head hurt yet? As you might imagine, testing/debugging is a big part of URL rewriting. When debugging, the RewriteLog and RewriteLogLevel directives are your friend! Set the RewriteLogLevel to 4 or more to start seeing what the rewrite engine is up to when it interprets your rules.</p>
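<p>For instance, on Apache 2.2 the logging could be switched on with something like the following in the server config (these two directives aren&#8217;t allowed in .htaccess). The log file path here is only an assumption; point it wherever suits your setup, and turn logging back off when you&#8217;re done, since it slows the server down:</p>
<p><code># write rewrite engine activity to its own log file<br />
RewriteLog /var/log/apache2/rewrite.log<br />
RewriteLogLevel 4</code></p>
<p>Later Apache releases (2.4 and up) dropped these two directives in favor of the general LogLevel directive, e.g. LogLevel alert rewrite:trace4.</p>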
<p>By the way, the [R=301] flag in the last few examples above &#8212; as you might guess &#8212; tells the rewrite engine to do a 301 redirect instead of a standard rewrite.</p>
<p>Stay tuned for my column next month for Part 2, where I cover the all-important RewriteCond directive, lookup tables, ISAPI rewrite rules, proxying, alternatives to RewriteRule for redirecting, and more.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchengineland.com/url-rewrites-and-redirects-part1-16574/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
