Good Cloaking, Evil Cloaking & Detection
Is cloaking evil? It’s one of the most heavily debated topics in the SEO industry, and people
can’t even agree on what defines cloaking. In this column, I want to look at
an example of what even the search engines might consider "good" cloaking, examine the
middle-ground territory that page testing introduces, and revisit how to
detect when "evil" old-school page cloaking is happening.
Back in December 2005, the four major engines
went on record at Search Engine Strategies Chicago to define the line
between cloaking for good and for evil. From the audience, I asked the panelists
if it was acceptable to — selectively for spiders — replace search engine
unfriendly links (such as those with session IDs and superfluous parameters)
with search engine friendly versions. All four panelists responded "No problem."
Charles Martin from Google even jumped in again with an enthusiastic, "Please do!"
URL Rewriting? Not Cloaking!
My understanding is that their positions haven’t changed on this. Cloaking —
by its standard definition of serving up different content to your users than to
the search engines — is naughty and should be avoided. Cloaking where all
you’re doing is cleaning up spider-unfriendly URLs, well that’s A-OK. In fact,
Google engineers have told me in individual conversations that they don’t even
consider it to be cloaking.
Because search engines are happy to have you simplify your URLs for their
spiders — eliminating session IDs, user IDs, superfluous flags, stop characters
and so on — it may make sense to do that only for spiders and not for humans.
That could be because rewriting the URLs for everyone is too difficult, costly
or time-intensive to implement. Or, more likely, it could be that certain
functionality requires these parameters but that functionality is of no
use to a search engine spider, such as putting items in your shopping cart or
wish list, or keeping track of your click path in order to customize the user experience.
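To make this concrete, here is a minimal sketch of spider-only URL simplification. The parameter names, bot signatures, and function names are my own illustrative assumptions, not anything prescribed by the engines:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query parameters that carry no meaning for a search engine spider
# (illustrative names; adjust to your own application).
SPIDER_USELESS_PARAMS = {"sessionid", "sid", "userid"}

# Crude user-agent signatures for the major crawlers (illustrative).
BOT_SIGNATURES = ("googlebot", "slurp", "bingbot", "msnbot")

def is_spider(user_agent: str) -> bool:
    """Sniff the user agent; real sites often verify by IP as well."""
    ua = user_agent.lower()
    return any(bot in ua for bot in BOT_SIGNATURES)

def simplify_url(url: str) -> str:
    """Strip spider-useless query parameters, preserving the rest."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SPIDER_USELESS_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def link_for(url: str, user_agent: str) -> str:
    """Serve the cleaned URL to spiders, the full URL to humans."""
    return simplify_url(url) if is_spider(user_agent) else url
```

Humans keep their session-dependent links; spiders see one clean, canonical URL per page.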
Many web marketers like to track which link was clicked when a page contains
multiple links to the same destination. They add tracking
tags to the URL, like "source=topnav" or "source=sidebar." The problem is that
this creates duplicate pages for the search engine spiders to explore and
index, which dilutes link gain, or PageRank: the votes
you pass on to that page are split across the different
URLs. Ouch.
How about instead you employ "good cloaking" and strip out those tracking
codes solely for spiders? Sounds like a good plan to me. Keep your
analytics-obsessed web marketers happy, and the search engines too.
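As a sketch, stripping such tracking tags for spiders is a one-line query rewrite. The "source" parameter name follows the example above; the helper name is my own:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Click-tracking tags that fragment PageRank across duplicate URLs
# (parameter name from the "source=topnav" example; extend as needed).
TRACKING_PARAMS = {"source"}

def canonical_for_spiders(url: str) -> str:
    """Drop tracking tags so every internal link votes for one URL."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))
```

Served only to spiders, this collapses the "source=topnav" and "source=sidebar" duplicates into a single indexable URL while your analytics keep working for human clicks.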
I should mention that you don’t have to cloak your pages to simplify your URLs;
another option is to append the various tracking parameters to the URL upon the click,
using JavaScript. For example, REI.com used to append a "vcat=" parameter to all brand
links upon the click, and none of those tracking
URLs made it into Google.
Is Testing Bad Cloaking?
Is multivariate testing a form of bad cloaking? This is where services like
Offermatica or Google’s own testing tool
show different users different versions of the same URL. That could be
considered cloaking, because human visitors and search engines are getting
different content. Spiders can’t participate in the test group, and thus the
content of that test is invisible to them; the variations are swapped in by a
JavaScript function running in the user’s browser. Google engineers have told me that they want
Googlebot to be part of the test set. Therein lies the rub: the technology isn’t
built to support that.
Uncovering User Agent Based Cloaking
The "bad" cloaking, from a search engine point of view, is the deliberate
showing to a spider of content that might be entirely different from what humans
see. Those doing this often try to cover their tracks by making it difficult to
examine the version meant only for spiders. They do this with a "noarchive"
directive embedded within the meta tags. Googlebot and other major spiders will
obey that directive and not archive the page, which causes the "Cached"
link in that page’s search listing to disappear.
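The directive in question is the standard robots meta tag; in the page’s head it looks like this:

```html
<head>
  <!-- Ask Googlebot and other major spiders not to keep a cached copy -->
  <meta name="robots" content="noarchive">
</head>
```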
So getting a view behind the curtain to see what is being served to the
spider can be a bit tricky. If the cloaking is solely user agent based,
you can use the User Agent
Switcher extension for Firefox. Just create a Googlebot user agent
under Tools > User Agent Switcher > Options > Options > User Agents in the
Firefox menu. Then switch to that user agent and have fun surfing as Googlebot in
Firefox.
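Outside the browser, you can run the same check from a script. Here is a minimal sketch in Python; the user-agent string shown is the commonly published Googlebot string and may differ from what Google currently sends, so treat it as illustrative:

```python
import urllib.request

# Commonly published Googlebot user-agent string (illustrative; check
# Google's crawler documentation for the strings in current use).
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.shtml)")

def fetch_as_googlebot(url: str) -> str:
    """Fetch a page while presenting Googlebot's user agent."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Diffing the result of `fetch_as_googlebot(url)` against a normal browser fetch exposes user-agent based cloaking; it will not fool IP-based cloakers, as discussed next.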
Uncovering IP Based Cloaking
But hard-core cloakers are too clever for this trick. They’ll feed content to
a spider based on known IP addresses. Unless you’re within a search engine —
using one of these known IP addresses — you can’t see the cloaked page, if it
also has been hidden by being kept out of the search engine’s cache.
Actually, there’s still a chance. Sometimes
Google Translate can be used to view
the cloaked content, because many cloakers don’t bother to differentiate between
the spider coming in for the purpose of translating or coming in for the purpose
of crawling. Either way, the request comes from the same range of Google IP addresses. Thus,
when cloakers are doing IP delivery, they tend to serve up the Googlebot-only
version of the page to the Translate tool. This loophole can be plugged, but
many cloakers miss it.
And I bet you didn’t know that you can actually set the translation language
to English even if the source document is in English! You simply set it in the
URL, like so:

http://translate.google.com/translate?u=URLGOESHERE&langpair=en%7Cen

In the URL above, replace the URLGOESHERE part with the actual URL of
the page you want to view. That way, when you are reviewing someone’s cloaked
page, you can see it in English instead of having to see it in a
foreign language. You can also sometimes use this trick to view paid content, if
you’re too cheap to pay for a subscription.
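For convenience, building such a translate-viewer URL can be scripted. A small sketch; the translate.google.com endpoint and langpair parameter reflect how the service worked at the time and may since have changed:

```python
from urllib.parse import quote

def translate_view_url(target: str) -> str:
    """Build a Google Translate viewer URL with an English-to-English langpair."""
    # en%7Cen is the URL-encoded form of "en|en".
    return ("http://translate.google.com/translate?u="
            + quote(target, safe="") + "&langpair=en%7Cen")
```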
Many SEOs dismiss cloaking out-of-hand as an evil tactic, but in my mind,
there is a time and a place for it (the URL simplifying variety, not the content
differing variety), even if you are a pearly white hat SEO.
Stephan Spencer is founder and president of
Netconcepts, a 12-year-old web agency
specializing in search engine optimized ecommerce. He writes for several
publications plus blogs at
StephanSpencer.com and Natural
Search Blog. The
column appears Thursdays at Search Engine Land.