The Long Road To The Debate Over “White Hat Cloaking”
I really hate arguments over cloaking. Like really hate them. But Rand
Fishkin recently did a chart outlining the degrees of "safeness" of
various cloaking tactics, which in turn riled up Google’s Matt Cutts in
the comments there. No doubt others at the search engines also dislike
the idea that any cloaking is "white hat." So I wanted to revisit some of
the things that Rand outlined about cloaking, plus the guidelines Google
updated last month. Over the years, content delivery methods that were
once considered cloaking have become acceptable. This is a look at what
those are and why we need a new name for them.
How Does Google Define Cloaking?
First, if you want to understand how many tiring debates we’ve had over
cloaking issues, I strongly urge you to read my
YADAC: Yet Another
Debate About Cloaking Happens Again post from last year. One of the
things it discusses is how Google dropped its definition of what cloaking
was in 2006. Before then, it was:
The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site. In other words, the
webserver is programmed to return different content to Google than it
returns to regular users, usually in an attempt to distort search engine
rankings. This can mislead users about what they’ll find when they click
on a search result. To preserve the accuracy and quality of our search
results, Google may permanently ban from our index any sites or site
authors that engage in cloaking.
We were left with this on another page:
Make pages primarily for users, not for search engines. Don’t deceive
your users or present different content to search engines than you display
to users, which is commonly referred to as "cloaking."
Then around June of last year,
as best I can tell, we got this:
Cloaking refers to the practice of presenting different content or URLs
to users and search engines. Serving up different results based on user
agent may cause your site to be perceived as deceptive and removed from
the Google index. Some examples of cloaking include:
- Serving a page of HTML text to search engines, while showing a page
of images or Flash to users.
- Serving different content to search engines than to users.
If your site contains elements that aren’t crawlable by search engines
(such as Flash, Javascript, or images), you shouldn’t provide cloaked
content to search engines. Rather, you should consider visitors to your
site who are unable to view these elements as well. For instance:
- Provide alt text that describes images for visitors with screen
readers or images turned off in their browsers.
- Provide the textual contents of Javascript in a noscript tag. Ensure
that you provide the same content in both elements (for instance,
provide the same text in the Javascript as in the noscript tag).
Including substantially different content in the alternate element may
cause Google to take action on the site.
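To make that "same content in both elements" rule concrete, here’s a
minimal, made-up example of the kind of markup the guideline describes.
The file names and text are hypothetical; the point is that the script,
its noscript fallback, and the alt text all carry the same information:

    <!-- The same text appears in the script and in its noscript fallback -->
    <script type="text/javascript">
      document.write("Welcome! Browse our full 2008 product catalog below.");
    </script>
    <noscript>
      Welcome! Browse our full 2008 product catalog below.
    </noscript>

    <!-- Descriptive alt text for visitors with images off or screen readers -->
    <img src="catalog-cover.jpg" alt="Cover of the 2008 product catalog">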
Then in June of this year, Google did a
major blog post on the topic, which along with cloaking covered topics
such as:
- Geolocation
- IP delivery
- First Click Free
Much of what’s in the blog post has yet to migrate to the help pages,
though I’m sure it will. But the more important point to me is that over
time, things that people might have once considered cloaking have become
acceptable to Google — and get named something other than cloaking in the
process.
Renaming Confusion
Let’s take geolocation as an example of that. Years ago when Google would
campaign against cloaking, those who disagreed would argue that Google
itself cloaked. For example, go to Google.com from outside the US and
you’ll see a
different page than someone in the US sees. Cloaking! No, Google eventually
argued — that’s not cloaking. It’s geolocation and perfectly fine for
anyone to do.
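As a rough sketch, geolocation looks something like the JavaScript
below, assuming you already have an IP-to-country lookup (the lookup,
page paths, and country codes here are all made up). The key point:
Googlebot’s IP address goes through the same lookup as everyone else’s,
with no crawler check anywhere.

    // Hypothetical geolocation: pick a homepage by country, never by crawler.
    function homepageFor(countryCode) {
      var pages = { FR: "/intl/fr/", DE: "/intl/de/", US: "/" };
      return pages[countryCode] || "/"; // unknown locations get the default
    }

    // Googlebot crawls from US IP addresses, so it sees the US page just
    // like any other US visitor would.
    console.log(homepageFor("FR")); // "/intl/fr/"
    console.log(homepageFor("US")); // "/"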
OK, how about showing Google content that human users only see if they
register with a site and/or pay for access? Cloaking! Well, here Google took
some time to clarify things for web search. Google informally said that
users shouldn’t have to pay to see content if they came from Google. In
Google News search, this is called
First Click Free, and it’s a formalized process for news publishers to
follow (that’s why you can
read the Wall Street Journal for free via Google News). But for web
search, we never got a formal program or formal mention that this was allowed
until the June 2008 blog post.
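Stripped to its core, First Click Free is one check per article request:
is this the crawler, or a visitor arriving from a Google results page?
The sketch below is illustrative only (the referrer test and function
names are mine, not an official implementation, and real setups verify
Googlebot more carefully than a user agent string):

    // Hypothetical First Click Free check for an article request.
    function serveArticle(userAgent, referer, isSubscriber) {
      var isGooglebot = /Googlebot/i.test(userAgent || "");
      var cameFromGoogle = /^https?:\/\/(www\.)?google\./i.test(referer || "");

      // The crawler indexes the full text; searchers get their first
      // click free; everyone else hits the normal registration barrier.
      if (isGooglebot || cameFromGoogle || isSubscriber) {
        return "FULL ARTICLE";
      }
      return "REGISTRATION PAGE";
    }

    console.log(serveArticle("Googlebot/2.1", "", false));
    // FULL ARTICLE
    console.log(serveArticle("Mozilla/4.0", "http://www.google.com/search?q=news", false));
    // FULL ARTICLE (first click from a results page)
    console.log(serveArticle("Mozilla/4.0", "http://example.com/", false));
    // REGISTRATION PAGE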
Along the way, we’ve also had more advice about doing things to make
Flash content visible through JavaScript detection and replacement (see
today’s excellent post,
Google Now Crawling
And Indexing Flash Content, from Vanessa Fox on those methods plus the
new Flash reading that Google’s doing). We’ve also had advice about
rewriting URLs to make them more search engine friendly — good advice —
but also a situation that in some cases can be URL cloaking. We’ve also had
situations where the search engines have absolutely known someone was
cloaking — no ifs, ands or buts — but decided the intent wasn’t bad so let
it go through (which is fine by me; I care more about intent than a
kneejerk reaction to a bad "tactic").
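Backing up to the Flash point for a moment, the JavaScript replacement
method usually follows the pattern sketched below (the element ID and
movie file are made up): the text version ships in the HTML, and script
swaps in the Flash movie only when it runs. Non-JavaScript visitors,
Googlebot included, simply keep the matching text.

    // Hypothetical progressive replacement: the page ships with a plain
    // text version of the movie's content in <div id="tour">. If
    // JavaScript runs, we swap in the Flash embed; if not, the text stays.
    window.onload = function () {
      var holder = document.getElementById("tour");
      if (!holder) return;
      holder.innerHTML =
        '<object data="tour.swf" type="application/x-shockwave-flash" ' +
        'width="640" height="480"></object>';
    };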
It’s confusing. Very confusing. Now what was cloaking again?
Charting Acceptability
To help, Rand put out a tactics chart with things rated from "pearly
white" to "solid black." That’s nice, but it’s also debatable. In fact, it
IS being debated! So I wanted to go back to what Google itself is saying and
work from that.
The most important thing is that Rand takes a "tactics" approach — are
you using IP detection, cookie detection, and so on — then suggests that as
you add things like user agent detection or other tactics, you tip into the
danger zone.
In contrast, Google is less about how you physically deliver content and
more focused on the user experience — the end result of what a user sees.
The bottom line remains quite simple: Google generally wants to be
treated the same as a "typical" user. So if you typically use cookie
detection to deliver content, Google’s got no real issue with that — as
long as it sees what a typical user might see who doesn’t accept cookies.
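In code, that cookie-detection example might look like this sketch (the
cookie name and page labels are invented). Googlebot doesn’t send
cookies, so it automatically falls into the same branch as a cookie-less
human visitor:

    // Hypothetical cookie-based delivery with a crawler-safe default.
    function contentFor(cookieHeader) {
      var prefs = /prefs=([^;]+)/.exec(cookieHeader || "");
      if (prefs) {
        return "PERSONALIZED PAGE for " + prefs[1];
      }
      // No cookie: first-time visitors, cookie refusers, and Googlebot
      // all land here and see the identical default page.
      return "DEFAULT PAGE";
    }

    console.log(contentFor("prefs=sports")); // PERSONALIZED PAGE for sports
    console.log(contentFor(""));             // DEFAULT PAGE (what Googlebot sees)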
Below, some content delivery methods now acceptable to Google that might
have once been considered cloaking:
Content Delivery Method | Description | How Done | OK With Google If…
Geolocation | Users see content tailored to their physical location | IP detection, cookies, user login | Don’t do anything special just for Googlebot
First Click Free | Users clicking from Google to a listed page can read the page without having to pay or register with the hosting site [if they try to click past that page, it’s then OK to toss up a barrier] | IP detection, cookies, user login | You let Googlebot through as if it were a paid/registered member and also allow anyone coming from Google’s search listings through
JavaScript Replacement | Using JavaScript to show content to non-JavaScript capable visitors (such as Google) that matches the textual information within a Flash or other multimedia element | JavaScript | Don’t do anything special just for Googlebot [it sees what any non-JavaScript person would see]
Landing Page Testing | Using tools like Google Website Optimizer to change pages that are shown | JavaScript, other | Don’t do anything special just for Googlebot [it sees what any non-JavaScript person would see]
URL Rewriting | Stripping out unnecessary parameters and other URL "garbage" not needed to deliver a page | Server side, JavaScript insertion | The underlying content isn’t changing & you aren’t just detecting Googlebot to do it
Where am I getting the info for the chart above?
- Geolocation & First Click Free come from Google
Webmaster Central’s June 2008
blog post. The post talks about Google’s First Click Free as if it is
only for news content. That’s not correct. You can use this method for
Google web search, as was covered by Google reps speaking at SMX Advanced
2008 last month.
- JavaScript Replacement: Google’s
page on cloaking addresses this, plus reps have covered this advice in
many public forums.
- Landing Page Testing:
Google Website
Optimizer Now Available, But Is It Cloaking? covers the issues here,
and
How does Website Optimizer fit in with Google’s view of cloaking? from
Google AdWords help has detailed advice on how it is acceptable (and when
it is not).
- URL Rewriting:
Good Cloaking,
Evil Cloaking & Detection from Stephan Spencer last year talked about
the ways to strip down URLs so they appear "nicer" in search results. He
surveyed the major search engines, which emphatically told him this was
OK, even if their spiders were detected in order to do it. I believe this
is still fine as long as the content itself isn’t changing (see the
sketch just after this list).
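Here’s a rough sketch of that kind of rewriting: the page itself stays
identical, and the server just drops parameters that don’t affect what
gets delivered. The parameter names below are hypothetical examples:

    // Hypothetical URL cleanup: drop parameters that don't change content.
    function cleanUrl(rawUrl) {
      var junk = ["sessionid", "sid", "trackingref"]; // made-up examples
      var parts = rawUrl.split("?");
      if (parts.length < 2) return rawUrl;
      var kept = parts[1].split("&").filter(function (pair) {
        return junk.indexOf(pair.split("=")[0].toLowerCase()) === -1;
      });
      return kept.length ? parts[0] + "?" + kept.join("&") : parts[0];
    }

    console.log(cleanUrl("http://example.com/item?id=5&sessionid=abc123"));
    // -> http://example.com/item?id=5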
And what’s not OK?
Content Delivery Method | Description | How Done
Cloaking | Showing Google content that typical users do not see | IP, User Agent Detection
Conditional Redirects | Showing Google a redirect code (301) that is different from what someone else sees | IP, User Agent Detection
- Cloaking comes from Google Webmaster Central’s June 2008
blog post and its help
page on the topic. If you’re delivering content to users that’s
different than what Google sees — and it’s not listed on the first chart
— you’re probably cloaking.
- Conditional Redirects: AKA, cloaking redirects. Discussed at
SMX Advanced 2008, especially
how Amazon has been doing it. Google
warned that
conditional redirects might be risky.
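To be clear about what that warning covers, this is the shape of the
pattern to avoid (a simplified, hypothetical example; the user agent
check is the whole problem):

    // The risky pattern: the redirect depends on WHO is asking.
    // This is what a "conditional redirect" means -- avoid it.
    function respond(userAgent) {
      if (/Googlebot/i.test(userAgent || "")) {
        return { status: 301, location: "/canonical-page" }; // crawler only
      }
      return { status: 302, location: "/promo-page" };       // everyone else
    }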
"Whitehat" Cloaking
I said I’m pretty tired of the cloaking debate, right? That’s because
way back when the debate started, we had search engines that DID allow
cloaking via XML feeds or other methods – or they turned blind eyes to
cloaking they knew about — or they had programs like Google Scholar that
sure felt like cloaking before guidelines caught up to say no, those aren’t
cloaking.
These exceptions made it difficult to say cloaking was "bad" when clearly
some of it was allowed. That’s why I urged the search engines — Google in
particular — to update the guidelines to say that any type of "unapproved"
cloaking might get you banned. My thought was that we could maybe skip past
the debate over tactics (and finger-pointing — ah-ha!!!! — this major site
is cloaking, ban them!) and focus more on the user experience.
Things have changed since then. The guidelines, as I’ve explained, have
gotten more detailed — and exceptions have been spun off into new names. As
part of this, the idea of "white hat cloaking" has come out, the "good"
cloaking that’s now acceptable.
I can tell you firsthand that Google doesn’t like the phrase "white hat
cloaking" at all. To Google, there’s cloaking — it’s always bad — and there are the other
content delivery methods I’ve outlined above.
OK, I’ll roll with that. Personally, I’ll avoid talking about "white hat"
or "good" cloaking if it helps improve relations between webmasters and
the search engines AND helps newbies avoid getting into trouble. But
I do think we need a term to encompass content delivery methods that do
target spiders or consider them in some way.
First Click Free, JavaScript replacement, even geolocation: while you
may not do anything special for the search engines as part of these
methods, you’re still considering them. Indeed, part of what you
consider is ensuring that you do NOT do anything special.
If cloaking is a bad word to the search engines that can never be
redeemed, we still need a name for the "good" things that are out there.
I’ve got my thinking cap on, and if you’ve got ideas, let me know by
commenting.
I’ll close by saying that my list above is not complete, in terms of the
various content delivery methods that are out there. I hope to grow this
over time — and you can help with your comments, as well.