The Long Road To The Debate Over “White Hat Cloaking”

I really hate arguments over cloaking. Like really hate them. But Rand Fishkin recently did a chart outlining the degrees of "safeness" to cloaking that in turn riled up Google’s Matt Cutts in comments there. No doubt others at search engines also dislike the idea that any cloaking is "white hat." So I wanted to revisit some of the things that Rand outlined about cloaking plus the guidelines Google updated last month. Over the years, content delivery methods that were once considered cloaking have become acceptable. This is a look at what those are and why we need a new name for them.

How Does Google Define Cloaking?

First, if you want to understand how many tiring debates we’ve had over cloaking issues, I strongly urge you to read my YADAC: Yet Another Debate About Cloaking Happens Again post from last year. One of the things it discusses is how Google dropped its definition of what cloaking was in 2006. Before then, it was:

The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site. In other words, the
webserver is programmed to return different content to Google than it
returns to regular users, usually in an attempt to distort search engine
rankings. This can mislead users about what they’ll find when they click
on a search result. To preserve the accuracy and quality of our search
results, Google may permanently ban from our index any sites or site
authors that engage in cloaking.

We were left with this on another page:

Make pages primarily for users, not for search engines. Don’t deceive
your users or present different content to search engines than you display
to users, which is commonly referred to as "cloaking."

Then around June of last year, as best I can tell, we got this:

Cloaking refers to the practice of presenting different content or URLs
to users and search engines. Serving up different results based on user
agent may cause your site to be perceived as deceptive and removed from
the Google index.

Some examples of cloaking include:

  • Serving a page of HTML text to search engines, while showing a page
    of images or Flash to users.
  • Serving different content to search engines than to users.

If your site contains elements that aren’t crawlable by search engines
(such as Flash, Javascript, or images), you shouldn’t provide cloaked
content to search engines. Rather, you should consider visitors to your
site who are unable to view these elements as well. For instance:

  • Provide alt text that describes images for visitors with screen
    readers or images turned off in their browsers.
  • Provide the textual contents of Javascript in a noscript tag.

Ensure that you provide the same content in both elements (for
instance, provide the same text in the Javascript as in the noscript tag).
Including substantially different content in the alternate element may
cause Google to take action on the site.

Then in June of this year, Google did a major blog post on the topic, which along with cloaking covered topics such as:

  • Geolocation
  • IP delivery
  • First Click Free

Much of what’s in the blog post has yet to migrate to the help pages,
though I’m sure it will. But the more important point to me is that over
time, things that people might have once considered cloaking have become
acceptable to Google — and get named something other than cloaking in the
process.

Renaming Confusion

Let’s take geolocation as an example of that. Years ago when Google would
campaign against cloaking, those who disagreed would argue that Google
itself cloaked. For example, go to Google.com from outside the US and
you’ll see a
different page than someone in the US sees. Cloaking! No, Google eventually
argued — that’s not cloaking. It’s geolocation and perfectly fine for
anyone to do.
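
To make that concrete, here’s a rough sketch of what geolocation delivery looks like when it plays by those rules. This is my own illustration, not anything from Google: the country table and paths are invented, and the lookup of the visitor’s country (say, from an IP-to-location database) is assumed to happen elsewhere.

```typescript
// A minimal sketch (names and paths invented): pick a regional homepage by
// country code, with no special case for Googlebot anywhere.
const REGIONAL_HOMEPAGES: Record<string, string> = {
  US: "/home/us.html",
  GB: "/home/uk.html",
  DE: "/home/de.html",
};

function homepageForCountry(country?: string): string {
  // Googlebot mostly crawls from US IP addresses, so it simply gets the US
  // page like any other US visitor -- there is no user-agent check at all.
  return REGIONAL_HOMEPAGES[country ?? "US"] ?? REGIONAL_HOMEPAGES["US"];
}

console.log(homepageForCountry("DE")); // "/home/de.html"
console.log(homepageForCountry());     // "/home/us.html"
```

The branch depends on where the visitor is, never on whether the visitor is a crawler.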

OK, how about showing Google content that human users only see if they register with a site and/or pay for access? Cloaking! Well, here Google took some time to clarify things for web search. Google informally said that users shouldn’t have to pay to see content if they came from Google. In Google News search, this is called First Click Free, and it’s a formalized process for news publishers to follow (that’s why you can read the Wall Street Journal for free via Google News). But for web search, we never got a formal program or formal mention that this was allowed until the June 2008 blog post.
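
If you’re wondering what that looks like in practice, here’s a minimal sketch of the kind of check a publisher might run. The function and regular expressions are my own illustration, not anything Google has published.

```typescript
// A minimal sketch of a First Click Free check: a visitor arriving from a
// Google search results page -- or Googlebot itself -- gets the full article;
// everyone else can be shown the registration or payment barrier.
function shouldServeFullArticle(referrer: string | undefined, userAgent: string): boolean {
  const cameFromGoogleSearch =
    referrer !== undefined && /^https?:\/\/(www\.)?google\./i.test(referrer);
  const isGooglebot = /googlebot/i.test(userAgent);
  // The "first click" from Google is free; further clicks within the site
  // can hit the barrier.
  return cameFromGoogleSearch || isGooglebot;
}

console.log(shouldServeFullArticle("https://www.google.com/search?q=example", "Mozilla/5.0")); // true
console.log(shouldServeFullArticle(undefined, "Mozilla/5.0")); // false
```

The crawler is let through as if it were a registered member, and so is anyone arriving from Google’s listings, which is the whole point of the program.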

Along the way, we’ve also had more advice about doing things to make Flash content visible through JavaScript detection and replacement (see today’s excellent post, Google Now Crawling And Indexing Flash Content, from Vanessa Fox on those methods plus the new Flash reading that Google’s doing). We’ve also had advice about rewriting URLs to make them more search engine friendly — good advice — but also a situation that in some cases can be URL cloaking. We’ve also had situations where the search engines have absolutely known someone was cloaking — no ifs, ands or buts — but decided the intent wasn’t bad and let it go through (which is fine by me — I care more about intent than a kneejerk reaction to a bad "tactic").
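
For the Flash case in particular, the replacement technique boils down to something like the sketch below. The element ID and file name are invented; the important part, per Google’s guidelines quoted earlier, is that the swapped-in movie carries the same information as the HTML text it replaces.

```typescript
// A minimal sketch of JavaScript replacement: the page ships plain HTML text,
// which Googlebot and non-JavaScript visitors see, and script-capable browsers
// swap in the Flash movie.
function replaceTextWithFlash(containerId: string, swfUrl: string): void {
  const container = document.getElementById(containerId);
  if (!container) return;
  // Visitors without JavaScript never reach this line, so they keep the text.
  container.innerHTML =
    `<object type="application/x-shockwave-flash" data="${swfUrl}" ` +
    `width="600" height="400"></object>`;
}

replaceTextWithFlash("product-tour", "/media/product-tour.swf");
```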

It’s confusing. Very confusing. Now what was cloaking again?

Charting Acceptability

To help, Rand put out a tactics chart with things rated from "pearly
white" to "solid black." That’s nice, but it’s also debatable. In fact, it
IS being debated! So I wanted to go back to what Google itself is saying and
work from that.

The most important thing is that Rand takes a "tactics" approach — are
you using IP detection, cookie detection, and so on — then suggests that as
you add things like user agent detection or other tactics, you tip into the
danger zone.

In contrast, Google is less about how you physically deliver content and more focused on the user experience — the end result of what a user sees. The bottom line remains quite simple — Google generally wants to be treated the same as a "typical" user. So if you typically use cookie detection to deliver content, Google’s got no real issue with that — as long as it sees what a typical user who doesn’t accept cookies would see.
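
Here’s a quick sketch of that cookie scenario, with invented names, just to show there’s no crawler detection involved:

```typescript
// A minimal sketch of cookie-based delivery that treats Googlebot like any
// cookieless visitor: there is no crawler check, just a default for anyone
// who doesn't send the preference cookie.
function landingPageFor(cookies: Record<string, string>): string {
  const preferred = cookies["preferred_category"];
  // Googlebot doesn't send cookies, so it lands in the default branch,
  // exactly like a first-time human visitor or one who rejects cookies.
  return preferred ? `/landing/${preferred}.html` : "/landing/default.html";
}

console.log(landingPageFor({ preferred_category: "shoes" })); // "/landing/shoes.html"
console.log(landingPageFor({}));                              // "/landing/default.html"
```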

Below, some content delivery methods now acceptable to Google that might
have once been considered cloaking:

  • Geolocation: Users see content tailored to their physical location.
    How done: IP detection, cookies, user login.
    OK with Google if: you don’t do anything special just for Googlebot.

  • First Click Free: Users clicking from Google to a listed page can read
    that page without having to pay or register with the hosting site [if
    they try to click past that page, it’s then OK to toss up a barrier].
    How done: IP detection, cookies, user login.
    OK with Google if: you let Googlebot through as if it were a
    paid/registered member and also allow anyone coming from Google’s search
    listings through.

  • JavaScript Replacement: Using JavaScript to show content to
    non-JavaScript capable visitors (such as Google) that matches the textual
    information within a Flash or other multimedia element.
    How done: JavaScript.
    OK with Google if: you don’t do anything special just for Googlebot [it
    sees what any non-JavaScript person would see].

  • Landing Page Testing: Using tools like Google Website Optimizer to change
    the pages that are shown.
    How done: JavaScript, other.
    OK with Google if: you don’t do anything special just for Googlebot [it
    sees what any non-JavaScript person would see].

  • URL Rewriting: Stripping out unnecessary parameters and other URL
    "garbage" not needed to deliver a page.
    How done: Server side, JavaScript insertion.
    OK with Google if: the underlying content isn’t changing and you aren’t
    just detecting Googlebot to do it.
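
Of the items above, landing page testing is probably the least obvious, so here’s a rough sketch of the general idea. It’s my own illustration, not Website Optimizer’s actual code; the element ID and copy are made up.

```typescript
// A rough sketch of JavaScript-based landing page testing: script-capable
// visitors may get an alternate headline, while visitors without JavaScript,
// Googlebot included, see the original page untouched.
function applyHeadlineExperiment(): void {
  const headline = document.getElementById("headline");
  if (!headline) return;
  // Half of JavaScript-enabled visitors keep the original headline as a control.
  if (Math.random() < 0.5) return;
  headline.textContent = "Start your free 30-day trial today";
}

applyHeadlineExperiment();
```

Googlebot doesn’t run the script, so it sees the original page, the same thing any non-JavaScript visitor sees.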

Where am I getting the info for the chart above?

  • Geolocation & First Click Free come from Google Webmaster Central’s June
    2008 blog post. The post talks about Google’s First Click Free as if it
    is only for news content. That’s not correct. You can use this method for
    Google web search, as was covered by Google reps speaking at SMX Advanced
    2008 last month.

  • JavaScript Replacement: Google’s page on cloaking addresses this, plus
    reps have covered this advice in many public forums.

  • Landing Page Testing: Google Website Optimizer Now Available, But Is It
    Cloaking? covers the issues here, and How does Website Optimizer fit in
    with Google’s view of cloaking? from Google AdWords help has detailed
    advice on how it is acceptable (and when it is not).

  • URL Rewriting: Good Cloaking, Evil Cloaking & Detection from Stephan
    Spencer last year talked about the ways to strip down URLs so they appear
    "nicer" in search results. He’d surveyed the major search engines, which
    emphatically told him this was OK, even if their spiders were detected in
    order to do it. I believe this is still fine as long as the content
    itself isn’t changing (a rough sketch of the idea follows this list).
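
Here’s the kind of stripping Stephan was describing, sketched with made-up parameter names. The key point is that the same page comes back either way; only the URL gets tidier.

```typescript
// A minimal sketch of URL rewriting as described above: strip tracking and
// session parameters so listings show a cleaner URL, while the page the URL
// serves stays exactly the same.
const UNNEEDED_PARAMS = ["sessionid", "ref", "utm_source", "utm_medium"];

function cleanUrl(raw: string): string {
  const url = new URL(raw);
  for (const param of UNNEEDED_PARAMS) {
    url.searchParams.delete(param);
  }
  return url.toString();
}

console.log(cleanUrl("https://example.com/widgets?id=42&sessionid=abc123"));
// -> "https://example.com/widgets?id=42"
```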

And what’s not OK?

  • Cloaking: Showing Google content that typical users do not see.
    How done: IP, user agent detection.

  • Conditional Redirects: Showing Google a redirect code (301) that is
    different from what someone else sees.
    How done: IP, user agent detection.


  • Cloaking comes from Google Webmaster Central’s June 2008 blog post and
    its help page on the topic. If you’re delivering content to users that’s
    different than what Google sees — and it’s not listed on the first chart
    — you’re probably cloaking.

  • Conditional Redirects: AKA cloaking redirects. Discussed at SMX Advanced
    2008, especially how Amazon has been doing it. Google warned that
    conditional redirects might be risky (a rough sketch of the pattern
    follows below).
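
For clarity, this is the sort of pattern being warned about. The URLs are invented, and it’s shown only so you can recognize a conditional redirect, not as something to copy.

```typescript
// A rough sketch of a conditional redirect: the crawler is told one thing
// while everyone else is sent somewhere different. This is the kind of
// behavior that risks a penalty.
function redirectFor(userAgent: string): { status: number; location: string } {
  if (/googlebot/i.test(userAgent)) {
    // Googlebot sees a permanent redirect to a keyword-rich page...
    return { status: 301, location: "https://example.com/keyword-rich-page" };
  }
  // ...while ordinary visitors get bounced to an unrelated offer page.
  return { status: 302, location: "https://example.com/special-offer" };
}

console.log(redirectFor("Googlebot/2.1"));
console.log(redirectFor("Mozilla/5.0"));
```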

"Whitehat" Cloaking

I said I’m pretty tired of the cloaking debate, right? That’s because way back when the debate started, we had search engines that DID allow cloaking via XML feeds or other methods — or they turned blind eyes to cloaking they knew about — or they had programs like Google Scholar that sure felt like cloaking before guidelines caught up to say no, those aren’t cloaking.

These exceptions made it difficult to say cloaking was "bad" when clearly
some of it was allowed. That’s why I urged the search engines — Google in
particular — to update the guidelines to say that any type of "unapproved"
cloaking might get you banned. My thought was that we could maybe skip past
the debate over tactics (and finger-pointing — ah-ha!!!! — this major site
is cloaking, ban them!) and focus more on the user experience.

Things have changed since then. The guidelines, as I’ve explained, have
gotten more detailed — and exceptions have been spun off into new names. As
part of this, the idea of "white hat cloaking" has come out, the "good"
cloaking that’s now acceptable.

I can tell you firsthand that Google doesn’t like the phrase "white hat
cloaking" at all. To Google, there’s cloaking — it’s always bad — and there are the other
content delivery methods I’ve outlined above.

OK, I’ll roll with that. Personally, I’ll avoid talking about "white hat"
or "good" cloaking if it helps improve relations between webmasters and
the search engines AND helps newbies avoid getting into trouble. But
I do think we need a term to encompass content delivery methods that do
target spiders or consider them in some way.

First Click Free, JavaScript replacement, even geolocation — while you may not be doing anything special for the search engines as part of them, you’re still considering them. Indeed, part of what you consider is ensuring that you do NOT do anything special.

If cloaking is a bad word to the search engines that can never be
redeemed, we still need a name for the "good" things that are out there.
I’ve got my thinking cap on, and if you’ve got ideas, let me know by
commenting.

I’ll close by saying that my list above is not complete, in terms of the
various content delivery methods that are out there. I hope to grow this
over time — and you can help with your comments, as well.




