
YADAC: Yet Another Debate About Cloaking Happens Again


Danny Sullivan on March 4, 2007 at 11:16 pm

Sigh. Double sigh. Triple sigh. I guess now that the SEO industry has had the
required twice-yearly debate about the reputation of SEO, it’s time to do the
go-round about cloaking once again. A quick word about cloaking has Google’s
Matt Cutts trying to clarify concerns that Philipp Lenssen of Google
Blogoscoped has been raising about WebmasterWorld. The comments are now up over
100, as people rehash things that have been hashed, mashed and rebaked so many
times before. Below, some cloaking history plus an honest plea about trying to
get past this stupid, stupid issue.

Definition Time

Let’s do the definition first:

Cloaking is when you show a search engine content that is different than what
a human being sees.

Got it? That’s my definition, and Matt says virtually the same thing in his
post today:

Cloaking is serving different content to users than to search engines.
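To make that definition concrete, here is a minimal Python sketch of its most basic form, user agent delivery. The crawler tokens, page bodies and the functions `is_spider`, `render_page` and `is_cloaked` are all hypothetical illustrations, not any search engine’s actual detection logic.

```python
# Hypothetical substrings used to spot crawlers in the User-Agent header
# (Googlebot, Yahoo's Slurp, MSN's msnbot). Illustrative, not exhaustive.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)


def render_page(user_agent: str) -> str:
    """Choose a response body based only on who appears to be asking."""
    if is_spider(user_agent):
        return "<p>Keyword-rich text written for the crawler</p>"
    return "<p>Flash movie shown to human visitors</p>"


def is_cloaked(spider_ua: str, browser_ua: str) -> bool:
    """Cloaking, per the definition: the two audiences get different content."""
    return render_page(spider_ua) != render_page(browser_ua)


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    firefox = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko Firefox/2.0"
    print(is_cloaked(googlebot, firefox))  # True
```

Note that `is_cloaked` captures only the technical test. It says nothing about intent.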

So simple. What’s to debate? Well, is it cloaking if…

A spider coming from a US IP address sees a different page than a user from a UK IP address?
A spider sees content that a user sees only after doing a free registration?
A spider sees content that a user sees only after doing a paid registration?
A spider sees content in text that represents what a user sees in Flash?
A spider sees content that’s slightly different than what a user sees when their browser renders JavaScript?

Pick one of those above, or pick something else (see our "Good Cloaking, Evil
Cloaking & Detection" column from last week), and people can, will and already
have pointed at something someone is doing, yelled "cloaking" and screamed for
a ban. A ban? Well, as you know, all search engines hate cloaking. Actually,
that’s always been a confused point. Here starts the lesson.

History Time: Tactics Versus Intent

Back in January 2003, Alan Perkins wrote a big "Cloaking Is Always A Bad Idea"
article, telling us that search engines always said cloaking was bad. I was
never a proponent of cloaking. I was, however, well aware that NOT all the
search engine guidelines were against cloaking. In addition, I argued that with
paid inclusion, some cloaking was actually allowed. All this went into my
"Ending The Debate Over Cloaking" article, which came out in February 2003 in
reaction to Alan’s.

Since I knew that all the search engines had allowed some types of cloaking,
my advice to marketers was this, with the stress on avoiding "unapproved
cloaking":

Cloaking is getting a search engine to record content for a URL that is
different than what a searcher will ultimately see, often intentionally. It can
be done in many technical ways. Several search engines have explicit bans
against unapproved cloaking, of which Google is the most notable one. Some
people cloak without approval and never have problems. Some even may cloak
accidentally. However, if you cloak intentionally without approval, and if you
deliver content to a search engine that is substantially different from what a
searcher sees, then you stand a much larger chance of being penalized by search
engines with penalties against unapproved cloaking. If in doubt, ask
the search engine if it has a problem with what you intend to do, assuming you
can’t get a clear answer from written guidelines that are provided. If you are
working with a third-party search engine marketer, ask them for proof that what
they intend to do is approved. Otherwise, be prepared for any adverse
consequences.

That suggestion to avoid "unapproved cloaking" infuriated Doug Heil over at the
iHelpYou forums, who could not (and to this day still cannot) let go of the idea
that cloaking MUST equal spamming.

My response back then remains the same today. There’s a difference between
tactics and intent. Many of the things that might cause penalties with
search engines are tactics (hidden text, gibberish pages, cloaking) that are
closely aligned with the intent of trying to mislead or game the search
algorithms. But in some cases, what’s a bad tactic (or technical implementation)
might have a good intent, as agreed by the search engines. So they’ll allow it,
either turning a blind eye to it or giving it some official endorsement.

That difference is important because if the search engines had gotten behind
the "unauthorized" versus "authorized" suggestion back then, we wouldn’t be
having today’s wasteful argument. But let’s carry on.

NPR, Google Scholar & Approved Cloaking

In May 2004, I looked at how National Public Radio was, in my view, cloaking
text transcripts of audio to search engines while only letting human visitors
buy those transcripts. At the time, Google had a guideline against cloaking that
read:

The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site. In other words, the webserver is
programmed to return different content to Google than it returns to regular
users, usually in an attempt to distort search engine rankings. This can mislead
users about what they’ll find when they click on a search result. To preserve
the accuracy and quality of our search results, Google may permanently ban from
our index any sites or site authors that engage in cloaking to distort their
search rankings.

I argued that this was an example of "good cloaking" and that the real issue
I had with it was that other marketers were supposedly banned from doing it:

As a searcher, I’m actually glad the method is being used. It does mean I’m
more likely to find audio content of interest. Moreover, I can listen to that
for free via the NPR site.

As a search engine marketer, I’m not so thrilled. I’m well aware that many
other companies would like the ability to feed Google content in this manner.
In addition, they have just as compelling arguments as NPR about having good
content that isn’t adequately indexed by the Google crawler. Unfortunately,
they’re denied the privilege of feeding relevant material just to Google’s
crawler.

What about Yahoo? Anyone can enjoy the same benefits that NPR has, the
ability to cloak content when relevant, through Yahoo’s content acquisition
program. Non-profit organizations are offered this for free. Commercial
organizations have to pay, making use of Yahoo’s trusted feed program.

By November 2004, I was writing about cloaking again. Now Google had an
officially approved program that, in my view, allowed cloaking. This was Google
Scholar. As I wrote:

This system may lead to problems for some searchers. In the example above,
not only could I NOT read the paper, as I didn’t have a subscription, but I
also could not read even an abstract. Instead, a password-prompt continued to
appear, even when I cancelled it, making it extremely difficult to finally
close the window (and that’s why I haven’t linked to the actual paper, to save
other people the problem).

This situation is probably unusual, however. One of Google’s requirements
for inclusion in Google Scholar is that publishers at least show abstracts to
searchers.

The special access for publishers flies in the face of Google’s
anti-cloaking policy. Google is being shown material that regular users
wouldn’t normally see, its own definition of cloaking. This is a GOOD thing
for searchers, but the company needs to amend its cloaking policy so as not to
be hypocritical.

Indeed, that’s long overdue. This has been a problem since I first reported
about a similar issue earlier this year. A sidebar piece … looks at the
latest case and suggests some fixes for Google, including finally moving
forward with formalizing such programs for ALL publishers.

Plea For Google To Change The Cloaking Guidelines

In the sidebar to that article, and again in a follow-up piece a few days
later, I urged Google to alter its cloaking policy to something stressing that
cloaking was bad only if not approved:

The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site without permission. In
other words, the webserver is programmed to return different content to Google
than it returns to regular users, usually in an attempt to distort search
engine rankings. This can mislead users about what they’ll find when they
click on a search result. To preserve the accuracy and quality of our search
results, Google may permanently ban from our index any sites or site authors
that engage in cloaking without our permission, if we feel it is harmful to
our search rankings.

BMW, WebmasterWorld & New York Times Cloaking Accusations

In March 2005, there was great amusement in some quarters when Google’s policy
against cloaking caused it to ban itself: pages with text apparently designed to
help internal Google searching also made it onto external versions seen only by
Google spiders, rather than humans (see here, here and here for more).

By the end of 2005, WebmasterWorld came under accusations of cloaking. Since it
is one of the most important forums about search engines around, frequented by
official Google reps, it sort of became, for some, a poster child for "why can
they do it but others can’t."

Far bigger news came in 2006. In February, BMW got banned from Google for using
hidden text, specifically a "poor-man’s" version of cloaking that used
JavaScript to show different content to users. It got back in a few days later.
Then in June, an article about how the New York Times optimizes content for
search engines sparked a new cloaking debate when it seemed that the major
search engines were allowing cloaked content. In particular, the New York Times
was allowing search spiders to read content that humans could only access if
they registered for free or, in some cases, paid for access.

"Marshall Simmonds, the NYTimes & Acceptable Cloaking" was a giant discussion
that came out of that article. I originally argued there that this was
cloaking, since search spiders were seeing something different than most humans
could see. But I was convinced to change my mind: since the spiders were shown
what anyone could ultimately see, this wasn’t cloaking. I commented:

Until now, I would have considered feeding a search engine a page that people
couldn’t see unless they registered to be unapproved cloaking, since most users
were seeing something different than the spider saw.

But sure, I’ll buy into the "eventually you’ll see the same thing" argument as
this not being cloaking. Why not? Google’s allowed this in approved cases for
about two years now and never wants to go on the record as this being approved
cloaking. So don’t call it cloaking and everyone’s happy. Google’s not allowing
something officially they say not to do, and content owners can do this without
fear.

In fact, I expect Fantomaster, Beyond Engineering and anyone with IP delivery
lists now can have some new-found respect from people who previously slammed
them as helping cloakers. Here they’ve been saying it’s all about content
delivery systems and now they’re right, at least in some situations. Because
after all, some people aren’t going to want to depend on user agent delivery to
feed content this way.

Of course, I’m still not going to do this yet with my own members-only content.
Despite agreeing with you, Phil — despite seeing others allowed to do this —
I’m still fearful Google might arbitrarily decide to call it cloaking anyway
when they choose. But maybe I’ll be braver down the line, and why not? It’s like
a whole new world.
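That comment contrasts IP delivery with user agent delivery and accepts the "eventually you’ll see the same thing" argument. Below is a minimal Python sketch of both ideas, assuming a publisher with a free-registration wall. The IP ranges are RFC 5737 documentation placeholders, not real crawler addresses, and every function name here is hypothetical.

```python
import ipaddress

# Hypothetical IP delivery list. These are RFC 5737 documentation ranges,
# standing in for the crawler ranges a real list would contain.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

ARTICLE = "Full article text"
REGISTRATION_WALL = "Please register (free) to keep reading"


def is_listed_spider(remote_ip: str) -> bool:
    """IP delivery: identify crawlers by address rather than by User-Agent."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in SPIDER_NETWORKS)


def serve(remote_ip: str, registered: bool) -> str:
    """Listed spiders and registered readers get the article; others get the wall."""
    if is_listed_spider(remote_ip) or registered:
        return ARTICLE
    return REGISTRATION_WALL


def eventually_same(spider_ip: str, human_ip: str) -> bool:
    """True if a human who registers ends up seeing what the spider saw."""
    return serve(spider_ip, registered=False) == serve(human_ip, registered=True)


if __name__ == "__main__":
    print(serve("203.0.113.7", registered=False))        # Please register (free) to keep reading
    print(eventually_same("192.0.2.10", "203.0.113.7"))  # True
```

The design point the comment alludes to: a User-Agent header is trivial to fake, while an IP list is harder to spoof, which is why some publishers would rather not depend on user agent delivery alone.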

I did have several off-the-record conversations with Google about this. The
main thing that came out that I can report was that Google really felt most
users should see what their spiders saw WITHOUT having to register or pay for
access. That’s similar to what Matt wrote about WebmasterWorld content today,
when covering the latest criticisms that it might be cloaked:

I consider the issue in a much better state now, in that most (all?) Google
searchers get the identical page to what Googlebot saw. But I still consider
Philipp’s February posts open for investigation, and I will get to them, in the
same way that I tackled Philipp’s first two posts about this.

FYI, to understand a bit more about the WebmasterWorld situation, see Matt’s
post from March 2006, "How to sign up for WebmasterWorld." It explains how, in
some cases, trying to access a thread that you come across from a search result
can trigger a registration request. In today’s post at Matt’s, WebmasterWorld’s
Brett Tabke explains how this has changed and points to more information.

The comments in Matt’s post also get into an apparent return of some New York
Times content requiring registration, as well as more complaints about Google
Scholar content (added with the full cooperation of Google) also being annoying,
in that you can’t read it without paying when clicking from search results.

Solution 1: Allow Registered Content

Enough history. Time for some solutions. Both Google and Yahoo have programs
that allow people with free registration or fee-based content to show up in
search results (and I mean mixed in with regular search results, not segregated
out like the Yahoo Subscriptions product launched in June 2005). Yahoo’s happens
mainly through paid inclusion, and not that much. Google’s happens primarily
through the aforementioned Google Scholar plus the First Click Free system used
for Google News, which may also be used for some web search content.
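First Click Free, roughly, lets a reader who arrives from a Google search result read that article without registering, while further clicks on the site can still hit the registration wall. The Python sketch below models that with a simple Referer check. The domain list, the check itself and the function names are assumptions for illustration, not the program’s actual rules, and a Referer header can easily be forged.

```python
from urllib.parse import urlparse

# Assumed set of search domains counted as "coming from Google" (illustrative).
GOOGLE_DOMAINS = ("google.com", "google.co.uk")

FULL_ARTICLE = "Full article text"
REGISTRATION_WALL = "Please register to keep reading"


def came_from_google(referer: str) -> bool:
    """True if the Referer host is a listed Google domain or one of its subdomains."""
    host = (urlparse(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)


def serve_article(referer: str, registered: bool) -> str:
    """The click from a search result is free; onward clicks need registration."""
    if registered or came_from_google(referer):
        return FULL_ARTICLE
    return REGISTRATION_WALL
```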

None of this content is labeled in any way. Back in that discussion on cloaking
and the New York Times, I commented:

It is annoying to hit one of these links and not know that payment or
registration is required, however. That problem’s going to get worse as more
and more people decide this isn’t cloaking and give it a go. Google and the
others should look to establish a way for site owners to better flag premium
or registration only content….

I don’t agree having paid content in regular search results is bad. I have
a Wall Street Journal paid subscription. They have lots of great content. If
I’m doing a search, and they’ve got a good match, I want to know that.

And over at Yahoo, as I explained way way back above, I can do that. I can
choose specifically to have this content revealed to me. It doesn’t make my
results bad at all.

It is a bad user experience if you constantly get back results that you
can’t actually view, of course, without paying. We simply aren’t going to
subscribe to everything.

The solution is easy. Give users the option. Let me choose to see content
that requires payment or not. Or similarly, let me choose to see content that
might require free registration. We just need Google and the others to
graduate from a 1999 mentality and better accommodate web sites with this type
of content. It’s easily done, if they want to do it. And getting a more
formalized program for publishers, as well as options for searchers, will
help.

So enough already. Enough with the special programs that only some publishers
get to do. I want Google, which leads the charge in scaring people about
cloaking, to fast-track a system that lets anyone with registration-based or
fee-based content be in its search results.

As for usability, either flag the URLs so users know to expect a charge or
registration request, or make it possible for users to exclude this content.
Or experiment with both. But do something so that publishers don’t take matters
into their own hands and the SEO industry doesn’t have to have yet another one
of these debates over whether it’s cloaking.
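The two usability options, flagging results and letting searchers exclude them, can be sketched as a small Python filter. The `Access` categories, the `Result` fields and the `label` and `visible_results` functions are hypothetical; no search engine API is implied.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    FREE = "free"
    REGISTRATION = "free registration required"
    PAID = "payment required"


@dataclass
class Result:
    url: str
    title: str
    access: Access


def label(result: Result) -> str:
    """Option 1: flag the result so users know what to expect before clicking."""
    if result.access is Access.FREE:
        return result.title
    return f"{result.title} [{result.access.value}]"


def visible_results(results, allow_registration=True, allow_paid=True):
    """Option 2: let the searcher exclude registration-only or paid content."""
    allowed = {Access.FREE}
    if allow_registration:
        allowed.add(Access.REGISTRATION)
    if allow_paid:
        allowed.add(Access.PAID)
    return [r for r in results if r.access in allowed]


if __name__ == "__main__":
    results = [
        Result("https://free.example/a", "Free story", Access.FREE),
        Result("https://news.example/b", "Registration story", Access.REGISTRATION),
        Result("https://paper.example/c", "Subscriber story", Access.PAID),
    ]
    # Doing both: exclude paid content, then label whatever remains.
    for r in visible_results(results, allow_paid=False):
        print(label(r))
```

Flagging and filtering compose naturally, which is what "experiment with both" would amount to in practice.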

Solution 2: Allow For Approved Cloaking

Remember that suggested revision for Google’s cloaking policy I gave above?
Well, sometime in 2006, Google dropped its old definition of cloaking entirely.
Instead, we were just left with a shorter definition of cloaking here:

Make pages for users, not for search engines. Don’t deceive your users or
present different content to search engines than you display to users, which is
commonly referred to as "cloaking."

And a warning against it here:

However, certain actions such as cloaking, writing text in such a way that it
can be seen by search engines but not by users, or setting up pages/links with
the sole purpose of fooling search engines may result in removal from our index.

That warning, like the old warning, uses the word "may" in terms of removal
for cloaking. Will cloaking get you banned? Maybe. Maybe if it is noticed. Maybe
if it is deemed harmful to the users. Maybe if after a closer look, it can’t be
pigeonholed in some other definition.

For me, it would be clearer to go back to the old definition and stress that,
unless approved, cloaking might result in a ban. Roll that out along with a
program making it easier for people to feed in registered content. That gives
Google flexibility, helps publishers, and stops this insane focus on
technical/tactical implementations, refocusing concern where it belongs: the
user experience.

Did what you do help or harm the search results, in Google’s opinion? If you
were harming search results, Google’s always reserved the right to boot you out.
And if you were technically violating guidelines but not in a harmful way,
Google’s always reserved the right to turn a blind eye. Or rather, an approving
eye, an eye knowing that it’s intent that matters, not some technicality.

