The Long Road To The Debate Over “White Hat Cloaking”

I really hate arguments over cloaking. Like really hate them. But Rand Fishkin recently published a chart rating the relative "safeness" of various cloaking tactics, which in turn riled up Google’s Matt Cutts in the comments there. No doubt others at the search engines also dislike the idea that any cloaking is "white hat." So I wanted to revisit some of the things Rand outlined about cloaking, plus the guidelines Google updated last month. Over the years, content delivery methods that were once considered cloaking have become acceptable. This is a look at what those are and why we need a new name for them.

How Does Google Define Cloaking?

First, if you want to understand how many tiring debates we’ve had over cloaking issues, I strongly urge you to read my YADAC: Yet Another Debate About Cloaking Happens Again post from last year. One of the things it discusses is how Google dropped its definition of what cloaking was in 2006. Before then, it was:

The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they’ll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking.

We were left with this on another page:

Make pages primarily for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."

Then around June of last year, as best I can tell, we got this:

Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.

Some examples of cloaking include:

  • Serving a page of HTML text to search engines, while showing a page of images or Flash to users.
  • Serving different content to search engines than to users.

If your site contains elements that aren’t crawlable by search engines (such as Flash, Javascript, or images), you shouldn’t provide cloaked content to search engines. Rather, you should consider visitors to your site who are unable to view these elements as well. For instance:

  • Provide alt text that describes images for visitors with screen readers or images turned off in their browsers.
  • Provide the textual contents of Javascript in a noscript tag.

Ensure that you provide the same content in both elements (for instance, provide the same text in the Javascript as in the noscript tag). Including substantially different content in the alternate element may cause Google to take action on the site.
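To make Google’s "same content in both elements" rule concrete, here’s a minimal sketch of my own (the promo copy and element names are invented for illustration): the text that JavaScript injects matches the noscript fallback word for word.

```typescript
// Hypothetical page fragment illustrating the "same content in both elements"
// advice: the script-rendered text and the <noscript> fallback are identical.
const promoText = "Spring sale: 20% off all widgets"; // invented example copy

const pageFragment = `
  <div id="promo"></div>
  <script>
    // JavaScript-capable visitors get this text...
    document.getElementById("promo").textContent = ${JSON.stringify(promoText)};
  </script>
  <noscript>
    <!-- ...and non-JavaScript visitors, crawlers included, get the same text. -->
    <p>${promoText}</p>
  </noscript>
`;

console.log(pageFragment);
```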

Then in June of this year, Google published a major blog post on the topic, covering not just cloaking but also:

  • Geolocation
  • IP delivery
  • First Click Free

Much of what’s in the blog post has yet to migrate to the help pages, though I’m sure it will. But the more important point to me is that over time, things that people might have once considered cloaking have become acceptable to Google — and get named something other than cloaking in the process.

Renaming Confusion

Let’s take geolocation as an example of that. Years ago when Google would campaign against cloaking, those who disagreed would argue that Google itself cloaked. For example, go to Google.com from outside the US and you’ll see a different page than someone in the US sees. Cloaking! No, Google eventually argued — that’s not cloaking. It’s geolocation and perfectly fine for anyone to do.
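To see why that distinction holds up under Google’s framing, here’s a rough sketch of geolocation done the "fine" way (the country lookup is a stand-in for a real GeoIP database, and the IPs are documentation examples): the branch is on the visitor’s location, never on whether the visitor is Googlebot.

```typescript
// Hypothetical geolocation sketch: everyone coming from the same country gets
// the same page, and Googlebot is treated like any other visitor from its IP.
function countryForIp(ip: string): string {
  // Stand-in for a real GeoIP lookup; real sites use a geolocation database.
  return ip.startsWith("192.0.2.") ? "DE" : "US";
}

function homepageFor(ip: string): string {
  // Note what's absent: no user agent check, no special case for crawlers.
  switch (countryForIp(ip)) {
    case "DE":
      return "German-localized homepage";
    default:
      return "US homepage";
  }
}

console.log(homepageFor("192.0.2.7"));   // German-localized homepage
console.log(homepageFor("203.0.113.9")); // US homepage
```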

OK, how about showing Google content that human users only see if they register with a site and/or pay for access? Cloaking! Well, here Google took some time to clarify things for web search. Google informally said that users shouldn’t have to pay to see content if they came from Google. In Google News search, this is called First Click Free, and it’s a formalized process for news publishers to follow (that’s why you can read the Wall Street Journal for free via Google News). But for web search, we never got a formal program or formal mention that this was allowed until the June 2008 blog post.
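Here’s a back-of-the-napkin sketch of that First Click Free logic (the function and field names are my own illustration, not a Google-published recipe): the crawler and visitors arriving from a Google listing get through, and the barrier only appears on subsequent clicks.

```typescript
// Hypothetical First Click Free gate: Googlebot and visitors arriving from a
// Google search listing can read the page; everyone else hits the paywall.
function canReadArticle(visit: {
  isGooglebot: boolean;       // however the site identifies the crawler
  referrer: string;           // the Referer header, possibly empty
  isPaidSubscriber: boolean;
}): boolean {
  if (visit.isPaidSubscriber) return true;
  // Googlebot is let through as if it were a paid/registered member...
  if (visit.isGooglebot) return true;
  // ...and so is anyone whose click came straight from Google's results.
  if (visit.referrer.startsWith("http://www.google.")) return true;
  // Clicking past that first page is when the barrier can go up.
  return false;
}

console.log(canReadArticle({
  isGooglebot: false,
  referrer: "http://www.google.com/search?q=widgets",
  isPaidSubscriber: false,
})); // true: the first click from Google is free
```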

Along the way, we’ve also had more advice about making Flash content visible through JavaScript detection and replacement (see today’s excellent post, Google Now Crawling And Indexing Flash Content, from Vanessa Fox on those methods, plus the new Flash reading that Google’s doing). We’ve also had advice about rewriting URLs to make them more search engine friendly: good advice, but also a situation that in some cases can amount to URL cloaking. And we’ve had situations where the search engines have absolutely known someone was cloaking, no ifs, ands or buts, but decided the intent wasn’t bad and let it go (which is fine by me; I care more about intent than a kneejerk reaction to a bad "tactic").
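For the Flash case, the usual pattern (my sketch, in the spirit of SWFObject-era progressive enhancement) is to ship the real text in the HTML and let JavaScript swap in the Flash movie, so a non-JavaScript visitor, Googlebot included, reads the same words the movie presents:

```typescript
// Hypothetical progressive-enhancement sketch: the HTML carries the text, and
// script replaces it with the Flash movie only for JavaScript-capable browsers.
const textualFallback = `
  <div id="intro">
    <h2>Acme Widgets</h2>
    <p>Hand-built widgets since 1996. Browse our catalog of 200 models.</p>
  </div>`;

// Runs only where JavaScript runs; crawlers and noscript visitors keep the text.
function upgradeToFlash(doc: Document): void {
  const intro = doc.getElementById("intro");
  if (!intro) return;
  // The movie is assumed to present the same information as the text above;
  // substantially different content is where this tips back into cloaking.
  intro.innerHTML =
    '<object data="intro.swf" type="application/x-shockwave-flash"></object>';
}
```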

It’s confusing. Very confusing. Now what was cloaking again?

Charting Acceptability

To help, Rand put out a tactics chart with things rated from "pearly white" to "solid black." That’s nice, but it’s also debatable. In fact, it IS being debated! So I wanted to go back to what Google itself is saying and work from that.

The most important thing is that Rand takes a "tactics" approach (are you using IP detection, cookie detection, and so on) and then suggests that as you add things like user agent detection or other tactics, you tip into the danger zone.

In contrast, Google is less concerned with how you physically deliver content and more focused on the user experience: the end result of what a user sees. The bottom line remains quite simple. Google generally wants to be treated the same as a "typical" user. So if you typically use cookie detection to deliver content, Google’s got no real issue with that, as long as it sees what a typical user who doesn’t accept cookies would see.
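A quick sketch of that principle (the cookie name is invented): branch on the cookie, never on the visitor’s identity, and a cookie-less Googlebot automatically lands in the same bucket as any cookie-less human.

```typescript
// Hypothetical cookie-based delivery that follows Google's "typical user"
// standard: the only branch is whether a preferences cookie is present, so
// Googlebot (which sends no cookies) sees what a cookie-less human sees.
function pageVariant(cookieHeader: string | undefined): string {
  const hasPrefs = (cookieHeader ?? "").includes("prefs=");
  return hasPrefs
    ? "personalized page (saved preferences applied)"
    : "default page (what Googlebot and cookie-less visitors get)";
}

console.log(pageVariant("prefs=dark; lang=en")); // personalized page
console.log(pageVariant(undefined));             // default page
```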

Below, some content delivery methods now acceptable to Google that might have once been considered cloaking:

| Content Delivery Method | Description | How Done | OK With Google If… |
|---|---|---|---|
| Geolocation | Users see content tailored to their physical location | IP detection, cookies, user login | You don’t do anything special just for Googlebot |
| First Click Free | Users clicking from Google to a listed page can read that page without having to pay or register with the hosting site (if they try to click past that page, it’s then OK to toss up a barrier) | IP detection, cookies, user login | You let Googlebot through as if it were a paid/registered member and also allow anyone coming from Google’s search listings through |
| JavaScript Replacement | Using JavaScript to show content to non-JavaScript-capable visitors (such as Google) that matches the textual information within a Flash or other multimedia element | JavaScript | You don’t do anything special just for Googlebot (it sees what any non-JavaScript visitor would see) |
| Landing Page Testing | Using tools like Google Website Optimizer to change pages that are shown | JavaScript, other | You don’t do anything special just for Googlebot (it sees what any non-JavaScript visitor would see) |
| URL Rewriting | Stripping out unnecessary parameters and other URL "garbage" not needed to deliver a page | Server side, JavaScript insertion | The underlying content isn’t changing and you aren’t just detecting Googlebot to do it |
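To pin down the URL rewriting row, here’s a minimal sketch of the acceptable version (the parameter names are invented examples): session and tracking junk is stripped for every client, and the page behind the cleaned URL stays exactly the same.

```typescript
// Hypothetical URL rewrite: drop tracking/session parameters that don't affect
// which page gets served. Applied to every visitor, not just detected spiders.
const JUNK_PARAMS = ["sessionid", "sid", "affiliate", "tracking"];

function canonicalUrl(raw: string): string {
  const url = new URL(raw);
  for (const param of JUNK_PARAMS) url.searchParams.delete(param);
  return url.toString();
}

console.log(canonicalUrl(
  "http://example.com/product?id=42&sessionid=abc123&tracking=email"
)); // http://example.com/product?id=42
```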

Where am I getting the info for the chart above?

  • Geolocation & First Click Free come from Google Webmaster Central’s June 2008 blog post. The post talks about Google’s First Click Free as if it is only for news content. That’s not correct. You can use this method for Google web search, as was covered by Google reps speaking at SMX Advanced 2008 last month.
     
  • JavaScript Replacement: Google’s page on cloaking addresses this, plus reps have covered this advice in many public forums.
     
  • Landing Page Testing: Google Website Optimizer Now Available, But Is It Cloaking? covers the issues here, and How does Website Optimizer fit in with Google’s view of cloaking? from Google AdWords help has detailed advice on when it is acceptable (and when it is not).
  • URL Rewriting: Good Cloaking, Evil Cloaking & Detection from Stephan Spencer last year talked about ways to strip down URLs so they appear "nicer" in search results. He’d surveyed the major search engines, which emphatically told him this was OK, even if their spiders were detected in order to do it. I believe this is still fine as long as the content itself isn’t changing.

And what’s not OK?

| Content Delivery Method | Description | How Done |
|---|---|---|
| Cloaking | Showing Google content that typical users do not see | IP, user agent detection |
| Conditional Redirects | Showing Google a redirect code (301) that is different from what someone else sees | IP, user agent detection |

  • Cloaking comes from Google Webmaster Central’s June 2008 blog post and its help page on the topic. If you’re delivering content to users that’s different from what Google sees, and it’s not listed in the first chart, you’re probably cloaking.
     
  • Conditional Redirects: AKA cloaking redirects. These were discussed at SMX Advanced 2008, especially how Amazon has been doing them. Google warned that conditional redirects might be risky; a sketch of the pattern follows below.
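Purely to illustrate the pattern being warned against (this is an anti-pattern sketch, not something to deploy), a conditional redirect branches on who is asking:

```typescript
// ANTI-PATTERN sketch: the conditional redirect Google warns about. The server
// hands Googlebot a clean 301 while sending human visitors somewhere else.
function redirectFor(userAgent: string): { status: number; location: string } {
  if (userAgent.includes("Googlebot")) {
    // The spider sees a tidy permanent redirect to the "real" page...
    return { status: 301, location: "http://example.com/category/widgets" };
  }
  // ...while humans are bounced through an affiliate/tracking hop instead.
  return { status: 302, location: "http://example.com/track?dest=widgets" };
}
```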

"Whitehat" Cloaking

I said I’m pretty tired of the cloaking debate, right? That’s because way back when the debate started, we had search engines that DID allow cloaking via XML feeds or other methods, or that turned a blind eye to cloaking they knew about, or that ran programs like Google Scholar which sure felt like cloaking before the guidelines caught up to say no, those aren’t cloaking.

These exceptions made it difficult to say cloaking was "bad" when clearly some of it was allowed. That’s why I urged the search engines — Google in particular — to update the guidelines to say that any type of "unapproved" cloaking might get you banned. My thought was that we could maybe skip past the debate over tactics (and finger-pointing — ah-ha!!!! — this major site is cloaking, ban them!) and focus more on the user experience.

Things have changed since then. The guidelines, as I’ve explained, have gotten more detailed, and exceptions have been spun off under new names. As part of this, the idea of "white hat cloaking" has emerged: the "good" cloaking that’s now acceptable.

I can tell you firsthand that Google doesn’t like the phrase "white hat cloaking" at all. To Google, there’s cloaking — it’s always bad — and there are the other content delivery methods I’ve outlined above.

OK, I’ll roll with that. Personally, I’ll avoid talking about "white hat" or "good" cloaking if it helps improve relations between webmasters and the search engines AND helps newbies avoid getting into trouble. But I do think we need a term to encompass content delivery methods that do target spiders or consider them in some way.

First Click Free, JavaScript replacement, even geolocation: while you may not do anything special for the search engines as part of these methods, you’re still considering them. Indeed, part of what you consider is ensuring that you do NOT do something special.

If cloaking is a bad word to the search engines that can never be redeemed, we still need a name for the "good" things that are out there. I’ve got my thinking cap on, and if you’ve got ideas, let me know by commenting.

I’ll close by saying that my list above is not complete, in terms of the various content delivery methods that are out there. I hope to grow this over time — and you can help with your comments, as well.
