Jul. 1, 2008 at 3:13pm Eastern by Danny Sullivan
The Long Road To The Debate Over "White Hat Cloaking"
I really hate arguments over cloaking. Like really hate them. But Rand Fishkin recently published a chart outlining degrees of "safeness" in cloaking, which in turn riled up Google's Matt Cutts in the comments there. No doubt others at search engines also dislike the idea that any cloaking is "white hat." So I wanted to revisit some of the things that Rand outlined about cloaking, plus the guidelines Google updated last month. Over the years, content delivery methods that were once considered cloaking have become acceptable. This is a look at what those are and why we need a new name for them.
How Does Google Define Cloaking?
First, if you want to understand how many tiring debates we've had over cloaking issues, I strongly urge you to read my YADAC: Yet Another Debate About Cloaking Happens Again post from last year. One of the things it discusses is how Google dropped its definition of what cloaking was in 2006. Before then, it was:
The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they'll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking.
We were left with this on another page:
Make pages primarily for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."
Then around June of last year, as best I can tell, we got this:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Some examples of cloaking include:
- Serving a page of HTML text to search engines, while showing a page of images or Flash to users.
- Serving different content to search engines than to users.
If your site contains elements that aren't crawlable by search engines (such as Flash, Javascript, or images), you shouldn't provide cloaked content to search engines. Rather, you should consider visitors to your site who are unable to view these elements as well. For instance:
- Provide alt text that describes images for visitors with screen readers or images turned off in their browsers.
- Provide the textual contents of Javascript in a noscript tag.
Ensure that you provide the same content in both elements (for instance, provide the same text in the Javascript as in the noscript tag). Including substantially different content in the alternate element may cause Google to take action on the site.
Then in June of this year, Google did a major blog post on the topic, which along with cloaking covered topics such as:
- Geolocation
- IP delivery
- First Click Free
Much of what's in the blog post has yet to migrate to the help pages, though I'm sure it will. But the more important point to me is that over time, things that people might have once considered cloaking have become acceptable to Google -- and get named something other than cloaking in the process.
Renaming Confusion
Let's take geolocation as an example of that. Years ago when Google would campaign against cloaking, those who disagreed would argue that Google itself cloaked. For example, go to Google.com from outside the US and you'll see a different page than someone in the US sees. Cloaking! No, Google eventually argued -- that's not cloaking. It's geolocation and perfectly fine for anyone to do.
OK, how about showing Google content that human users only see if they register with a site and/or pay for access? Cloaking! Well, here Google took some time to clarify things for web search. Google informally said that users shouldn't have to pay to see content if they came from Google. In Google News search, this is called First Click Free, and it's a formalized process for news publishers to follow (that's why you can read the Wall Street Journal for free via Google News). But for web search, we never got a formal program or formal mention that this was allowed until the June 2008 blog post.
Along the way, we've also had more advice about making Flash content visible through JavaScript detection and replacement (see today's excellent post, Google Now Crawling And Indexing Flash Content, from Vanessa Fox on those methods plus the new Flash reading that Google's doing). We've also had advice about rewriting URLs to make them more search engine friendly -- good advice -- but also something that in some cases can be URL cloaking. We've also had situations where the search engines have absolutely known someone was cloaking -- no ifs, ands or buts -- but decided the intent wasn't bad, so they let it go through (which is fine by me -- I care more about intent than a knee-jerk reaction to a bad "tactic").
It's confusing. Very confusing. Now what was cloaking again?
Charting Acceptability
To help, Rand put out a tactics chart with things rated from "pearly white" to "solid black." That's nice, but it's also debatable. In fact, it IS being debated! So I wanted to go back to what Google itself is saying and work from that.
The most important thing is that Rand takes a "tactics" approach -- are you using IP detection, cookie detection, and so on -- then suggests that as you add things like user agent detection or other tactics, you tip into the danger zone.
In contrast, Google is less about how you physically deliver content and more focused on the user experience -- the end result of what a user sees. The bottom line in that remains quite simple -- Google generally wants to be treated the same as a "typical" user. So if you typically use cookie detection to deliver content, Google's got no real issue with that -- as long as it sees what a typical user might see who doesn't accept cookies.
Below are some content delivery methods now acceptable to Google that might once have been considered cloaking:
| Content Delivery Method | Description | How Done | OK With Google If... |
| Geolocation | Users see content tailored to their physical location | IP detection, cookies, user login | You don't do anything special just for Googlebot |
| First Click Free | Users clicking from Google to a listed page can read page without having to pay or register with the hosting site [if they try to click past that page, it's then OK to toss up a barrier] | IP detection, cookies, user login | You let Googlebot through as if it were a paid/registered member and also allow anyone coming from Google's search listings through |
| JavaScript Replacement | Using JavaScript to show content to non-JavaScript capable visitors (such as Google) that matches the textual information within a Flash or other multimedia element | JavaScript | You don't do anything special just for Googlebot [it sees what any visitor without JavaScript would see] |
| Landing Page Testing | Using tools like Google Website Optimizer to change pages that are shown | JavaScript, other | You don't do anything special just for Googlebot [it sees what any visitor without JavaScript would see] |
| URL Rewriting | Stripping out unnecessary parameters and other URL "garbage" not needed to deliver a page | Server side, JavaScript insertion | The underlying content isn't changing & you aren't just detecting Googlebot to do it |
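As a rough illustration of the First Click Free row, here is a minimal Python sketch of the decision a site might make before putting up a registration or pay barrier. The function names, the `is_member` and `is_googlebot` flags, and the referrer check are all assumptions made for illustration; this is not Google's specification, and how a site recognizes Googlebot or its own members is left to the caller.

```python
from urllib.parse import urlparse


def came_from_google(referrer: str) -> bool:
    """Hypothetical check: does the referring URL belong to google.com?

    Simplification: country domains such as google.co.uk are not handled.
    """
    host = urlparse(referrer).hostname or ""
    return host == "google.com" or host.endswith(".google.com")


def show_full_page(is_member: bool, is_googlebot: bool, referrer: str) -> bool:
    """Return True if the full page should be served without a barrier.

    Members and Googlebot get through, and so does anyone arriving
    from a Google search listing (the "first click").
    """
    if is_member or is_googlebot:
        return True
    return came_from_google(referrer)


# A visitor clicking through from a Google result reads the page for free.
print(show_full_page(False, False, "https://www.google.com/search?q=cloaking"))
# A second click comes from the site's own page, so a barrier is allowed.
print(show_full_page(False, False, "https://example.com/article-1"))
```

The point of the sketch is that Googlebot and the searcher who clicks a Google listing land on the same content, which is what keeps this from being treated as cloaking.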
Where am I getting the info for the chart above?
- Geolocation & First Click Free come from Google Webmaster Central's June 2008 blog post. The post talks about Google's First Click Free as if it is only for news content. That's not correct. You can use this method for Google web search, as was covered by Google reps speaking at SMX Advanced 2008 last month.
- JavaScript Replacement: Google's page on cloaking addresses this, plus reps have covered this advice in many public forums.
- Landing Page Testing: Google Website Optimizer Now Available, But Is It Cloaking? covers the issues here, and How does Website Optimizer fit in with Google's view of cloaking? from Google AdWords help has detailed advice on how it is acceptable (and when it is not)
- URL Rewriting: Good Cloaking, Evil Cloaking & Detection from Stephan Spencer last year talked about the ways to strip down URLs so they appear "nicer" in search results. He'd surveyed the major search engines, which emphatically told him this was OK, even if their spiders were detected in order to do it. I believe this is still fine as long as the content itself isn't changing.
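To make the URL rewriting idea concrete, here is a small Python sketch that strips parameters a page doesn't need in order to be delivered. The parameter names in `UNNEEDED_PARAMS` and the example URL are hypothetical; which parameters count as "garbage" depends entirely on the site. The sketch assumes the cleanup is applied to every request, not just to spiders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not affect which page is delivered.
UNNEEDED_PARAMS = {"sessionid", "ref", "utm_source"}


def canonical_url(url: str) -> str:
    """Return the URL with unneeded query parameters removed.

    Parameters that select the content (for example, a product id) are kept,
    so the underlying page stays the same.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in UNNEEDED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(canonical_url("http://example.com/widgets?id=42&sessionid=abc123&ref=home"))
# prints http://example.com/widgets?id=42
```

A site could redirect the longer URL to the cleaned one, or simply link to the cleaned form. Either way, the content served at the address doesn't change.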
And what's not OK?
| Content Delivery Method | Description | How Done |
| Cloaking | Showing Google content that typical users do not see | IP, User Agent Detection |
| Conditional Redirects | Showing Google a redirect code (301) that is different from what someone else sees | IP, User Agent Detection |
- Cloaking comes from Google Webmaster Central's June 2008 blog post and its help page on the topic. If you're delivering content to users that's different than what Google sees -- and it's not listed on the first chart -- you're probably cloaking.
- Conditional Redirects: AKA, cloaking redirects. Discussed at SMX Advanced 2008, especially how Amazon has been doing it. Google warned that conditional redirects might be risky.
"Whitehat" Cloaking
I said I'm pretty tired of the cloaking debate, right? That's because way back when the debate started, we had search engines that DID allow cloaking via XML feeds or other methods -- or they turned blind eyes to cloaking they knew about -- or they had programs like Google Scholar that sure felt like cloaking before guidelines caught up to say no, those aren't cloaking.
These exceptions made it difficult to say cloaking was "bad" when clearly some of it was allowed. That's why I urged the search engines -- Google in particular -- to update the guidelines to say that any type of "unapproved" cloaking might get you banned. My thought was that we could maybe skip past the debate over tactics (and finger-pointing -- ah-ha!!!! -- this major site is cloaking, ban them!) and focus more on the user experience.
Things have changed since then. The guidelines, as I've explained, have gotten more detailed -- and exceptions have been spun off into new names. As part of this, the idea of "white hat cloaking" has come out, the "good" cloaking that's now acceptable.
I can tell you firsthand that Google doesn't like the phrase "white hat cloaking" at all. To Google, there's cloaking -- it's always bad -- and there are the other content delivery methods I've outlined above.
OK, I'll roll with that. Personally, I'll avoid talking about "white hat" or "good" cloaking if it helps improve relations between webmasters and the search engines AND helps newbies avoid getting into trouble. But I do think we need a term to encompass content delivery methods that do target spiders or consider them in some way.
With First Click Free, JavaScript replacement, even geolocation, maybe you don't do something special for the search engines, but you're still considering them. Indeed, part of what you consider is ensuring that you do NOT do something special.
If cloaking is a bad word to the search engines that can never be redeemed, we still need a name for the "good" things that are out there. I've got my thinking cap on, and if you've got ideas, let me know by commenting.
I'll close by saying that my list above is not complete, in terms of the various content delivery methods that are out there. I hope to grow this over time -- and you can help with your comments, as well.