Sigh. Double sigh. Triple sigh. I guess now that the SEO industry has had the required twice-yearly debate about the reputation of SEO, it’s time to do the go round about cloaking once again. A quick word about cloaking has Google’s Matt Cutts trying to clarify concerns that Philipp Lenssen of Google Blogoscoped has been raising about WebmasterWorld. The comments are now up over 100, as people rehash things that have been hashed, mashed, rebaked so many times before. Below, some cloaking history plus an honest plea about trying to get past this stupid, stupid issue.
Let’s do the definition, first:
Cloaking is when you show a search engine content that is different than what a human being sees.
Got it? That’s my definition, and Matt says virtually the same thing in his post today:
Cloaking is serving different content to users than to search engines.
So simple. What’s to debate? Well, is it cloaking if…
- A spider coming from a US IP address sees a different page than a user from a UK IP address?
- A spider sees content that a user sees, but only if they do free registration?
- A spider sees content that a user sees, but only if they do paid registration?
- A spider sees content in text that represents what a user sees in Flash?
Pick one of those above — pick something else (see our Good Cloaking, Evil Cloaking & Detection column from last week) — and people can, will and have pointed at something someone is doing, then yelled "cloaking" and screamed for a ban to happen. A ban? Well, as you know, all search engines hate cloaking. Actually, that’s always been a confused point. Here starts the lesson.
History Time: Tactics Versus Intent
Back in January 2003, Alan Perkins wrote this big Cloaking Is Always A Bad Idea article, telling us that search engines always said cloaking was bad. I was never a proponent of cloaking. I was, however, well aware that NOT all the guidelines were against cloaking. In addition, with paid inclusion, I argued some cloaking was actually allowed. All this went into my Ending The Debate Over Cloaking that came out in reaction to Alan’s article, in February 2003.
Since I knew that all the search engines had allowed some types of cloaking, my advice to marketers was this, with the stress on avoiding "unapproved cloaking:"
Cloaking is getting a search engine to record content for a URL that is different than what a searcher will ultimately see, often intentionally. It can be done in many technical ways. Several search engines have explicit bans against unapproved cloaking, of which Google is the most notable one. Some people cloak without approval and never have problems. Some even may cloak accidentally. However, if you cloak intentionally without approval — and if you deliver content to a search engine that is substantially different from what a search engine records — then you stand a much larger chance of being penalized by search engines with penalties against unapproved cloaking. If in doubt, ask the search engine if it has a problem with what you intend to do, assuming you can’t get a clear answer from written guidelines that are provided. If you are working with a third-party search engine marketer, ask them for proof that what they intend to do is approved. Otherwise, be prepared for any adverse consequences.
The suggestion to avoid "unapproved cloaking" infuriated Doug Heil over at the iHelpYou forums, who could not (and to this day still cannot) get over the idea that cloaking MUST equal spamming.
My response back then remains the same today. There’s a difference between tactics and intent. Many of the things that might cause penalties with search engines are tactics (hidden text, gibberish pages, cloaking) that are closely aligned with the intent of trying to mislead or game the search algorithms. But in some cases, what’s a bad tactic (or technical implementation) might have a good intent as agreed by the search engines. So they’ll allow it, either turning a blind eye to it or giving it some official endorsement.
That difference is important because if the search engines had gotten behind the "unauthorized" versus "authorized" suggestion back then, we wouldn’t be having today’s wasteful argument. But let’s carry on.
NPR, Google Scholar & Approved Cloaking
In May 2004, I looked at how National Public Radio was, in my view, cloaking text transcripts of audio to search engines but only letting human visitors buy those. At the time, Google had a guideline against cloaking that read:
The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they’ll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.
I argued that this was an example of "good cloaking" and that the real issue I had with it was that other marketers were supposedly banned from doing it:
As a searcher, I’m actually glad the method is being used. It does mean I’m more likely to find audio content of interest. Moreover, I can listen to that for free via the NPR site.
As a search engine marketer, I’m not so thrilled. I’m well aware that many other companies would like the ability to feed Google content in this manner. In addition, they have just as compelling arguments as NPR about having good content that isn’t adequately indexed by the Google crawler. Unfortunately, they’re denied the privilege of feeding relevant material just to Google’s crawler.
What about Yahoo? Anyone can enjoy the same benefits that NPR has, the ability to cloak content when relevant, through Yahoo’s content acquisition program. Non-profit organizations are offered this for free. Commercial organizations have to pay, making use of Yahoo’s trusted feed program.
This system may lead to problems for some searchers. In the example above, not only could I NOT read the paper, as I didn’t have a subscription, but I also could not read even an abstract. Instead, a password-prompt continued to appear, even when I cancelled it, making it extremely difficult to finally close the window (and that’s why I haven’t linked to the actual paper, to save other people the problem).
This situation is probably unusual, however. One of Google’s requirements for inclusion in Google Scholar is that publishers at least show abstracts to searchers.
The special access for publishers flies in the face of Google’s anti-cloaking policy. Google is being shown material that regular users wouldn’t normally see, its own definition of cloaking. This is a GOOD thing for searchers, but the company needs to amend its cloaking policy so as not to be hypocritical.
Indeed, that’s long overdue. This has been a problem since I first reported about a similar issue earlier this year. A sidebar piece … looks at the latest case and suggests some fixes for Google, including finally moving forward with formalizing such programs for ALL publishers.
Plea For Google To Change The Cloaking Guidelines
In the sidebar to that article, and again in a follow-up piece a few days later, I urged Google to alter its cloaking policy to something stressing that cloaking was bad only if not approved:
The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site without permission. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they’ll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking without our permission, if we feel it is harmful to our search rankings.
BMW, WebmasterWorld & New York Times Cloaking Accusations
In March 2005, there was great amusement in some quarters when Google’s policy against cloaking caused it to ban itself, when pages apparently with text designed to help internal Google searching also made it onto external versions seen only by Google spiders, rather than humans (see here, here and here for more).
By the end of 2005, WebmasterWorld came under accusations of cloaking. Since it is one of the most important forums about search engines around — frequented by official Google reps — it sort of became a poster child of "why can they do it but others can’t" for some.
Marshall Simmonds, the NYTimes & Acceptable Cloaking was a giant discussion that came out of that article, which had me originally arguing that this was cloaking, since search spiders were seeing something different than most humans could see. But I was convinced to change my mind. Since the spiders were indeed shown what anyone could ultimately see, this wasn’t cloaking. I commented:
Until now, I would have considered feeding a search engine a page that people couldn’t see unless they registered to have been unapproved cloaking, since most users were seeing something different than the spider saw.
But sure, I’ll buy into the "eventually you’ll see the same thing" argument as this not being cloaking. Why not? Google’s allowed this in approved cases for about two years now and never wants to go on the record as this being approved cloaking. So don’t call it cloaking and everyone’s happy. Google’s not allowing something officially they say not to do, and content owners can do this without fear.
In fact, I expect Fantomaster, Beyond Engineering and anyone with IP delivery lists now can have some new-found respect from people who previously slammed them as helping cloakers. Here they’ve been saying it’s all about content delivery systems and now they’re right, at least in some situations. Because after all, some people aren’t going to want to depend on user agent delivery to feed content this way.
Of course, I’m still not going to do this yet with my own members-only content. Despite agreeing with you, Phil — despite seeing others allowed to do this – I’m still fearful Google might arbitrarily decide to call it cloaking anyway when they choose. But maybe I’ll be braver down the line, and why not? It’s like a whole new world.
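The two delivery methods named in that comment differ mainly in what the server inspects before deciding which page to send. Here is a minimal sketch in Python, assuming made-up crawler tokens and a made-up address range. The names `CRAWLER_UA_TOKENS`, `CRAWLER_NETWORKS` and all three functions are hypothetical, for illustration only, and do not represent any vendor's actual list or product:

```python
import ipaddress

# Hypothetical values for illustration only. Real crawler lists are
# maintained by the search engines and by IP delivery vendors.
CRAWLER_UA_TOKENS = ("googlebot", "slurp")
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # documentation range


def is_crawler_by_user_agent(user_agent: str) -> bool:
    """User agent delivery: trust the User-Agent header the visitor sends."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_UA_TOKENS)


def is_crawler_by_ip(remote_addr: str) -> bool:
    """IP delivery: check the connecting address against a known list."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in CRAWLER_NETWORKS)


def choose_page(remote_addr: str, registered: bool) -> str:
    """Pick which version of a members-only page to serve (IP delivery)."""
    if registered or is_crawler_by_ip(remote_addr):
        return "full article"
    return "registration prompt"
```

A visitor can type any user agent string into a browser, so the first check can be fooled by anyone who claims to be a spider, while the second depends on where the request actually comes from. That is one plausible reason a publisher would not want to depend on user agent delivery alone.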
I did have several off-the-record conversations with Google about this. The main thing that came out that I can report was that Google really felt most users should see what their spiders saw WITHOUT having to register or pay for access. That’s similar to what Matt wrote about WebmasterWorld content today, when covering the latest criticisms that it might be cloaked:
I consider the issue in a much better state now, in that most (all?) Google searchers get the identical page to what Googlebot saw. But I still consider Philipp’s February posts open for investigation, and I will get to them, in the same way that I tackled Philipp’s first two posts about this.
FYI, to understand a bit more about the WebmasterWorld situation, see Matt’s post from March 2006, How to sign up for WebmasterWorld. It explains how in some cases, trying to access a thread that you might come across from a search result can trigger a registration request. In today’s post at Matt’s, WebmasterWorld’s Brett Tabke explains and points at more information on how this has changed.
The comments in Matt’s post also get into an apparent return of some New York Times content requiring registration, as well as more complaints about Google Scholar content (added with the full cooperation of Google) also being annoying, in that you can’t read it without paying when clicking from search results.
Solution 1: Allow Registered Content
Enough history. Time for some solutions. Both Google and Yahoo have programs that allow people with free registration or fee-based content to show up in search results — and I mean mixed in with regular search results, not segregated out like the Yahoo Subscriptions product launched in June 2005. Yahoo’s will happen mainly through paid inclusion, and not that much. Google’s will happen primarily through the aforementioned Google Scholar plus the First Click Free system used for Google News but which also may happen with some web search content.
None of this content is labeled in any way. Back to that discussion on cloaking and the New York Times, I commented:
It is annoying to hit one of these links and not know that payment or registration is required, however. That problem’s going to get worse as more and more people decide this isn’t cloaking and give it a go. Google and the others should look to establish a way for site owners to better flag premium or registration only content….
I don’t agree having paid content in regular search results is bad. I have a Wall Street Journal paid subscription. They have lots of great content. If I’m doing a search, and they’ve got a good match, I want to know that.
And over at Yahoo, as I explained way way back above, I can do that. I can choose specifically to have this content revealed to me. It doesn’t make my results bad at all.
It is a bad user experience if you constantly get back results that you can’t actually view, of course, without paying. We simply aren’t going to subscribe to everything.
The solution is easy. Give users the option. Let me choose to see content that requires payment or not. Or similarly, let me choose to see content that might require free registration. We just need Google and the others to graduate from a 1999 mentality and better accommodate web sites with this type of content. It’s easily done, if they want to do it. And getting a more formalized program for publishers, as well as options for searchers, will help.
So enough already. Enough with the special programs that only some publishers get to do. I want Google, which leads the charge in scaring people about cloaking, to fast track a system that lets anyone with registration-based or fee-based content be in its search results.
As for usability, either flag the URLs so users know to expect a charge or registration request or make it possible for users to exclude this information. Or experiment with both. But do something so that publishers don’t take matters into their own hands and the SEO industry doesn’t have to have yet another one of these debates over whether it’s cloaking.
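Both options come down to carrying one extra piece of information with each listing. Here is a rough sketch in Python, assuming a hypothetical `access` label that a publisher would declare for each URL. No search engine offers such a field as far as this article knows; `Result`, `label` and `filter_results` are invented purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical access labels a publisher might declare for a URL.
FREE, REGISTRATION, PAID = "free", "registration", "paid"


@dataclass
class Result:
    url: str
    title: str
    access: str = FREE


def label(result: Result) -> str:
    """Option 1: flag the listing so the searcher knows what to expect."""
    if result.access == FREE:
        return result.title
    return f"{result.title} [{result.access} required]"


def filter_results(results, allow_registration=True, allow_paid=True):
    """Option 2: let the searcher exclude content they cannot open."""
    blocked = set()
    if not allow_registration:
        blocked.add(REGISTRATION)
    if not allow_paid:
        blocked.add(PAID)
    return [r for r in results if r.access not in blocked]
```

A searcher with a paid subscription somewhere would leave both switches on and rely on the labels. Someone who never wants to hit a payment wall would call `filter_results(results, allow_paid=False)` and see only free and registration-based listings.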
Solution 2: Allow For Approved Cloaking
Remember that suggested revision for Google’s cloaking policy I gave above? Well, sometime in 2006, Google dropped its definition of cloaking entirely. Instead, we were just left with a shorter definition of cloaking here:
Make pages for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."
And a warning against it here:
However, certain actions such as cloaking, writing text in such a way that it can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in removal from our index.
That warning, like the old warning, uses the word "may" in terms of removal for cloaking. Will cloaking get you banned? Maybe. Maybe if it is noticed. Maybe if it is deemed harmful to the users. Maybe if after a closer look, it can’t be pigeonholed in some other definition.
For me, it would be clearer to go back to the old definition and stress that unless approved, cloaking might result in a ban. Roll that out along with a program making it easier for people to feed in registered content. That gives Google flexibility, helps publishers, stops this insane focus on technical/tactical implementations and refocuses concern where it belongs: the user experience.
Did what you did help or harm the search results, in Google’s opinion? If you were harming search results, Google’s always reserved the right to boot you out. And if you were technically violating guidelines but not in a harmful way, Google’s always reserved the right to turn a blind eye. Or rather, an approving eye, an eye knowing that it’s intent that matters, not some technicality.