Today, Microsoft posted on the Live Search Webmaster Center blog that the cloaking detection system it has been running for the past few months has, for some sites, skewed site statistics and ads reporting and caused high traffic loads, and that it has adjusted the process to correct these problems. Below are details on the problems this process caused, what Live Search is doing about them, and a brief look at how the engines have historically dealt with cloaking.
For the last few months, there have been slight rumblings in the places webmasters frequent about odd referrers and traffic from MSNbot, as well as from a bot not identified as such, but coming from an IP range identified as belonging to Microsoft.
In September, MSNDude posted on Webmaster World, confirming that the traffic was coming from Microsoft as a result of a “quality check” and that webmasters who blocked this bot faced being removed from the Live Search index.
Now Microsoft is further clarifying that the “quality check” referred to cloaking detection, although they allow that some cloaking may be valid.
In the blog post, Nathan Buggia of the Live Search Webmaster Team notes that over the last eight months, the team has been making good progress on spam fighting tools. One aspect of spam that they have been working on is cloaking. Cloaking means showing search engines a different page than the one users see, and its use has been the subject of many a debate over the years. Google has generally sent the message that all cloaking violates its guidelines (although in practice, some instances seem to be accepted). Microsoft’s post notes that “not all cloaking is spam related and we do our best to take this into account.”
When I talked to Nathan about this, he said that they consider cloaking to cover situations in which site owners are trying to deceive search engines. However, they don’t recommend cloaking in any situation. In part, this recommendation stems from the fact that it’s difficult to gauge intent with automated cloaking detection tools, so some innocent sites could suffer collateral damage.
If you show the same content to users and to search engines, but use a method that automated checks may flag as cloaking, or if you have cloaked your site in the past and you feel your site is being penalized in the Live Search index, you can request a manual review of your site using the feedback form at the Live Search Webmaster Center.
Live Search has been using a cloaking detection process in which a bot not identified as MSNbot crawls pages and compares them to the versions served to MSNbot. Microsoft says that this unidentified bot has adhered to robots.txt rules (it hasn’t retrieved the robots.txt file separately, but has followed the one retrieved by MSNbot), but it has caused a few other problems for some site owners.
- Skewed ads reporting: The bot initially had a bug that caused it to download ad blocks, which made it appear as a user for ads reporting purposes. This may have inflated impression counts and lowered reported click-through rates.
- Skewed site statistics: In some cases, the bot caused high traffic loads on sites, which not only consumed extra bandwidth, but also distorted visitor counts in analytics reports. Because this bot didn’t identify itself as a bot, some traffic reporting software reported the visits as coming from users.
- Added unrelated search terms to HTTP logs: The bot often crawled a page as a user clicking through a search result. The search queries the bot used were often not related to the site, which caused incorrect search query reports and perplexed many a webmaster.
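Site owners who want to see how much of this kind of traffic reached them can separate it out of their own logs. The following is only a rough sketch under stated assumptions: the log is in the common "combined" format, the network range is a hypothetical placeholder (a reserved documentation block, not an actual Microsoft range), and "does not identify itself as a bot" is approximated by the user agent not containing the word "bot". The names `SUSPECT_NETWORK`, `is_suspect_hit`, and `split_hits` are made up for illustration.

```python
import ipaddress
import re

# Hypothetical placeholder range (reserved for documentation), NOT a real
# Microsoft range. Substitute a range you have verified yourself.
SUSPECT_NETWORK = ipaddress.ip_network("203.0.113.0/24")

# Combined log format:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def is_suspect_hit(line, network=SUSPECT_NETWORK):
    """True if the hit comes from `network` but the user agent does not
    declare itself as a bot (a crude substring heuristic)."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable line: leave it in the normal bucket
    try:
        ip = ipaddress.ip_address(match.group("ip"))
    except ValueError:
        return False  # first field was a hostname, not an IP address
    declares_bot = "bot" in match.group("agent").lower()
    return ip in network and not declares_bot


def split_hits(lines, network=SUSPECT_NETWORK):
    """Partition log lines into (suspect, other) without dropping any."""
    suspect, other = [], []
    for line in lines:
        (suspect if is_suspect_hit(line, network) else other).append(line)
    return suspect, other
```

Filtering at reporting time like this leaves the traffic itself untouched, which matters here because blocking the bot outright carried its own risk.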
In addition, when webmasters began reporting these issues, Microsoft was unresponsive and, in fact, MSNDude posted on the Webmaster World forums that site owners who blocked this bot risked being removed from the Live Search index:
“We would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.”
The blog post today says that these issues have been resolved. The bot will no longer download ad blocks, will no longer crawl sites heavily enough to place noticeable traffic loads on them, and won’t use unrelated search strings. Also, the Live Search team says it has created a forum specifically to discuss crawler issues, and that webmasters can also use the feedback form and should receive a timely response.
I asked if sites that blocked this bot would indeed be removed from the Live Search index and was told that:
“The tool we’re using to detect cloaking is just one of many inputs we use to determine if a site is spam. Blocking this bot does not necessarily mean that your site will be considered spam, but it is a possibility. We recommend any webmasters concerned should log into our Webmaster Tools (http://webmaster.live.com) and check to see if their site has been blocked. If so, they should submit a reinclusion request using the form.”
In a separate release, Microsoft notes that its Webmaster Center is now live and “provides all the necessary resources to optimize a Web site for achieving the highest possible algorithmic or ‘organic’ listing on Live Search.”
When I talked to Nathan, he said that sites that are cloaking may continue to see some amount of traffic from this bot. This tool crawls sites throughout the web — both those that cloak and those that don’t — but those not found to be cloaking won’t continue to see traffic.
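The core idea of this kind of check, comparing the page served to the declared crawler with the page served to an ordinary-looking visitor, can be sketched in a few lines. This is an illustration only, not Microsoft's actual method: it assumes the two HTML versions have already been fetched, strips tags crudely with a regular expression, and uses an arbitrary similarity threshold of 0.8. The function names are hypothetical.

```python
import difflib
import re

TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def visible_text(html):
    """Crude text extraction: replace tags with spaces, collapse
    whitespace, and lowercase. A real system would parse the HTML."""
    return WHITESPACE.sub(" ", TAG.sub(" ", html)).strip().lower()


def similarity(crawler_html, visitor_html):
    """Similarity in [0.0, 1.0] between the text of the two versions."""
    matcher = difflib.SequenceMatcher(
        None,
        visible_text(crawler_html),
        visible_text(visitor_html),
        autojunk=False,  # avoid the heuristic that skews long inputs
    )
    return matcher.ratio()


def looks_cloaked(crawler_html, visitor_html, threshold=0.8):
    """Flag a page when the two versions differ substantially.
    The threshold is an arbitrary assumption for this sketch."""
    return similarity(crawler_html, visitor_html) < threshold
```

A check like this says nothing about intent. Pages that legitimately vary between requests (rotating content, personalization, geotargeting) could fall below the threshold, which is one way innocent sites can end up flagged by automated detection.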
Live Search isn’t the only search engine to have processes that check for cloaking. Matt Cutts has said that Google checks both algorithmically and manually, although Google has also implemented a “first-click free” program, which is a type of approved cloaking.
Yahoo’s guidelines warn against “pages that give the search engine different content than what the end user sees (cloaking).” Search engines want the search results to accurately represent what users will see when they click through to the linked pages, and deceptive cloaking interferes with this goal.
More signals from a major search engine that it is both actively seeking out cloaking and acknowledging that cloaking is not always spam aren’t likely to settle the cloaking debate. And one has to wonder how effective methods like this really are. Those savvy enough to cloak may be able to cloak for this new cloaking detection bot as well. I asked Microsoft if this tool has helped them achieve their spam reduction goals. They said that:
“Like all search engines, Live Search has been vigilantly identifying and removing spam content from our index since the inception of our search engine. With the launch of the update this Fall, we have measured a significant decrease in the amount of spam.”