Google Retires The Googlebot-News Bot

Today, Google announced that they will no longer be crawling news sites with Googlebot-News and instead will crawl news sites with Googlebot, the same bot that crawls sites for web search. However, you can still block your content from being indexed in Google News by disallowing Googlebot-News in robots.txt or using a meta robots tag.

Blocking Content From Google News

Seem confusing? On the one hand, it’s not at all.

If you want Google to index your content in both web search and News (if you are a Google News publisher), then you don’t need to do anything. Google will keep crawling as it always has, but if you look at your server logs, you’ll only see entries for Googlebot rather than entries for both Googlebot and Googlebot-News.

If you want to keep your content out of Google News, you can keep using the Disallow directive in robots.txt (or a meta robots tag) to block Googlebot-News. Even though Google will now crawl as Googlebot rather than Googlebot-News, they’ll still respect the Googlebot-News robots.txt directive.
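As a rough illustration of a News-only block, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` and asks whether each user agent may fetch a page. The robots.txt contents and the example.com URL are assumptions made for the example, and Python's parser only approximates how Google evaluates these rules, so treat it as a sanity check rather than a statement of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps a site out of Google News only.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL, used only for illustration.
url = "https://www.example.com/news/story.html"

news_allowed = is_allowed(ROBOTS_TXT, "Googlebot-News", url)
web_allowed = is_allowed(ROBOTS_TXT, "Googlebot", url)

print("Googlebot-News allowed:", news_allowed)
print("Googlebot allowed:", web_allowed)
```

With only a Googlebot-News group in the file, the page stays open to web search while being blocked from News, which is the setup the paragraph above describes.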

However, you can no longer disallow Googlebot while allowing Googlebot-News, as you can for other specialized Googlebots. That combination was possible before this change.

Gathering Data About How Your Site Is Crawled

On the other hand, this change makes things a lot more confusing if you’re using data to understand how your site is crawled and make improvements.

For instance, if you notice that your news articles aren’t being indexed in Google News and you check the news-specific crawl errors in Google Webmaster Tools and don’t see any problems, you can no longer check your server logs to see if those articles are being crawled for the news index. You can see if the pages are being crawled generally, but this less granular insight makes it tougher to troubleshoot problems.

In this example, you may be generating a news-specific Sitemap and that generation process may be missing specific URLs. You used to be able to review your server logs, see that Googlebot-News was crawling particular URLs but not others, and then check to see if the URLs that hadn’t been crawled were in the Sitemap. Now, all the server logs will tell you is whether Google is crawling the URLs at all. If they are being crawled for web search but not News, that detail is now lost.
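The Sitemap-versus-logs comparison described above can be sketched roughly as follows. Everything here is assumed for illustration: the sample Sitemap, the log lines (in the common Apache "combined" format), and the example.com paths are made up, and matching on the string "Googlebot" in the user agent is a simplification, since user agents can be spoofed. After this change, a check like this can tell you only whether Google fetched a URL at all, not which index it was fetched for.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical news Sitemap (sample data, not from a real site).
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/news/a.html</loc></url>
  <url><loc>https://www.example.com/news/b.html</loc></url>
</urlset>
"""

# Hypothetical access log lines, assuming the "combined" log format.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
LOG_LINES = [
    f'66.249.66.1 - - [10/Aug/2011:10:00:00 +0000] "GET /news/a.html HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Aug/2011:10:00:05 +0000] "GET /news/c.html HTTP/1.1" 200 4096 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Aug/2011:10:01:00 +0000] "GET /news/b.html HTTP/1.1" 200 5000 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

# Captures the request path and the user agent from a combined-format line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def sitemap_paths(xml_text: str) -> set:
    """Collect the URL paths listed in a Sitemap's <loc> elements."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return {urlparse(loc.text.strip()).path for loc in root.findall("sm:url/sm:loc", ns)}


def crawled_paths(log_lines) -> set:
    """Collect paths requested by anything identifying itself as Googlebot."""
    paths = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group(2):
            paths.add(urlparse(match.group(1)).path)
    return paths


listed = sitemap_paths(SITEMAP_XML)
crawled = crawled_paths(LOG_LINES)

in_sitemap_not_crawled = listed - crawled
crawled_not_in_sitemap = crawled - listed

print("In Sitemap, not crawled:", sorted(in_sitemap_not_crawled))
print("Crawled, missing from Sitemap:", sorted(crawled_not_in_sitemap))
```

URLs that show up as crawled but missing from the Sitemap are the ones worth checking against the Sitemap generation process.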

You lose granular insight for web search as well. If you are tracking down why particular pages on your site aren’t indexed, you could previously review your server logs to see if they were being crawled, but now it will appear as though they are, even if they are only being crawled for Google News.

You can still get News-specific and web-specific crawl errors from Google Webmaster Tools, so some insight is still available. In terms of granularity, Google tells me that the “URLs restricted by robots.txt” report in Google Webmaster Tools includes only the pages blocked from web search, not URLs blocked from Google News.

However, it doesn’t sound like you can currently see a list of URLs Google tried to crawl but didn’t because Googlebot-News was blocked, and unfortunately the robots.txt analysis tool in Google Webmaster Tools doesn’t let you test URLs blocked in Google News separately from web search. So it would be tough to determine whether you were accidentally blocking URLs from being indexed in Google News.

This change seems like a bit of a step backward to me. When Google News first launched, Googlebot crawled for both web search and News, and news publishers asked for a news-specific bot. Certainly, the most important reason for that request was the ability to block and allow content in Google News separately from web search, and that functionality remains. However, the granular insight the separate bot provided was useful as well, and it’s unfortunate that it will now be lost.



About The Author: is a Contributing Editor at Search Engine Land. She founded ninebyblue.com and Blueprint Search Analytics. Her book, Marketing in the Age of Google, (updated edition, May 2012) provides a blueprint for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.

  • http://www.xing.com/profile/Ludwig_Coenen2 ludwig coenen

    would be interesting to know, if this means that the behaviour / capabilities of the bot have changed as well.

    the google news crawling process seemed always to be focused on speed and not a precise DC detection for instance. maybe that changes, when google bot does the crawling.

    or is that not very likely to happen?
