See What Googlebot Sees On Your Site

Google Webmaster Tools has just launched a “labs” section, where you’ll find new features that may be early in the development cycle and not quite as robust as the rest of the tools. The features available so far are Fetch as Googlebot, which lets you see exactly what Googlebot is served when it requests a URL from your server, and Malware Details, which shows you malicious code snippets from your site if it’s been flagged as containing malware.

Fetch as Googlebot

Of most interest to webmasters, SEOs, and web developers is likely the Fetch as Googlebot feature. You can specify any URL on your site and see the HTTP response (headers and content) that the server returns. Simply enter the URL and click the Fetch button. Because Googlebot fetches the page in real time, it may take a few moments for the results to come back. (Refresh the page to see the progress.)

Click the Success link once it’s been processed to see the results.
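If you want a rough local approximation of that results view, the following Python sketch requests a URL with Googlebot's user-agent string and lays out the status, headers, and body in one block. The function names `format_response` and `fetch_as` are made up for this illustration, and a request sent from your own machine does not come from Googlebot's IP range, so it will not trigger IP-based server rules the way the real tool does.

```python
import urllib.request

# Googlebot's well-known user-agent string. Sending it from your own machine
# does NOT reproduce Googlebot's IP range, so IP-based rules will not fire.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def format_response(status, reason, headers, body):
    """Lay out a response as a status line, header lines, a blank line, then the body."""
    lines = [f"HTTP {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\n".join(lines) + "\n\n" + body


def fetch_as(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Request `url` with the given User-Agent and return the formatted response.

    urlopen follows redirects and raises HTTPError for 4xx/5xx responses,
    so this shows the final page rather than each intermediate hop.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return format_response(response.status, response.reason,
                               response.getheaders(), body)
```

Calling `print(fetch_as("http://example.com/"))` would print the headers first and the page source after them, which is roughly the layout the tool presents.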

How is this different from simply looking at the source code of the page?

  • You see the HTTP header information at the top. This information is generally easily available through tools such as Live HTTP Headers, but isn’t contained in the source code itself (since that information is coming from the server, not the page).
  • You can see if the server is returning any of the page information differently than the page has been coded.
  • You can see if the server is returning something different to Googlebot than what other users see. This tool uses the same user-agent and IP range as Googlebot when it crawls the web, so if the server is configured to respond conditionally based on user agent or IP address (typically known as “cloaking”), you’ll see what’s being conditionally served to Google.
  • You can use the tool to test changes (particularly things like redirects) in real-time.
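To make the user-agent comparison concrete, here is a minimal Python sketch that fetches the same URL twice, once with a browser-like user agent and once with Googlebot's, and diffs the two bodies. The helper names and the browser user-agent string are invented for this example. It can only surface user-agent-based differences; rules keyed on Googlebot's IP range are visible only through the tool itself.

```python
import difflib
import urllib.request

BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"  # made-up browser UA
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_body(url, user_agent, timeout=10):
    """Return the response body for `url` as text, using the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def body_diff(browser_body, crawler_body):
    """Return unified-diff lines; an empty list means the two bodies match."""
    return list(difflib.unified_diff(
        browser_body.splitlines(), crawler_body.splitlines(),
        fromfile="browser", tofile="googlebot-ua", lineterm=""))
```

You would call `body_diff(fetch_body(url, BROWSER_UA), fetch_body(url, GOOGLEBOT_UA))` and read any lines starting with `+` as content served only to the crawler user agent. Pages with timestamps or session tokens will differ between any two requests, so a non-empty diff is a prompt to look closer, not proof of cloaking.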

Note that this tool won’t necessarily show you the content that Google is able to extract from the page. If the page contains JavaScript, for instance, you’ll see the raw JavaScript code contained on the page, not the rendered view visible in the browser. Unfortunately, this means you can’t use the tool to determine whether Google is able to access content contained in rich markup.
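The gap between raw source and rendered view can be illustrated with a small Python sketch. The sample page and the `StaticText` helper are invented for this example. The raw response contains the script's source code, but the text the script would write into the page in a browser is not part of the markup itself.

```python
from html.parser import HTMLParser


class StaticText(HTMLParser):
    """Collect text present in the markup itself, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script and style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text a reader of the raw markup would find, without running scripts."""
    parser = StaticText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the price only appears after a browser runs the script.
RAW = """<html><body>
<h1>Widgets</h1>
<div id="price"></div>
<script>document.getElementById("price").textContent = "Only $5";</script>
</body></html>"""
```

Here `static_text(RAW)` yields just the heading text, while the string `Only $5` exists only inside the script source. A fetch of the raw response shows you the script, not its effect.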

What’s this about cloaking?

This tool can help you determine if the pages are being cloaked to Google. This may be useful if you’re coming into a project late and aren’t sure what’s been previously done. It can also help uncover whether your site has been hacked. Back in 2006, Googler Matt Cutts and I did a show on Webmaster Radio during which we talked about how, in some cases, a hacker might add links to a site and then cloak those pages so that the site owner never sees them. Only Google does. At the time, Matt suggested using Google Translate (and choosing English to English) to see what Googlebot was being served, but this tool can now more easily serve that purpose.

Matt confirmed this to me this morning: “The biggest use case is just debugging site issues. Of those, the biggest case will be hacked sites. Some attacks will hide content until search engines fetch the page (and some attackers add a noarchive tag so that the search result doesn’t have a ‘Cached’ link), so a site could look clean to the website owner. Using this feature will [help] site owners verify that there are no hidden links in the page that Google actually fetches.”
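Once you have both versions of a page (the source your browser shows and the source from the fetch results), a check for hidden links can be sketched in Python as follows. The `LinkAudit` class and the sample markup are invented for illustration. The sketch lists links that appear only in the crawler's copy and notes whether a robots meta tag carries `noarchive`.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Record every <a href> and whether a robots meta tag contains noarchive."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.add(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            directives = (attrs.get("content") or "").lower()
            self.noarchive = self.noarchive or "noarchive" in directives


def audit(html):
    """Parse `html` and return the populated LinkAudit instance."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser


def crawler_only_links(browser_html, crawler_html):
    """Return links present in the crawler's copy but absent from the browser's."""
    return sorted(audit(crawler_html).links - audit(browser_html).links)
```

As a usage example, if the browser copy links only to `/about` and the crawler copy also links to `http://spam.example/pills`, `crawler_only_links` returns just the second URL. An unexpected `noarchive` flag on the crawler copy is a second signal worth investigating.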

How do I test redirects?

If you’ve implemented redirects, you can use this tool to test how Googlebot will interpret those redirects without waiting for those pages to be crawled. For instance, when I fetch www.searchengineland.com, I see that the redirect is correctly implemented as a 301 and points to searchengineland.com.

You can also use the tool to troubleshoot URLs listed in the Crawl Errors > Not Followed report. You could test these URLs with something like Live HTTP Headers or by accessing them in a browser, but if neither of those methods uncovers the problem, this tool can help determine whether the issue is specific to Googlebot. Once you’ve fixed redirect errors uncovered by the Not Followed report, you can use the tool to verify that the fixes have really solved the problem.
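For a quick check outside the tool, the first hop of a redirect can be inspected with a short Python sketch like the one below. The names `classify` and `first_hop` are made up for this example. It sends a single request without following the redirect and reports the status code and `Location` header. Because it runs from your own machine, it cannot reveal problems that affect only requests from Googlebot's IP range.

```python
import http.client
from urllib.parse import urlsplit

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def classify(status, location):
    """Summarise one response the way you would read it in the fetch results."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    return f"no redirect (status {status})"


def first_hop(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Send one GET without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    connection_class = (http.client.HTTPSConnection if parts.scheme == "https"
                        else http.client.HTTPConnection)
    connection = connection_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("GET", path, headers={"User-Agent": user_agent})
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()
```

Something like `print(classify(*first_hop("http://www.searchengineland.com/")))` would then tell you whether the www hostname answers with a permanent or a temporary redirect and where it points.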

(Note that the tool currently has a limit of 100 KB per page. This limit is for the tool only and doesn’t apply to Googlebot’s normal crawl of the site. Google is monitoring feedback to see if many site owners find this size to be limiting.)

Malware details

The Google Online Security Blog has more information on the Malware Details tool. Previously, Webmaster Tools reported when a site was flagged as having malware and listed sample URLs. This new tool also shows samples of the malicious content and, in some cases, the underlying cause. This should help site owners whose sites have been hacked to include malware find the problem and fix it. If your site does contain malware and you’ve fixed it, you can request a review to have the malware alert removed from search results.


About the author

Vanessa Fox
Contributor
Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.
