See What Googlebot Sees On Your Site

Google Webmaster Tools has just launched a “labs” section, where you’ll find new features that may be early in the development cycle and not quite as robust as the rest of the tools. The features available so far are Fetch as Googlebot, which lets you see exactly what Googlebot is served when it requests a URL from your server, and Malware Details, which shows you malicious code snippets from your site if it’s been flagged as containing malware.

Fetch as Googlebot

Of most interest to webmasters, SEOs, and web developers is likely the Fetch as Googlebot feature. You can specify any URL on your site and see the HTTP response (header and contents) that the server returns. Simply indicate the URL and click the Fetch button. It may take a few moments for Googlebot to access the page and return the results, since it fetches the page in real time. (Refresh the page to see the progress.)

Google Fetch as Googlebot

Click the Success link once it’s been processed to see the results.

Google Fetch As Googlebot Results

How is this different from simply looking at the source code of the page?

  • You see the HTTP header information at the top. This information is generally easily available through tools such as Live HTTP Headers, but isn’t contained in the source code itself (since that information is coming from the server, not the page).
  • You can see if the server is returning any of the page information differently than the page has been coded.
  • You can see if the server is returning something different to Googlebot than what other users see. This tool uses the same user-agent and IP range as Googlebot when it crawls the web, so if the server is configured conditionally for user agent or IP address (typically known as “cloaking”), you’ll see what’s being conditionally served to Google.
  • You can use the tool to test changes (particularly things like redirects) in real-time.

Note that this tool won’t necessarily show you the content that Google is able to extract from the page. If the page contains JavaScript, for instance, you’ll see the raw JavaScript code contained on the page, not the rendered view visible in the browser. This unfortunately means you can’t use the tool to determine whether Google is able to access content contained in rich markup.
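The core of what the tool reports can be approximated from your own machine: request the URL while presenting Googlebot’s user-agent string and look at the raw status, headers, and body. Here’s a minimal Python sketch of that idea. One important caveat, which is exactly why Google’s hosted tool is more trustworthy: this only mimics the user-agent, not Googlebot’s IP range, so IP-based cloaking won’t be visible.

```python
import urllib.request

# Googlebot's published user-agent string (from google.com/bot.html).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_googlebot_request(url):
    """Build a request that presents Googlebot's user-agent.

    Caveat: this mimics only the user-agent header. Servers that cloak
    by IP address will still serve you the 'regular visitor' version.
    """
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

def fetch(url):
    """Return (status, headers, raw body) -- the unrendered response,
    much like the Fetch as Googlebot output: any JavaScript comes back
    as source code, not as the rendered view a browser would show."""
    with urllib.request.urlopen(build_googlebot_request(url)) as resp:
        return resp.status, dict(resp.getheaders()), resp.read()
```

A quick check like `fetch("http://example.com/")` would show the same header block the tool displays at the top of its results.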

What’s this about cloaking?

This tool can help you determine if the pages are being cloaked to Google. This may be useful if you’re coming into a project late and aren’t sure what’s been previously done. It can also help uncover if your site has been hacked. Back in 2006, Googler Matt Cutts and I did a show on Webmaster Radio during which we talked about how in some cases, a hacker might add links to a site and then cloak those pages so that the site owner never sees them. Only Google does. At the time, Matt suggested using Google Translate (and choosing English to English) to see what Googlebot was being served, but this tool can now serve that purpose more easily. Matt confirmed this to me this morning: “The biggest use case is just debugging site issues. Of those, the biggest case will be hacked sites. Some attacks will hide content until search engines fetch the page (and some attackers add a noarchive tag so that the search result doesn’t have a “Cached” link), so a site could look clean to the website owner. Using this feature will let site owners verify that there are no hidden links in the page that Google actually fetches.”
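The hidden-links attack Matt describes lends itself to a simple diff: extract the links from the page as served to Googlebot and as served to a normal browser, and flag anything that appears only in the Googlebot version. A rough sketch (the regex is a deliberate simplification, not a full HTML parser, and `spam.example` below is a made-up illustration):

```python
import re

def extract_links(html):
    """Pull href values out of raw HTML.
    A rough regex sketch; real auditing would use a proper HTML parser."""
    return set(re.findall(r'href=["\']([^"\']+)["\']', html, re.IGNORECASE))

def hidden_links(googlebot_html, browser_html):
    """Links served only to Googlebot -- the cloaked-spam pattern a
    hacked site's owner would never see in their own browser."""
    return extract_links(googlebot_html) - extract_links(browser_html)

# Hypothetical example: the cloaked copy carries one extra spam link.
as_googlebot = '<a href="/about">about</a> <a href="http://spam.example/pills">buy</a>'
as_browser = '<a href="/about">about</a>'
print(hidden_links(as_googlebot, as_browser))  # the spam link only Google sees
```

An empty result doesn’t prove the site is clean (the comparison is only as good as the two fetches), but a non-empty one is a strong signal something is being served conditionally.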

How do I test redirects?

If you’ve implemented redirects, you can use this tool to test how Googlebot will interpret those redirects without waiting for those pages to be crawled. For instance, when I fetch www.searchengineland.com, I see that the redirect is correctly implemented as a 301 and points to searchengineland.com:

Google Fetch as Googlebot

You can also use the tool to troubleshoot URLs listed in the Crawl Errors > Not Followed report. You can test these URLs with something like Live HTTP Headers or by trying to access them in a browser, but if neither of those methods uncovers the problem, this tool can help determine that the issue is specific to Googlebot. It’s also handy for verifying that fixes you’ve made to redirect errors uncovered by the Not Followed report have really solved the problem.
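If you want to spot-check a redirect yourself before (or after) running it through the tool, the trick is to inspect the first response without letting your client follow the redirect, so you can confirm the status code is a 301 and the Location header points where you intend. A minimal Python sketch of that first-hop check:

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so we can see the first hop."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise instead of follow

def first_hop(url):
    """Return (status, Location) for the initial response, much as
    Fetch as Googlebot reports it. A permanent move should show 301."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(url)
        return resp.status, None  # no redirect at all
    except urllib.error.HTTPError as e:
        return e.code, e.headers.get("Location")
```

For a correctly configured permanent move you’d expect something like `(301, "http://searchengineland.com/")`; a 302 here would tell you the redirect is temporary rather than permanent.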

(Note that the tool currently has a limit of 100kb per page. However, this is for the tool only and doesn’t apply to Googlebot’s normal crawl of the site. Google is monitoring feedback to see if many site owners find this size to be limiting.)

Malware Details

The Google Online Security Blog has more information on the malware details tool. Previously, webmaster tools reported when a site was flagged as having malware and listed sample URLs. This new tool also shows samples of the malicious content and, in some cases, the underlying cause. This should help site owners whose sites have been hacked to include malware find the problem and fix it. If your site does contain malware and you’ve fixed it, you can request a review to have the malware alert removed from search results.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.
