See What Googlebot Sees On Your Site


Google Webmaster Tools has just launched a “labs” section, where you’ll find new features that may be early in the development cycle and not quite as robust as the rest of the tools. The two features available so far are Fetch as Googlebot, which lets you see exactly what Googlebot is served when it requests a URL from your server, and Malware Details, which shows you malicious code snippets from your site if it’s been flagged as containing malware.

Fetch as Googlebot

Of most interest to webmasters, SEOs, and web developers is likely the Fetch as Googlebot feature. You can specify any URL on your site and see the HTTP response (headers and content) that the server returns. Simply enter the URL and click the Fetch button. Because Googlebot fetches the page in real time, it may take a few moments to access the page and return the results. (Refresh the page to see the progress.)

[Screenshot: Fetch as Googlebot]

Once the request has been processed, click the Success link to see the results.

[Screenshot: Fetch as Googlebot results]

How is this different from simply looking at the source code of the page?

  • You see the HTTP header information at the top. This information is easy to get with tools such as Live HTTP Headers, but it isn’t contained in the source code itself, since headers come from the server, not the page.
  • You can see whether the server returns any of the page differently from how it was coded.
  • You can see whether the server returns something different to Googlebot than what other users see. This tool uses the same user agent and IP range as Googlebot does when it crawls the web, so if the server is configured to serve content conditionally by user agent or IP address (a practice known as “cloaking”), you’ll see what’s being served to Google.
  • You can use the tool to test changes (particularly things like redirects) in real time.
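If you want a rough local check before (or alongside) using the tool, the sketch below requests the same URL twice, once with Googlebot’s published user-agent string and once with an ordinary browser user-agent, and lists where the responses differ. It is a minimal sketch under a loud assumption: spoofing the user agent from your own machine only exposes user-agent-based cloaking. Cloaking keyed on Google’s IP range will not show up here, which is exactly what Fetch as Googlebot adds. The function names `fetch` and `compare_responses`, the browser user-agent string, and the choice of headers to compare are all hypothetical choices for illustration.

```python
import http.client
from urllib.parse import urlsplit

# Googlebot's published user-agent string. Spoofing it locally only reveals
# user-agent-based cloaking; IP-based cloaking needs Google's own tool.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical "ordinary visitor" user agent used as the baseline.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"


def fetch(url: str, user_agent: str) -> tuple[int, dict[str, str], bytes]:
    """Make one GET request (redirects are NOT followed) and return the
    status code, lower-cased response headers, and raw body."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return resp.status, headers, resp.read()
    finally:
        conn.close()


def compare_responses(url: str) -> list[str]:
    """List differences between what a Googlebot UA and a browser UA receive."""
    bot_status, bot_headers, bot_body = fetch(url, GOOGLEBOT_UA)
    web_status, web_headers, web_body = fetch(url, BROWSER_UA)
    diffs = []
    if bot_status != web_status:
        diffs.append(f"status: googlebot={bot_status} browser={web_status}")
    # Only compare headers that shouldn't change between two quick requests
    # (Date, for example, legitimately differs and is skipped).
    for name in ("location", "content-type", "x-robots-tag"):
        if bot_headers.get(name) != web_headers.get(name):
            diffs.append(f"header {name}: googlebot={bot_headers.get(name)!r} "
                         f"browser={web_headers.get(name)!r}")
    if bot_body != web_body:
        diffs.append(f"body differs ({len(bot_body)} vs {len(web_body)} bytes)")
    return diffs
```

An empty list means the two user agents received the same status, key headers, and body. A body difference is not proof of cloaking on its own, since pages with rotating ads or timestamps change on every request, so treat it as a prompt to look at the Googlebot copy more closely.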

Note that this tool won’t necessarily show you the content that Google is able to extract from the page. If the page contains JavaScript, for instance, you’ll see the raw JavaScript code contained on the page, not the rendered view visible in the browser. Unfortunately, this means you can’t use this tool to determine whether Google is able to access content contained in rich markup.

What’s this about cloaking?

This tool can help you determine whether pages are being cloaked to Google. That’s useful if you’re coming into a project late and aren’t sure what’s been done previously. It can also help uncover whether your site has been hacked. Back in 2006, Googler Matt Cutts and I did a show on Webmaster Radio in which we talked about how, in some cases, a hacker might add links to a site and then cloak those pages so that the site owner never sees them. Only Google does. At the time, Matt suggested using Google Translate (choosing English to English) to see what Googlebot was being served, but this tool now serves that purpose more easily.

Matt confirmed this to me this morning: “The biggest use case is just debugging site issues. Of those, the biggest case will be hacked sites. Some attacks will hide content until search engines fetch the page (and some attackers add a noarchive tag so that the search result doesn’t have a ‘Cached’ link), so a site could look clean to the website owner. Using this feature will [let] site owners verify that there are no hidden links in the page that Google actually fetches.”
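To act on Matt’s point about hidden links, one approach is to save the HTML that Fetch as Googlebot shows you and the page source your own browser receives, then diff the links. The sketch below does that with Python’s standard `html.parser`, and also flags a robots (or googlebot) meta tag containing `noarchive` in the Googlebot copy, since Matt mentions attackers adding one. It is a hypothetical helper (`PageScanner`, `scan`, and `hidden_link_report` are illustrative names, not part of any Google tool), and it assumes you paste in both HTML copies yourself; it only compares `<a href>` values, so links injected some other way (for example via JavaScript) would be missed.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect <a href> values and note whether a robots/googlebot meta tag
    asks for noarchive."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "a" and attributes.get("href"):
            self.links.add(attributes["href"].strip())
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            content = (attributes.get("content") or "").lower()
            if name in ("robots", "googlebot") and "noarchive" in content:
                self.noarchive = True


def scan(html: str) -> PageScanner:
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def hidden_link_report(googlebot_html: str, browser_html: str) -> dict:
    """Report links that appear only in the copy served to Googlebot."""
    bot, web = scan(googlebot_html), scan(browser_html)
    return {
        "links_only_in_googlebot_copy": sorted(bot.links - web.links),
        "googlebot_copy_has_noarchive": bot.noarchive,
    }
```

For example, `hidden_link_report('<meta name="robots" content="noarchive"><a href="/about">About</a><a href="http://spam.example/">cheap pills</a>', '<a href="/about">About</a>')` returns `{'links_only_in_googlebot_copy': ['http://spam.example/'], 'googlebot_copy_has_noarchive': True}`. Any link in that first list that you didn’t put there yourself is worth investigating.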

How do I test redirects?

If you’ve implemented redirects, you can use this tool to test how Googlebot will interpret those redirects without waiting for those pages to be crawled. For instance, when I fetch www.searchengineland.com, I see that the redirect is correctly implemented as a 301 and points to searchengineland.com:

[Screenshot: Fetch as Googlebot redirect result]

The tool is also useful for troubleshooting URLs listed in the Crawl Errors > Not Followed report. You can test these URLs with something like Live HTTP Headers or by loading them in a browser, but if neither method uncovers the problem, this tool can help determine whether the issue is specific to Googlebot. Once you’ve fixed redirect errors uncovered by the Not Followed report, you can use the tool to verify that your fixes have really solved the problem.
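When you’re working through redirect problems like these, it can help to trace a redirect chain hop by hop yourself, recording each status code (301 vs. 302, for instance) rather than letting a browser silently follow it. The sketch below does that with a Googlebot user-agent string. It is a hypothetical helper (`fetch_once` and `trace_redirects` are illustrative names), it runs from your own IP address rather than Google’s, and its `max_hops=5` limit is an arbitrary assumption, not a documented Googlebot limit.

```python
import http.client
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Googlebot's published user-agent string (requests still come from your IP).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url: str) -> tuple[int, Optional[str]]:
    """GET url once, without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("GET", path, headers={"User-Agent": GOOGLEBOT_UA})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, max_hops: int = 5) -> list[tuple[str, int]]:
    """Follow a redirect chain one hop at a time, recording (url, status).

    Raises RuntimeError on a redirect loop or if more than max_hops
    redirects are needed to reach a non-redirect response."""
    chain: list[tuple[str, int]] = []
    seen: set[str] = set()
    while True:
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        if len(chain) > max_hops:
            raise RuntimeError(f"more than {max_hops} redirects")
        seen.add(url)
        status, location = fetch_once(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return chain
        url = urljoin(url, location)  # Location headers may be relative
```

Calling `trace_redirects("http://www.searchengineland.com/")` would print nothing by itself; inspect the returned list to confirm that the first hop is a 301 and that the chain ends at the URL you intended, with a 200.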

(Note that the tool currently has a limit of 100 KB per page. This limit applies only to the tool, not to Googlebot’s normal crawl of the site. Google is monitoring feedback to see whether many site owners find this size limiting.)

Malware details

The Google Online Security Blog has more information on the malware details tool. Previously, Webmaster Tools reported when a site was flagged as having malware and listed sample URLs. The new tool also shows samples of the malicious content and, in some cases, the underlying cause. This should help site owners whose sites have been hacked to include malware find the problem and fix it. If your site contained malware and you’ve fixed it, you can request a review to have the malware warning removed from search results.

Vanessa Fox is a Contributing Editor at Search Engine Land. Called a “cyberspace visionary” by Seattle Business Monthly, she is an expert in understanding customer acquisition from organic search. She shares her perspective on how this impacts marketing and user experience at ninebyblue.com and provides authoritative search-friendly design patterns for developers at janeandrobot.com. Her book, Marketing in the Age of Google, provides a blueprint for incorporating search strategy into organizations of all levels.
