See What Googlebot Sees On Your Site


Google Webmaster Tools has just launched a “labs” section, where you’ll find new features that may be early in the development cycle and not quite as robust as the rest of the tools. Two features are available so far: Fetch as Googlebot, which lets you see exactly what Googlebot is served when it requests a URL from your server, and Malware Details, which shows you malicious code snippets from your site if it’s been flagged as containing malware.

Fetch as Googlebot

The Fetch as Googlebot feature is likely of most interest to webmasters, SEOs, and web developers. You can specify any URL on your site and see the HTTP response (headers and contents) that the server returns. Simply enter the URL and click the Fetch button. Because Googlebot fetches the page in real time, it may take a few moments to access the page and return the results. (Refresh the page to see the progress.)

[Screenshot: Fetch as Googlebot]

Click the Success link once it’s been processed to see the results.

[Screenshot: Fetch as Googlebot results]

How is this different from simply looking at the source code of the page?

  • You see the HTTP header information at the top. This information is generally easily available through tools such as Live HTTP Headers, but isn’t contained in the source code itself (since that information is coming from the server, not the page).
  • You can see if the server is returning any of the page information differently than the page has been coded.
  • You can see if the server is returning something different to Googlebot than what other users see. This tool uses the same user agent and IP range as Googlebot when it crawls the web, so if the server is configured to respond conditionally based on user agent or IP address (typically known as “cloaking”), you’ll see what’s being conditionally served to Google.
  • You can use the tool to test changes (particularly things like redirects) in real time.
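For a rough local approximation of the user-agent half of that comparison, something like the following Python sketch may help. It requests a URL once with Googlebot's published user-agent string and once with a browser-like one, then reports which response headers differ. The browser string and the function names are made up for illustration, and a script like this runs from your own IP address, so it cannot reveal cloaking keyed on Googlebot's IP range. Only the tool itself can show that.

```python
import urllib.request

# Googlebot's published user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical browser-like string, used only as a point of comparison.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url, user_agent):
    """Build a GET request that announces the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return (status, headers, body) for one request.

    urlopen follows redirects and raises HTTPError for 4xx/5xx responses,
    so this shows the final response rather than each hop. Repeated
    header names are collapsed into a single dict entry.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.status, dict(resp.headers.items()), resp.read()


def differing_headers(headers_a, headers_b, ignore=("date", "set-cookie")):
    """Map each header whose value differs to its (a, b) pair of values.

    Header names are compared case-insensitively. Headers listed in
    `ignore` usually change on every request and are skipped.
    """
    a = {name.lower(): value for name, value in headers_a.items()}
    b = {name.lower(): value for name, value in headers_b.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if name not in ignore and a.get(name) != b.get(name)
    }


def compare(url):
    """Fetch the URL with both user agents and print what differs."""
    _, bot_headers, bot_body = fetch(url, GOOGLEBOT_UA)
    _, browser_headers, browser_body = fetch(url, BROWSER_UA)
    print("Header differences:", differing_headers(browser_headers, bot_headers))
    print("Bodies identical:", bot_body == browser_body)
```

Calling `compare("http://www.example.com/")` on a page you control prints the differences. Pages with dynamic content may differ between any two requests, so a mismatch is a prompt to look closer rather than proof of cloaking.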

Note that this tool won’t necessarily show you the content that Google is able to extract from the page. If the page contains JavaScript, for instance, you’ll see the raw JavaScript code contained on the page, not the rendered view visible in the browser. Unfortunately, this means you can’t use this tool to determine whether Google is able to access content contained in rich markup.

What’s this about cloaking?

This tool can help you determine if pages are being cloaked to Google. This may be useful if you’re coming into a project late and aren’t sure what’s been done previously. It can also help uncover whether your site has been hacked. Back in 2006, Googler Matt Cutts and I did a show on Webmaster Radio during which we talked about how, in some cases, a hacker might add links to a site and then cloak those pages so that the site owner never sees them. Only Google does. At the time, Matt suggested using Google Translate (and choosing English to English) to see what Googlebot was being served, but this tool can now serve that purpose more easily. Matt confirmed this to me this morning: “The biggest use case is just debugging site issues. Of those, the biggest case will be hacked sites. Some attacks will hide content until search engines fetch the page (and some attackers add a noarchive tag so that the search result doesn’t have a ‘Cached’ link), so a site could look clean to the website owner. Using this feature will [help] site owners verify that there are no hidden links in the page that Google actually fetches.”
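If you save the HTML shown by Fetch as Googlebot alongside the source your browser receives, a short script can point out links that appear only in the Googlebot copy and flag a noarchive tag. The following is a minimal Python sketch along those lines. The class and function names are hypothetical, and it only compares `href` values of anchor tags, so it will miss content hidden by other means.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect anchor hrefs and note whether a robots meta tag says noarchive."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.add(attributes["href"])
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            content = (attributes.get("content") or "").lower()
            if name in ("robots", "googlebot") and "noarchive" in content:
                self.noarchive = True


def scan(html):
    """Parse an HTML string and return the populated collector."""
    collector = LinkCollector()
    collector.feed(html)
    return collector


def links_only_served_to_bot(browser_html, bot_html):
    """Return links present in the Googlebot copy but not the browser copy."""
    return sorted(scan(bot_html).links - scan(browser_html).links)


if __name__ == "__main__":
    # Tiny invented samples standing in for the two saved copies of a page.
    browser_copy = '<html><body><a href="/about">About</a></body></html>'
    bot_copy = (
        '<html><head><meta name="robots" content="noarchive"></head>'
        '<body><a href="/about">About</a>'
        '<a href="http://spam.example/pills">cheap pills</a></body></html>'
    )
    print(links_only_served_to_bot(browser_copy, bot_copy))
    print("noarchive present:", scan(bot_copy).noarchive)
```

Any link reported here deserves a manual look. Legitimate pages can also vary between requests (rotating promotions, for example), so treat the output as a list of leads.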

How do I test redirects?

If you’ve implemented redirects, you can use this tool to test how Googlebot will interpret those redirects without waiting for those pages to be crawled. For instance, when I fetch www.searchengineland.com, I see that the redirect is correctly implemented as a 301 and points to searchengineland.com:

[Screenshot: Fetch as Googlebot showing the 301 redirect]

You can also use the tool to troubleshoot URLs listed in the Crawl Errors > Not Followed report. You could test these URLs using something like Live HTTP Headers or by trying to access them in a browser, but if neither of those methods uncovers the problem, this tool can help determine whether the issue is specific to Googlebot. Once you’ve fixed redirect errors uncovered by the Not Followed report, you can use the tool to verify that the fixes have really solved the problem.
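As a first check before turning to the tool, you can look at a redirect yourself without following it. The Python sketch below sends a single HEAD request using the standard library's `http.client`, which does not follow redirects, and summarizes the status code and `Location` header. The helper names are invented for this example, and because the request comes from your own machine with a default user agent, it won't expose behavior that is specific to Googlebot.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects, so this reports the first hop only.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status, location, expected_host):
    """Summarize whether a response is the redirect you intended.

    A relative Location value has no host and is reported as unexpected.
    """
    if status not in REDIRECT_CODES:
        return f"{status}: not a redirect"
    kind = "permanent" if status in (301, 308) else "temporary"
    host = urlsplit(location or "").hostname
    if host == expected_host:
        verdict = "as expected"
    else:
        verdict = f"unexpected target {location!r}"
    return f"{status} ({kind}) -> {verdict}"
```

For the example above, `describe_redirect(*head("http://www.searchengineland.com/"), "searchengineland.com")` would summarize whatever the server returns at the time you run it.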

(Note that the tool currently has a limit of 100 KB per page. This limit applies to the tool only, not to Googlebot’s normal crawl of the site. Google is monitoring feedback to see if many site owners find this size to be limiting.)

Malware Details

The Google Online Security Blog has more information on the Malware Details tool. Previously, Webmaster Tools reported when a site was flagged as having malware and listed sample URLs. This new tool also shows samples of the malicious content and, in some cases, the underlying cause. This should help owners of sites that have been hacked to include malware find the problem and fix it. If your site contained malware and you’ve fixed it, you can request a review to have the malware alert removed from search results.



Vanessa Fox is a Contributing Editor at Search Engine Land. Called a “cyberspace visionary” by Seattle Business Monthly, she is an expert in understanding customer acquisition from organic search. She shares her perspective on how this impacts marketing and user experience at ninebyblue.com and provides authoritative search-friendly design patterns for developers at janeandrobot.com.




