Search Engine Land

MSNbot 1.1: Live Search Implements A More Efficient Crawl

Vanessa Fox on February 12, 2008 at 4:09 pm

Today, Microsoft announced changes to their Live Search crawler intended to reduce bandwidth resources during the crawl of a site. MSNbot (upgraded to version 1.1) now supports both HTTP compression and conditional get. The post on the Live Search Webmaster Center blog describes each feature in detail and includes links to tools you can use to check your server for support of these features.

  • HTTP compression lets a server compress files before sending them, so search engine crawlers (and browsers) download smaller responses.
  • Conditional get lets the crawler ask a server if the page has been changed since the last request (using the If-Modified-Since header). If the content hasn’t changed, a server that supports conditional get returns a 304 response (not modified). When the crawler gets this response, it doesn’t download the page contents (and continues to use the version already downloaded).
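Both mechanisms can be sketched briefly in Python. This is an illustration only, not how MSNbot is actually implemented: the `get` callback stands in for a real HTTP request, and the cache layout is made up for the example.

```python
import gzip


def compress_savings(body: bytes) -> float:
    """Fraction of bandwidth HTTP (gzip) compression saves on one response body."""
    return 1 - len(gzip.compress(body)) / len(body)


def conditional_fetch(url, cache, get):
    """Conditional get: ask the server for the page only if it has changed.

    `get(url, headers)` is a stand-in for an HTTP request; it returns
    (status, body, last_modified).
    """
    headers = {}
    if url in cache:
        # Tell the server when we last saw this page.
        headers["If-Modified-Since"] = cache[url]["last_modified"]
    status, body, last_modified = get(url, headers)
    if status == 304:
        # Not modified: reuse the copy already downloaded, no new transfer.
        return cache[url]["body"]
    cache[url] = {"body": body, "last_modified": last_modified}
    return body
```

Repetitive HTML compresses very well, which is why the compression side alone can cut crawler bandwidth substantially; the conditional-get side then avoids re-downloading unchanged pages entirely.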


As the blog post notes, the other major search engines support these features as well.

Google
Google overhauled their crawler to reduce bandwidth usage in 2006 as part of the “Bigdaddy” infrastructure change. With this effort, Googlebot increased support for HTTP compression and started using a crawl caching proxy. The Google webmaster help center describes Googlebot’s handling of conditional get, which is similar to MSNbot’s.

Yahoo!
In 2005, Yahoo! announced support of both HTTP compression and conditional get.

Ask
Ask’s webmaster documentation includes information about HTTP compression, although it doesn’t mention conditional get support.

Cache Dates

In 2006, Google changed how it displays cache dates of pages to reflect the most recent visit to the page, rather than the most recent download of the page. Live Search matches Google’s current functionality, showing the last time MSNbot visited the page as the cache date. Yahoo! doesn’t display a cached date.

Other Ways To Reduce Search Engine Crawler Bandwidth

If search engine crawlers use too much bandwidth on your site, even once your server has HTTP compression and conditional get turned on, you can use additional methods to reduce bandwidth consumption. However, keep in mind that unlike HTTP compression and conditional get, these other methods could potentially reduce the number of indexed pages.

Crawl Delay
Live Search, Yahoo!, and Ask all support the crawl-delay instruction in robots.txt (Google is the lone holdout). You specify the crawl-delay in seconds, which indicates how long the crawler should wait between page fetches.

A robots.txt file that directs all crawlers to wait five seconds between each page fetch looks as follows:

User-agent: *
Crawl-delay: 5
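Python's standard library can parse this directive, which makes for a quick sanity check of a robots.txt file. The user-agent names below are just examples:

```python
import urllib.robotparser

# Parse robots.txt lines equivalent to the example above.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "user-agent: *",
    "crawl-delay: 5",
])

# Any crawler honoring the directive should wait 5 seconds between fetches.
print(rp.crawl_delay("msnbot"))  # → 5
```

Because the rule is declared under `user-agent: *`, the same delay applies to every crawler that respects robots.txt.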

The Live Search webmaster help notes that the news crawler doesn’t follow the crawl-delay instruction:

“Live Search also uses a dedicated crawler to crawl certain types of sites at high frequency. The msnbot-NewsBlogs/1.0 news crawler helps provide current results for our news site. The msnbot-NewsBlogs/1.0 does not adhere to the crawl-delay settings.

If you find that MSNBot is still placing too high a load on your web server, contact Site Owner Support.”

In a recent interview, Matt Cutts of Google explained that Google doesn’t support crawl delay.

“I believe the only robots.txt extension in common use that Google doesn’t support is the crawl-delay. And, the reason that Google doesn’t support crawl-delay is because way too many people accidentally mess it up. For example, they set crawl-delay to a hundred thousand, and, that means you get to crawl one page every other day or something like that.

We have even seen people who set a crawl-delay such that we’d only be allowed to crawl one page per month. What we have done instead is provide throttling ability within Webmaster Central, but crawl-delay is the inverse; it’s saying crawl me once every “n” seconds. In fact what you really want is host-load, which lets you define how many Googlebots are allowed to crawl your site at once. So, a host-load of two would mean, 2 Googlebots are allowed to be crawling the site at once.”

The crawl rate feature in Google’s Webmaster Central reports Googlebot’s current bandwidth usage and lets webmasters request a slower crawl (and in some cases, a faster one).

Using robots.txt To Reduce Bandwidth
You can block pages or directories from being crawled to reduce overall bandwidth. If you have large portions of your site that you don’t want (or need) indexed, you can use robots.txt to block search engine crawlers from accessing them. Note that use of robots meta tags will keep the pages out of the index, but won’t achieve the bandwidth reduction goals, as the crawlers have to access the pages to read the meta tags.
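You can check such blocking rules with Python's `urllib.robotparser` as well; the directory, URLs, and agent name below are hypothetical:

```python
import urllib.robotparser

# A robots.txt that blocks a large archive section from all crawlers.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "user-agent: *",
    "disallow: /archive/",
])

print(rp.can_fetch("msnbot", "http://example.com/archive/2007/page.html"))  # blocked
print(rp.can_fetch("msnbot", "http://example.com/index.html"))              # allowed
```

A compliant crawler never requests the blocked URLs at all, which is what saves the bandwidth; a `noindex` meta tag, by contrast, is only seen after the page has already been downloaded.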

Many webmasters assumed MSNbot already supported HTTP compression and conditional get, although some had criticized Live Search for using more bandwidth than other search engine crawlers. With these enhancements, webmasters who have these features enabled on their servers should notice a bandwidth reduction.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.



About The Author

Vanessa Fox
Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found the software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.
