Google On How They Know When To Slow Or Stop Crawling Your Web Site
Google crawl efficiency signals include connect time and HTTP server status codes.
Today at SMX East, Google’s Webmaster Trends Analyst Gary Illyes shared with the audience two technical ways Google determines when Googlebot, its crawler, should slow down or stop crawling your website.
One of the more important factors in SEO is ensuring that search engine crawlers can access your web pages. If they cannot access your pages, you will most likely have a hard time ranking in the search results.
Google said that it uses many signals to determine whether it should stop crawling your website, beyond the obvious ones such as the disavow file, robots.txt and nofollow tags.
Gary from Google said the following two signals are important crawl signals for Google:
Connect Time
Google will look at how much time it takes to connect to the server and web page. If that connect time keeps getting longer, Google will back off and slow or stop crawling your web pages. Google doesn’t want to end up taking down your server, so it uses connect time as a crawling factor.
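To make the idea concrete, here is a minimal sketch of how a crawler could slow down when connect times trend upward. It is not Google's actual logic, which was not described in detail; the function name `adjust_crawl_delay` and every threshold (`window`, `slow_factor`, the delay limits) are assumptions chosen purely for illustration.

```python
from statistics import mean


def adjust_crawl_delay(delay_s, connect_times_s, window=3, slow_factor=1.5,
                       min_delay_s=1.0, max_delay_s=3600.0):
    """Return a new delay between requests based on the connect-time trend.

    Hypothetical helper: compares the average of the most recent `window`
    connect times against the average of the `window` samples before them.
    """
    if len(connect_times_s) < 2 * window:
        return delay_s  # not enough samples to judge a trend

    older = mean(connect_times_s[-2 * window:-window])
    recent = mean(connect_times_s[-window:])

    if recent > older * slow_factor:
        # Connecting is getting noticeably slower: back off.
        return min(delay_s * 2, max_delay_s)
    if recent < older:
        # The server is responding faster again: speed back up.
        return max(delay_s / 2, min_delay_s)
    return delay_s


# Connect times climbing from 0.1s to 0.5s double the delay.
print(adjust_crawl_delay(2.0, [0.1, 0.1, 0.1, 0.3, 0.4, 0.5]))
```

In this sketch the crawler never stops outright; it only stretches the pause between requests up to `max_delay_s`, which is one simple way to model "slow or stop" without hammering a struggling server.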
HTTP Status Codes
Google will also stop or slow its crawling when it gets server status codes in the 5xx range. The 5xx Server Error status codes often mean the server is having trouble responding. You can find a whole list of them on Wikipedia.
Google says that when it sees these codes, it backs off so as not to cause more issues for your server.
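The same back-off behavior can be sketched for status codes. The example below is only an illustration under stated assumptions: the function name `delay_after_response`, the doubling rule, and the delay limits are hypothetical and are not taken from anything Google has published.

```python
def delay_after_response(status_code, delay_s, base_delay_s=1.0,
                         max_delay_s=3600.0):
    """Return the delay to wait before the next request to the same site.

    Hypothetical rule: any 5xx response doubles the delay (capped at
    max_delay_s); any other response resets it to base_delay_s.
    """
    if 500 <= status_code <= 599:
        # Server error: back off so the crawler does not add to the load.
        return min(delay_s * 2, max_delay_s)
    return base_delay_s


# A run of 503 responses keeps doubling the delay until the server recovers.
delay = 1.0
for status in (503, 503, 503, 200):
    delay = delay_after_response(status, delay)
    print(status, delay)
```

Resetting to the base delay on the first non-5xx response is a deliberate simplification; a more cautious crawler might ramp back up gradually instead.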
In both cases, Googlebot will come back later, but it backs off when it sees these two signals so that its crawling does not cause further problems for users trying to access your website.