How (I Think) Crawl Budget Works (Sort Of)

Ian Lurie on
    Crawl budget, the number of pages a search engine will crawl each time it visits your site, is a huge factor in SEO success. Several factors determine how many pages actually get crawled: the site’s overall crawl budget is a fairly strong one, and crawl efficiency is a key component as well. Of course, a bigger crawl budget is almost always better (duh).

    But the relationship between crawl budget and organic SEO isn’t 100% straightforward. So, I’m going to talk first about how Google determines crawl budget, then about how crawl budget affects search traffic, and then how you can make the best use of your existing crawl budget.

    How Google Determines Crawl Budget

    It’s all about PageRank.

    Google sets crawl budget based primarily on your site’s authority. That means PageRank of individual pages. Sites with a higher average PR across their pages will see a larger crawl budget; Google will crawl more URLs on these sites.

    <–Nerd alert!–>

    If you aren’t obsessed with things like log files and true PageRank, ignore the next few paragraphs. They’ll just make you roll your eyes.

    To confirm this, I sampled the log files of 40 sites, checking the number of URLs downloaded by Googlebot during a single crawl. I defined a ‘crawl’ as one set of visits by Googlebot, with no more than 30 seconds between Googlebot hits.
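
    The 30-second rule above can be sketched in a few lines of Python. This is only an illustration of that definition, not the script used for the analysis. It assumes you have already pulled Googlebot's requests out of your log file as (timestamp, URL) pairs. The function names and the sample hits are hypothetical, and counting distinct URLs (rather than raw hits) is an assumption too.

```python
from datetime import datetime, timedelta


def split_into_crawls(hits, max_gap_seconds=30):
    """Group (timestamp, url) Googlebot hits into crawls.

    A new crawl starts whenever the gap since the previous hit
    is longer than max_gap_seconds.
    """
    crawls = []
    previous_time = None
    for timestamp, url in sorted(hits):
        gap_too_long = (
            previous_time is not None
            and (timestamp - previous_time).total_seconds() > max_gap_seconds
        )
        if previous_time is None or gap_too_long:
            crawls.append([])  # start a new crawl
        crawls[-1].append(url)
        previous_time = timestamp
    return crawls


def urls_per_crawl(hits, max_gap_seconds=30):
    """Count distinct URLs fetched in each crawl (an assumption: raw hits would also work)."""
    return [len(set(crawl)) for crawl in split_into_crawls(hits, max_gap_seconds)]


# Hypothetical hits, made up for illustration only.
start = datetime(2024, 1, 1, 12, 0, 0)
sample_hits = [
    (start, "/"),
    (start + timedelta(seconds=10), "/about"),
    (start + timedelta(seconds=35), "/blog"),      # 25 s gap: same crawl
    (start + timedelta(seconds=120), "/"),         # 85 s gap: new crawl
    (start + timedelta(seconds=125), "/contact"),
]

print(urls_per_crawl(sample_hits))  # [3, 2]
```

    Averaging those per-crawl counts for each site gives one number per site to set against its authority metrics.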

    Data Disclaimer:

    Yes, I know, I should compare thousands of sites, not 40. If anyone wants to take up a collection so I can spend all my time digging through log files, instead of working for a living, let me know.

    I could not rely purely on Toolbar PageRank, since it’s usually out of date. So I used a combination of Toolbar PR, MozRank, cumulative inlinks in Majestic SEO’s toolset, and a few gut-check statistics so hocus-pocus that I’m embarrassed to include them here. I also looked at Google Webmaster Tools to make sure my log file analysis was proportional.

    No matter how you slice the data, though, the result is pretty telling:

    Crawl budget increases with higher PR, yes. But there’s a huge jump in URLs per crawl after sites pass a PR of 5-6.

    I don’t have the good luck to have any sites with a PR of 8+. If you do, and are willing to send me a log file or two, I can take a shot at that data as well.

    So yes, in my opinion, PR is pretty important, though other factors are at play too. And it seems to me that building authority (to individual pages) through links and social media is an even bigger deal than you thought, huh?

    How Crawl Budget Affects Organic Search Traffic

    Not directly.

    Just because you have a big, fat crawl budget doesn’t mean you’ll get scads of organic search traffic. I compared the same 40 sites’ crawl budgets and percentage of traffic from organic search, and got this:

    Um. OK. That’s not exactly a powerful correlation.

    Then I compared crawl budget to raw organic traffic (not a percentage) and got this:

    That might imply a correlation between crawl budget and organic traffic. But it also might just mean sites with higher authority get more organic traffic. Which hints at a relationship between crawl budget and traffic, but hardly confirms it.

    Apparently, a higher crawl budget doesn’t even equal more pages in the Google index:

    Again, there’s some correlation, but I’d expect a lot more given how fast crawl budget goes up with higher PageRank.

    If you think about it carefully, though, it makes sense: Crawl budget does not equal lots of organic traffic. It doesn’t equal lots of indexed pages. It merely equals Googlebot visiting many URLs on your site. (A word of caution: more crawling isn’t necessarily better if the high rate of crawl is due to crawl inefficiency; for example, canonicalization problems could result in high crawl, but low indexing.)

    You need to use crawl budget wisely if you want it to translate to higher rankings and more organic traffic.

    Making The Most Of Your Crawl Budget

    It’s not the size of your crawl budget. It’s how you use it.

    I actually wrote this entire article just so I could use that line.

    Punny, but true. If Googlebot crawls thousands of pages of your site but finds that 90% of those pages are content-poor, duplicates or so slow they time out, then your crawl budget is the SEO equivalent of Monopoly money.

    To make the most of your crawl budget, do all the stuff you should be doing anyway:

    1. Remove duplicates. Don’t make Googlebot crawl 3 versions of every page. Rel=canonical won’t fix this! Googlebot still has to crawl canonically incorrect pages, even if you mark them as such. Fix duplicate content issues, once and for all, so Googlebot doesn’t waste crawl budget reading the same content multiple times.
    2. Fix broken links. Redirection is a good last resort, but wherever possible, really fix 404 errors.
    3. Speed up your pages. Follow best practices for fast-loading pages: Make sure you’re hosted on a fast server with plenty of resources; move embedded JavaScript and CSS into external files; write efficient code; use tools like Google PageSpeed and YSlow. Anything you can do to reduce page load times will help.
    4. Don’t write ‘thin’ pages. Avoid having lots of pages on your site that have zero or very little real text. These pages are very hard for Google to classify, and can’t contribute much to your site’s overall relevance. Plus, they’re unlikely to rank in the organic results. That often makes them a poor expenditure of crawl budget.
    5. Generate an XML sitemap. Make sure Google knows where all your pages are.
    6. Check your log files. Learn to look for signs that Google or Bing are getting hung up in piles of duplicate or slow-loading pages. Troubleshoot and fix those problems.
    7. Build authority. Duh. Of course you should build authority! But don’t focus purely on your home page. Be sure to build links to deeper site pages, too. A few deep links can pump up the PageRank of deeper site pages and have a big effect on average PR.
    8. Use smart site architecture. Build your site with faceted navigation. Minimize the number of links per page. This will maximize your average PageRank and increase your crawl budget.
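
    For item 6, here is a rough sketch of what "checking your log files" might look like once you have Googlebot's requests as (URL, status code) pairs. It tallies status codes so 404s stand out, and it groups URLs that differ only by case, trailing slash or query string, which are likely duplicates. The normalization rule is a deliberately crude assumption (query strings sometimes do change the content), and the function names and sample hits are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def normalize(url):
    """Collapse common duplicate variants: case, trailing slash, query string.

    This is a crude heuristic, not a canonicalization standard.
    """
    path = urlsplit(url).path.rstrip("/").lower()
    return path or "/"


def summarize_googlebot_hits(hits):
    """Summarize an iterable of (url, status_code) Googlebot requests.

    Returns (status_counts, duplicates), where duplicates maps a
    normalized path to the distinct URL variants that were crawled.
    """
    status_counts = Counter()
    variants = defaultdict(set)
    for url, status in hits:
        status_counts[status] += 1
        variants[normalize(url)].add(url)
    duplicates = {
        path: sorted(urls) for path, urls in variants.items() if len(urls) > 1
    }
    return status_counts, duplicates


# Hypothetical hits, made up for illustration only.
sample_hits = [
    ("/shoes", 200),
    ("/shoes/", 200),
    ("/Shoes?sort=price", 200),
    ("/old-page", 404),
    ("/about", 200),
]

status_counts, duplicates = summarize_googlebot_hits(sample_hits)
print(status_counts[404])  # 1
print(duplicates)          # {'/shoes': ['/Shoes?sort=price', '/shoes', '/shoes/']}
```

    Anything that shows up in the duplicates list is a candidate for item 1, and anything with a 404 is a candidate for item 2.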

    Crawl Budget = Opportunity

    Your crawl budget is currency. You use it to ‘buy’ time that Googlebot can spend visiting your site, grabbing URLs and attempting to add them to its index.

    How you spend that currency is up to you. Time spent maximizing the value Google gets per page crawled will definitely pay off.

    Note: This Is All Relevant To Google

    I wish I could provide information about Bing’s algorithm for setting crawl budget, but I can’t. For Google, there are some direct quotes from Googlers, including those in Eric Enge’s great interview with Matt Cutts. And I’ve got lots of log file data to look at.

    Bing, not so much. While it’s safe to assume that Bing uses a similar budgeting system, Bing has also stated they’re trying to get away from domain scoring as part of their ranking algorithm.

    So Bing probably uses a similar budgeting concept, but I can’t guarantee it. Worst case, though, the suggestions in this article will have no effect on the crawl budget Bing assigns your site.


    About The Author

    Ian Lurie
    Ian Lurie is Chief Marketing Curmudgeon and President at Portent, Inc, a firm he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting.