Crawl budget – the number of pages a search engine will crawl each time it visits your site – is a huge factor in SEO success. While many factors determine how many pages get crawled, a fairly strong one is the site’s overall crawl budget (crawl efficiency being a key component as well). Of course, a bigger crawl budget is almost always better (duh).

But the relationship between crawl budget and organic SEO isn’t 100% straightforward. So, I’m going to talk first about how Google determines crawl budget, then about how crawl budget affects search traffic, and then how you can make the best use of your existing crawl budget.

How Google Determines Crawl Budget

It’s all about PageRank.

Google sets crawl budget based primarily on your site’s authority. That means the PageRank of its individual pages. Sites with a higher average PR across their pages will see a larger crawl budget; Google will crawl more URLs on these sites.

<-- Nerd alert! -->

If you aren’t obsessed with things like log files and true PageRank, ignore the next 2 paragraphs. They’ll just make you roll your eyes.

To confirm this, I sampled the log files of 40 sites, checking the number of URLs downloaded by Googlebot during a single crawl. I defined a ‘crawl’ as one set of visits by Googlebot, with no more than 30 seconds between Googlebot hits.
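That definition of a crawl is easy to apply to your own logs. Below is a minimal Python sketch of it. It assumes you have already filtered the access log down to Googlebot requests and parsed each one into a (timestamp, URL) pair. The function names are hypothetical, and counting *distinct* URLs per crawl is an assumption about what "URLs downloaded" means.

```python
from datetime import datetime, timedelta

# Per the article: hits no more than 30 seconds apart belong to the same crawl.
MAX_GAP = timedelta(seconds=30)

def split_into_crawls(hits):
    """Group (timestamp, url) Googlebot hits into crawls, ordered by time."""
    crawls = []
    current = []
    previous = None
    for ts, url in sorted(hits):
        # A gap longer than MAX_GAP closes the current crawl and starts a new one.
        if previous is not None and ts - previous > MAX_GAP:
            crawls.append(current)
            current = []
        current.append((ts, url))
        previous = ts
    if current:
        crawls.append(current)
    return crawls

def urls_per_crawl(hits):
    """Distinct URLs fetched in each crawl (assumption: repeat fetches count once)."""
    return [len({url for _, url in crawl}) for crawl in split_into_crawls(hits)]

t0 = datetime(2011, 1, 1, 12, 0, 0)
hits = [
    (t0, "/"),
    (t0 + timedelta(seconds=10), "/about"),
    (t0 + timedelta(seconds=35), "/blog"),  # 25 s after the previous hit: same crawl
    (t0 + timedelta(seconds=120), "/"),     # 85 s gap: a new crawl starts
]
print(urls_per_crawl(hits))  # [3, 1]
```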

Data Disclaimer:

Yes, I know, I should compare thousands of sites, not 40. If anyone wants to take up a collection so I can spend all my time digging through log files, instead of working for a living, let me know.

I could not rely purely on Toolbar PageRank, since it’s usually out of date. So I used a combination of Toolbar PR, MozRank, cumulative inlinks in Majestic SEO’s toolset, and a few gut-check statistics so hocus-pocus that I’m embarrassed to include them here. I also looked at Google Webmaster Tools to make sure my log file analysis was proportional.
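In the comments, the author explains that he averaged PageRank across the top 20% of pages on each domain. Here is a minimal sketch of that averaging step. It assumes you already have one authority score per page. The blend of Toolbar PR, MozRank and inlinks behind each score isn't specified, so the sketch takes the scores as given. Rounding the 20% cut up to a whole page is also an assumption, and the scores are made-up illustration values, not the article's data.

```python
def top_share_average(page_scores, top_percent=20):
    """Average the scores of the top `top_percent` percent of a domain's pages.

    page_scores: one authority score per page (however you chose to estimate it).
    Always keeps at least one page so very small sites still get a value.
    """
    scores = sorted(page_scores, reverse=True)
    if not scores:
        raise ValueError("need at least one page score")
    # Integer ceiling of len * top_percent / 100, avoiding float rounding surprises.
    k = max(1, (len(scores) * top_percent + 99) // 100)
    top = scores[:k]
    return sum(top) / k

# Made-up per-page scores for one domain (not the article's data).
scores = [6, 5, 4, 4, 3, 3, 2, 2, 1, 0]
print(top_share_average(scores))  # 5.5: the mean of the top 2 of 10 pages
```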

No matter how you slice the data, though, the result is pretty telling:

[Chart: Crawl budget versus domain authority]

Crawl budget increases with higher PR, yes. But there’s a huge jump in URLs per crawl after sites pass a PR of 5-6.

I don’t have the good luck to have any sites with a PR of 8+. If you do, and are willing to send me a log file or two, I can take a shot at that data as well.

So yes, in my opinion, PR is pretty important, along with the other factors at play. But it seems to me that building authority for individual pages, through links and social media, is an even bigger deal than you thought, huh?

How Crawl Budget Affects Organic Search Traffic

Not directly.

Just because you have a big, fat crawl budget doesn’t mean you’ll get scads of organic search traffic. I compared the same 40 sites’ crawl budgets and percentage of traffic from organic search, and got this:

[Chart: Crawl budget versus percentage of organic traffic]

Um. OK. That’s not exactly a powerful correlation.
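To put a number on "not exactly a powerful correlation," you can compute a Pearson correlation coefficient across your sites. Values near 1 or -1 mean a strong linear relationship, and values near 0 mean very little. Here is a minimal hand-rolled version (Python 3.10+ also ships `statistics.correlation`). The site figures are hypothetical placeholders, not the article's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder figures for five hypothetical sites, not the article's data.
urls_per_crawl = [150, 420, 900, 2600, 7400]
organic_share = [0.61, 0.38, 0.55, 0.47, 0.52]
print(f"r = {pearson(urls_per_crawl, organic_share):.2f}")
```

Pearson's r only captures linear relationships. With a sample as small as 40 sites, a single outlier can swing it noticeably.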

Then I compared crawl budget to raw organic traffic (not a percentage) and got this:

[Chart: Crawl budget versus raw organic traffic]

That might imply a correlation between crawl budget and organic traffic. But it might also just mean that sites with higher authority get more organic traffic. That hints at a relationship between crawl budget and traffic, but hardly confirms it.

Apparently, a higher crawl budget doesn’t even equal more pages in the Google index:

[Chart: Crawl budget versus pages indexed]

Again, there’s some correlation, but I’d expect a lot more given how fast crawl budget goes up with higher PageRank.

If you think about it carefully, though, it makes sense: Crawl budget does not equal lots of organic traffic. It doesn’t equal lots of indexed pages. It merely equals Googlebot visiting many URLs on your site. (A word of caution: more crawling isn’t necessarily better if the high rate of crawl is due to crawl inefficiency; for example, canonicalization problems could result in high crawl, but low indexing.)

You need to use crawl budget wisely if you want it to translate to higher rankings and more organic traffic.

Making The Most Of Your Crawl Budget

It’s not the size of your crawl budget. It’s how you use it.

I actually wrote this entire article just so I could use that line.

Punny, but true. If Googlebot crawls thousands of pages of your site but finds that 90% of those pages are content-poor, duplicates or so slow they time out, then your crawl budget is the SEO equivalent of Monopoly money.

To make the most of your crawl budget, do all the stuff you should be doing anyway:

  1. Remove duplicates. Don’t make Googlebot crawl 3 versions of every page. Rel=canonical won’t fix this! Googlebot still has to crawl non-canonical pages, even if you mark them as such. Fix duplicate content issues, once and for all, so Googlebot doesn’t waste crawl budget reading the same content multiple times.
  2. Fix broken links. Redirection is a good last resort, but wherever possible, really fix 404 errors.
  3. Speed up your pages. Follow best practices for fast-loading pages: make sure you’re hosted on a fast server with plenty of resources; remove embedded JavaScript and CSS; write efficient code; use tools like Google PageSpeed and YSlow. Anything you can do to reduce page load times will help.
  4. Don’t write ‘thin’ pages. Avoid having lots of pages on your site that have zero or very little real text. These pages are very hard for Google to classify, and can’t contribute much to your site’s overall relevance. Plus, they’re unlikely to rank in the organic results. That often makes them a poor expenditure of crawl budget.
  5. Generate an XML sitemap. Make sure Google knows where all your pages are.
  6. Check your log files. Learn to look for signs that Google or Bing is getting hung up in piles of duplicate or slow-loading pages. Troubleshoot and fix those problems.
  7. Build authority. Duh. Of course you should build authority! But don’t focus purely on your home page. Be sure to build links to deeper site pages, too. A few deep links can pump up the PageRank of deeper site pages and have a big effect on average PR.
  8. Use smart site architecture. Build your site with faceted navigation. Minimize the number of links per page. This will maximize your average PageRank and increase your crawl budget.
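Items 1, 2, 3 and 6 all come down to reading your logs. Here is a minimal audit sketch. It assumes Googlebot requests have already been parsed into hypothetical `Hit` records (URL, status code, seconds to serve). Response time only appears if your server logs it, e.g. Apache's `%D`. The slowness threshold, the ignored query parameters, and the case-insensitive, trailing-slash-agnostic rule in `normalize` are all placeholder assumptions to adjust for your own site.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder assumptions: adjust for your own site.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}
SLOW_AFTER_SECONDS = 2.0

@dataclass
class Hit:
    """One Googlebot request, already parsed from the access log (hypothetical)."""
    url: str        # path plus query string, e.g. "/shoes?sessionid=abc"
    status: int     # HTTP status code returned
    seconds: float  # time taken to serve the request

def normalize(url):
    """Collapse variants assumed to serve identical content into one key."""
    parts = urlsplit(url)
    # Assumes paths are case-insensitive and a trailing slash doesn't matter.
    path = parts.path.lower().rstrip("/") or "/"
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(("", "", path, urlencode(params), ""))

def audit(hits):
    """Return (404 counts, slow URLs, groups of duplicate URL variants)."""
    not_found = Counter(h.url for h in hits if h.status == 404)
    slow = sorted({h.url for h in hits if h.seconds > SLOW_AFTER_SECONDS})
    variants = defaultdict(set)
    for h in hits:
        variants[normalize(h.url)].add(h.url)
    duplicates = {key: sorted(urls) for key, urls in variants.items() if len(urls) > 1}
    return not_found, slow, duplicates

hits = [
    Hit("/Shoes/", 200, 0.4),
    Hit("/shoes?sessionid=abc", 200, 0.5),
    Hit("/shoes", 200, 3.1),
    Hit("/old-page", 404, 0.1),
    Hit("/old-page", 404, 0.1),
]
not_found, slow, duplicates = audit(hits)
print(not_found)   # Counter({'/old-page': 2})
print(slow)        # ['/shoes']
print(duplicates)  # {'/shoes': ['/Shoes/', '/shoes', '/shoes?sessionid=abc']}
```

The sketch only finds problems. Consolidating the duplicates, repairing the broken links and speeding up slow templates is still up to you.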
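For item 5, an XML sitemap follows the sitemaps.org protocol. It is a `urlset` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, with one `url` entry containing a `loc` for each page. Here is a minimal generator using only the Python standard library. The URLs are placeholders, and the sketch skips the optional `lastmod`, `changefreq` and `priority` tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML listing each absolute URL once, in first-seen order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in dict.fromkeys(urls):  # drop exact duplicates, keep order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = loc  # ElementTree escapes "&" and friends
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

pages = [
    "https://www.example.com/",
    "https://www.example.com/shoes?color=red&size=9",
    "https://www.example.com/",  # duplicate, listed only once
]
print(build_sitemap(pages))
```

The protocol caps a single sitemap at 50,000 URLs, so larger sites need a sitemap index file. Listing only canonical URLs keeps the sitemap from pointing Googlebot at the duplicates item 1 warns about.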

Crawl Budget = Opportunity

Your crawl budget is currency. You use it to ‘buy’ time that Googlebot can spend visiting your site, grabbing URLs and attempting to add them to its index.

How you spend that currency is up to you. Time spent maximizing the value Google gets per page crawled will definitely pay off.

Note: This Is All Relevant To Google

I wish I could provide information about Bing’s algorithm for setting crawl budget, but I can’t. There are some direct quotes from Googlers, including Eric Enge’s great interview with Matt Cutts. And, I’ve got lots of log file data to look at.

Bing, not so much. It’s probably safe to assume that Bing uses a similar budgeting concept, but I can’t guarantee it, especially since Bing has stated they’re trying to get away from domain scoring as part of their ranking algorithm.

Worst case, though, the suggestions in this article will have no effect on the crawl budget Bing assigns your site.

Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.

About The Author: is Chief Marketing Curmudgeon and President at Portent, Inc, a firm he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting.

  • Ian Lurie

    I got some great feedback from other SEOs right before this article went live, and wanted to put some extra info in a comment:

    1. I talk about crawl budget a lot, but the most important part of this article, really, is ‘making the most of your crawl budget’. That’s what Vanessa Fox refers to as ‘crawl efficiency’. If you have a huge crawl budget but 90% of it is sucked up by duplicate content and such, then you won’t see much benefit.

    2. Search engines apply PageRank to individual pages, not to entire sites. However, I definitely see evidence that they look at some version of average PageRank, or domain authority, or whatever you want to call it, site-wide. That appears to impact crawl budget. There may not be a causal link – it may simply be that sites with lots of pages that all have higher PageRank get higher crawl budgets as a side effect of deep links to individual site pages.

    This is a very deep, geeky topic – I hope I don’t ruffle any feathers too badly, but I also hope that a lot of people smarter than I will pick up the discussion, so we can all learn something.

    Thanks,

    Ian

  • http://www.linkedin.com/in/brianrbrown Brian R. Brown

    Ian, great article… crawl equity is one of my favorite geeky search topics! Nice analysis, and I agree pretty much across the board. However, #8 around faceted navigation can be a slippery slope. Like many things related to SEO, faceted or guided navigation is neither good nor bad for SEO (or crawl equity); rather, how it is employed can greatly influence this.

    What I think you were driving at was SEO-friendly guided navigation that helps provide high-value, contextually rich content pages, while also reducing levels of duplication and pagination. Bad faceted navigation can of course create a near infinite crawl path, killing crawl equity.

  • Ian Lurie

    Hi Brian,

    Thanks! I totally agree on navigation. Obviously, if you have a beautiful hammer and just use it to squash your thumb, it doesn’t do a lot of good. Use faceted navigation the way you use any other tool – wisely.

    Ian

  • http://www.michael-martinez.com/ Michael Martinez

    This is a good first step in an interesting direction, but you’ll need to take (number of pages on site), (number of pages crawled), (number of pages indexed), and/or (number of pages receiving search traffic) into consideration before you can provide any meaningful correlations.

  • Ian Lurie

    Plus, to really make it work, we need a much larger sampling. If anyone wants to volunteer their log files…

  • http://www.jaankanellis.com Jaan Kanellis

    Ian for page authority did you use the home page number from each SEO tool?

  • Ian Lurie

    Jaan I actually averaged PageRank across the top 20% of pages on a given domain.

  • http://www.linkedin.com/in/matteosutto matteo sutto

    Great article Ian

    Regarding your correlation analysis between pages crawled and PR, wouldn’t it be more meaningful to compare the ratio “pages crawled by Google to total pages existing on a domain” to the PR of the site?
    I suspect there might be a correlation between the PR of a site and the total number of pages on a domain, thus biasing the correlation between pages crawled and PR.

  • http://www.3sMarketingTeam.com Al

    Wouldn’t a blog post be considered a thin page and, as a result, not rank well? Yet blog posts are notorious for ranking easily.

  • Ian Lurie

    Matteo – You’re describing crawl efficiency, which is what I talk about in the ‘making the most’ section. It really depends more on how you use the crawl budget you’ve got.

    It’s POSSIBLE that I’ve got the causation backwards, but I don’t think so. Testing with a bigger sample will prove it at some point.

  • Ian Lurie

    Al – I don’t think so. Blog posts tend to be fairly content-heavy, actually. And a well-tended blog will grow and accumulate links continuously, so if anything, a good blog would perform far better over time.

  • http://alexavery.com.au/seo-ppc-analytics-services/ Alex Avery

    If that’s “Nerd Alert” material, then ring my bell Ian.

    Like many of the comments (are trying to) say above, don’t be afraid to go deeper or get Nerdier. That’s what we’re here for. :)

    Keep up the good work.

  • http://www.greenlaneseo.com/blog Bill Sebald

    Cool article.

    “Crawl budget does not equal lots of organic traffic. It doesn’t equal lots of indexed pages. It merely equals Googlebot visiting many URLs on your site.”

    Inherently it may not guarantee any of that, but for some (especially webmasters/SEOs like me who like to tinker with and add onto existing content on large sites), it can lead to a wider net of new rankings and newfound authority, and circle back to more PR and gasoline in the spiders’ tank. For me, that’s more than enough of a reason to focus on it. Personally, I don’t need an exhaustive experiment.

 
