Pro Tip: How to fix 3 not-so-obvious crawl errors

Index bloat, misconfigured trailing slashes, and soft 404s can all hurt performance.


Everyone hates crawl errors. They show up without warning and can cause indexing issues.

In Gary Illyes’ (Google Webmaster Trends Analyst) Reddit AMA last year, he explained you must make your site crawlable:

“I really wish SEOs went back to the basics (i.e. MAKE THAT DAMN SITE CRAWLABLE) instead of focusing on silly updates and made up terms by the rank trackers, and that they talked more with developers…”

These tips will show you how.

How to find and fix index bloat

Index bloat means you have more URLs indexed than physical pages.

If it’s on a large enough scale, it can negatively impact performance. If severe enough, it’s a waste of your crawl budget.

Use the site: operator in Google search to find it, with no space between the operator and your domain. If the number of results is larger than the number of URLs you have, it’s an issue.

The operator should be entered into Google like this, with your own domain in place of example.com:

site:example.com
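
The comparison above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive audit: it assumes your XML sitemap lists every URL you want indexed, and that you type in the result count from the site: search by hand (Google's count is an estimate). The function names and the sample sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <loc> entries inside <url> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))


def index_bloat_ratio(indexed_count: int, known_url_count: int) -> float:
    """Return indexed / known. A value above 1.0 suggests index bloat."""
    if known_url_count <= 0:
        raise ValueError("known_url_count must be positive")
    return indexed_count / known_url_count


# Hypothetical two-page sitemap, used only for illustration.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

known = count_sitemap_urls(sample_sitemap)
# Suppose the site: search reported about 5 results (entered manually).
ratio = index_bloat_ratio(indexed_count=5, known_url_count=known)
print(f"{known} known URLs, ratio {ratio:.1f}")
```

A ratio well above 1.0 is a prompt to investigate which extra URLs are indexed, not proof of a problem on its own.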

Misconfigured 4xx errors and soft 404s

With normal 404s, 301 redirecting them to working URLs is a good solution. But what if your 404s are not normal 404s?

It’s a common issue. A page without content is a soft 404, even if it shows a 200 OK status.

In Screaming Frog, default word count reflects every single word on the page, not just the main content area. You must use Excel to determine “no content” after exporting your crawl data.

Create a column in Excel next to Screaming Frog’s standard word count, and subtract the total word count of your headers and footers (any sidebars, other text, etc.) from the total word count displayed.
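
If you would rather script the subtraction than do it in Excel, here is a rough Python sketch of the same idea. It assumes the crawl export is a CSV with "Address", "Status Code", and "Word Count" columns, that every page shares roughly the same header and footer word count, and that fewer than 50 remaining words counts as "no content". The column names, the threshold, and the sample rows are all assumptions to adjust for your own export.

```python
import csv
import io


def find_thin_pages(crawl_csv: str, boilerplate_words: int,
                    min_content_words: int = 50):
    """Return (url, content_words) for 200-status pages with little main content."""
    thin = []
    for row in csv.DictReader(io.StringIO(crawl_csv)):
        # Only pages that answer 200 OK can be soft 404s.
        if row["Status Code"].strip() != "200":
            continue
        # Subtract header/footer/sidebar words from the total word count.
        content_words = max(int(row["Word Count"]) - boilerplate_words, 0)
        if content_words < min_content_words:
            thin.append((row["Address"], content_words))
    return thin


# Hypothetical export rows, used only for illustration.
sample_export = """Address,Status Code,Word Count
https://example.com/,200,640
https://example.com/empty-category/,200,185
https://example.com/old-page/,404,180
"""

for url, words in find_thin_pages(sample_export, boilerplate_words=180):
    print(url, words)
```

Pages flagged this way are only candidates. They still need a manual look before you treat them as soft 404s.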


A more reliable, but more time-consuming, method is to manually examine your pages for physical text content.

Misconfigured trailing slashes

Not all URLs are created equal. There is a difference between .htm, .html, and using a forward slash (/). The first two are file names. The last is a folder.

When all three versions load, you’re serving three URLs with the same content.

Serving multiple indexable versions leads to crawl errors and duplicate content issues.

If this issue exists on your site already, redirect all URL versions to one primary version, so only one version loads.
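
To plan those redirects, it can help to generate a map from every variant to its primary version. The Python sketch below assumes the trailing-slash (folder) version is the one you keep. That choice is an assumption for illustration, and the function names are hypothetical. The actual 301s would still be configured on your server.

```python
from urllib.parse import urlsplit, urlunsplit


def primary_url(url: str) -> str:
    """Map a .htm, .html, or slashless URL to its trailing-slash version."""
    parts = urlsplit(url)
    path = parts.path
    # Strip a file extension if present (check .html before .htm).
    for ext in (".html", ".htm"):
        if path.endswith(ext):
            path = path[: -len(ext)]
            break
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


def redirect_map(urls):
    """Return {variant: primary} for every URL that is not already primary."""
    return {u: primary_url(u) for u in urls if primary_url(u) != u}


# Hypothetical duplicate URLs, used only for illustration.
variants = [
    "https://example.com/services.htm",
    "https://example.com/services.html",
    "https://example.com/services/",
]

for source, target in redirect_map(variants).items():
    print(f"301: {source} -> {target}")
```

Only the two file-name versions appear in the map, since the folder version is already the primary one.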

Leaner is better

Don’t just chase more content while ignoring these details. They’re important to your site. Create a better, leaner site with fully optimized crawlability. Your users will thank you.

Pro Tip is a special feature for SEOs in our community to share a specific tactic others can use to elevate their performance. You can submit your own here.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.


About the author

Brian Harnish
Contributor
Brian started in web development in 1998 when search engines like Altavista and Yahoo ruled the industry, and Google was just getting started. In 2007, he made his leap into SEO professionally. He has performed SEO for law firms, real estate agents, technology, and healthcare sectors. His work also includes large brands like United Healthcare and Microsoft.
