Crawling and indexing issues can really put a damper on your efforts to rank well for a variety of competitive and non-competitive terms. Solving indexing issues is an important step towards increasing your keyword ranking footprint. And the more search real estate you own by solving those issues, the closer you are to realizing your long-tail efforts.
Before I start with a problem site, I take a few measurements. Whenever you’re trying to solve a problem, you need to figure out where you’re starting from to make sure you’re helping and not hurting your efforts. I start by pulling a variety of data:
- Note the number of submitted and indexed pages from Google Webmaster Central. Generally, you need to submit a sitemap to get this data, and if you’re having indexing problems, this can really help. That being said, I don’t always submit an XML sitemap to Google. There are times to do it, and times to not worry about it. If I haven’t submitted a sitemap to Google Webmaster Tools, I do a site:domain.com query in Google and look at the number of pages listed initially. By paging through all of the results of the site: query, you’ll see that number reduce as you get to the “end.” This is the true number of pages Google has in the index showing as query results.
- Run an SEMRush report to measure how many organic top-20 rankings I have.
- Use a crawling program to pull a realistic page list from my site. Once I have this list, I go through it and eliminate any pages I don’t want in the index, and figure out how I have eliminated them or should have: by noindex tags, robots.txt exclusion, or parameter exclusion in Webmaster Tools (see more about parameter exclusion below).
NOTE: If my number of indexed pages matches my total page count pretty closely, I’m not really looking at a problem. The reality is that the larger the site, the more problems I generally find.
After I’ve pulled the baseline data, I look at all of my URLs, compare them to the list of indexed pages, and note the ones that are not indexed. Once I have a true grasp on the pages that are not in the index, I navigate through the site to those un-indexed pages and try to determine why they’re not being crawled. As you perform this step on your own site, look at the links, the navigation, and the URLs as you go through.
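The comparison described above comes down to a set difference between two URL lists. Here is a minimal sketch in Python, assuming you have already exported the crawler's page list and the list of indexed URLs. The function names `normalize` and `find_unindexed` and the sample URLs are hypothetical, and the normalization rule (ignore the scheme, host capitalization, and trailing slashes) is an assumption that may not suit every site.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparable form: lowercase host, no trailing slash.

    Assumption: scheme and trailing slashes do not distinguish pages here.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return parts.netloc.lower() + path + query


def find_unindexed(crawled_urls, indexed_urls):
    """Return crawled URLs that have no match in the indexed list, sorted."""
    indexed = {normalize(u) for u in indexed_urls}
    return sorted(u for u in set(crawled_urls) if normalize(u) not in indexed)


# Hypothetical sample data standing in for a crawler export and an index export.
crawled = [
    "http://example.com/",
    "http://example.com/products/",
    "http://example.com/products/widget?id=7",
    "http://example.com/about",
]
indexed = [
    "http://EXAMPLE.com",
    "http://example.com/products",
]

missing = find_unindexed(crawled, indexed)
coverage = 1 - len(missing) / len(set(crawled))
print(missing)
print(f"{coverage:.0%} of crawled pages appear in the index")
```

The `missing` list is the set of pages to walk through by hand, and the coverage figure is one more number worth recording alongside the baseline.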
Generally speaking, I can solve the problems with indexing by finding remedies for at least one of the following problems:
- Robots.txt file excludes that page/folder by mistake. For whatever reason, you may have excluded a folder or section of the site during design or testing, and forgot to remove that restriction in your robots.txt file. This is probably the easiest thing to check and fix, and if you’re really lucky, that’s all it takes.
- URLs contain excluded parameters. As with the robots.txt file, you can tell Google to ignore specific parameters if those parameters produce duplicate content. One indexing issue may be caused by asking Google to exclude a parameter you actually wanted included. Check your Webmaster Tools account under Site Configuration, Settings, Parameter Handling.
- Content that is inadequate or duplicate of other pages. Many sites contain duplicate content. In some cases, this is inevitable and even okay. The key to controlling duplicate content issues is to tell Google which page you want them to read, and how to get there. Use your robots.txt and parameter handling to tell Google which pages to ignore, and make sure the bots can get into the page you want them to index easily. Not all pages with content problems are duplicate; sometimes the content is just inadequate for ranking.
The goal of content is to tell the user and the search engine what that page is about. If there’s no text on the page, or very little, the search engines don’t know how to rank that page. Having a page title and a meta description isn’t enough; the content on the page has to support the keywords in your titles and descriptions or you might as well not even have them.
- Not enough inbound links to trickle interest deep into the site. Links are the bread and butter of search engine rankings. If you don’t have a healthy number of links coming into your site, there’s not enough link “juice” to trickle down into those interior pages. Ideally, you’ve built a good number of quality links into interior pages of the site to help push that juice deep into the site. Keep in mind, the larger the site, the more quality links you need to support high rankings. While bringing links into a site is important, being careful not to bleed a ton of PageRank with off-site links is also important. Think of your website as a kitchen colander: the more holes it has (outbound links), the faster the juice drains out.
- No link to that page exists on an indexed/cached page. Having either a good contextual link or navigational link to those deep pages is key to getting the search engine bots in there. Believe it or not, sometimes links going to deep pages are graphics or buttons, and in many cases those are not being followed by the bots. Although this sounds like a simple thing to check, depending upon the number of un-indexed URLs you’re digging through, it can take a lot of time to review.
- Navigation that cannot be indexed. Every once in a while I run across navigation that is built in Flash or other solutions that don’t provide indexable links to interior pages. Honestly, you can solve this by building contextual links to your interior pages, but I would probably look at an overhaul of the site to help with this problem. Good websites have indexable and contextual navigation links, in my opinion.
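The first check in the list, a leftover robots.txt exclusion, is easy to script once you have the list of un-indexed URLs. Here is a minimal sketch using Python's standard `urllib.robotparser` module. The function name `blocked_urls`, the robots.txt contents, and the URLs are hypothetical examples. Keep in mind that this parser does simple prefix matching and may not interpret every rule the way Google's crawler does, so treat the output as a hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical robots.txt with a rule left over from testing.
robots_txt = """\
User-agent: *
Disallow: /staging/
Disallow: /products/
"""

# Hypothetical un-indexed URLs found during the baseline comparison.
unindexed = [
    "http://example.com/products/widget",
    "http://example.com/about",
]

print(blocked_urls(robots_txt, unindexed))
```

Any URL this returns is worth checking against the live robots.txt file before moving on to the more time-consuming checks further down the list.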
Figuring out and fixing your indexing issues is definitely worth the time you invest in it, but it also takes a variety of approaches. Start with the easy steps, and progress to the harder ones. I listed the steps above in order of the effort involved to check and solve them, so start at the top and work your way down. Once you’ve found your issue, start watching for improvement in the metrics you baselined at the beginning of your efforts.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.