Crawling and indexing issues can really put a damper on your efforts to rank well for a variety of competitive and non-competitive terms.  Solving indexing issues is definitely an important step towards increasing your keyword ranking footprint. And the more real estate you own by solving those issues, the closer you are to realizing your long-tail efforts.

Before I start with a problem site, I take a few measurements.  Whenever you’re trying to solve a problem, you need to figure out where you’re starting from to make sure you’re helping and not hurting your efforts. I start by pulling a variety of data:

  1. Note the number of submitted and indexed pages from Google Webmaster Central. Generally, you need to submit a sitemap to get this data, and if you’re having indexing problems,  this can really help. That being said, I don’t always submit an XML sitemap to Google. There are times to do it, and times to not worry about it. If I haven’t submitted a sitemap to Google Webmaster Tools, I do a site:domain.com query in Google and look at the number of pages listed initially. By paging through all of the results of the site: query, you’ll see that number reduce as you get to the “end.”  This is the true number of pages Google has in the index showing as query results.
  2. Run an SEMRush report to measure how many organic top-20 rankings I have.
  3. Use a crawling program to pull a realistic page list from my site (a minimal crawler sketch follows this list).  Once I have this list, I move through it and eliminate any pages I don’t want in the index, then figure out how I have or should have eliminated them: noindex tags, robots.txt exclusion, or parameter exclusion in Webmaster Tools (see more about parameter exclusion below).
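
If you don’t have a crawling program handy, a small script can pull that page list for you and flag the pages you’ve already asked the engines to skip. Below is a minimal sketch, not a production crawler: it assumes the requests and beautifulsoup4 packages, and the start URL is a placeholder you’d swap for your own domain.

```python
# Minimal crawl sketch: walk same-domain links from the homepage and flag pages
# that are blocked by robots.txt or carry a noindex meta tag.
# Assumes the requests and beautifulsoup4 packages; START_URL is a placeholder.
from collections import deque
from urllib import robotparser
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "http://www.example.com/"   # replace with your own site
MAX_PAGES = 500                         # keep the sample manageable on big sites

rp = robotparser.RobotFileParser(urljoin(START_URL, "/robots.txt"))
rp.read()

domain = urlparse(START_URL).netloc
seen, queue, report = set(), deque([START_URL]), []

while queue and len(seen) < MAX_PAGES:
    url = queue.popleft()
    if url in seen:
        continue
    seen.add(url)

    if not rp.can_fetch("*", url):
        report.append((url, "blocked by robots.txt"))
        continue

    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        report.append((url, "fetch failed"))
        continue

    soup = BeautifulSoup(resp.text, "html.parser")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    if robots_meta and "noindex" in robots_meta.get("content", "").lower():
        report.append((url, "noindex meta tag"))
    else:
        report.append((url, "ok"))

    # queue links that stay on the same domain
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"]).split("#")[0]
        if urlparse(link).netloc == domain and link not in seen:
            queue.append(link)

for url, status in report:
    print(status, "-", url)
```
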

NOTE: If my number of indexed pages matches my total page count pretty closely, I’m not really looking at a problem. The reality is – the larger the site, the more problems I generally find.

After I’ve pulled the baseline data, I look at all of my URLs, compare them to the list of indexed pages, and note the ones that are not indexed. Once I have a true grasp of the pages that are not in the index, I navigate the site to those un-indexed pages and try to determine why they’re not being crawled. As you perform this step on your own site, look at the links, the navigation, and the URLs as you go through.
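
Here’s a minimal sketch of that comparison, assuming you’ve exported both lists to plain-text files with one URL per line; the file names are just placeholders.

```python
# Minimal sketch: report crawled pages that never made it into the index.
# Assumes two plain-text files, one URL per line (file names are placeholders).
crawled = {line.strip() for line in open("crawled_urls.txt") if line.strip()}
indexed = {line.strip() for line in open("indexed_urls.txt") if line.strip()}

not_indexed = sorted(crawled - indexed)
print(f"{len(not_indexed)} of {len(crawled)} crawled pages are not indexed:")
for url in not_indexed:
    print(url)
```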

Generally speaking, I can solve the problems with indexing by finding remedies for at least one of the following problems:

  • Robots.txt file excludes that page/folder by mistake. For whatever reason, you may have excluded a folder or section of the site during design or testing, and forgotten to remove that restriction in your robots.txt file. This is probably the easiest thing to check and fix (see the quick check after this list), and if you’re really lucky, that’s all it takes.
  • URLs contain excluded parameters. As with the robots.txt file, you can tell Google to ignore specific parameters if those parameters contain duplicate content.  One indexing issue may be caused by asking Google to exclude a parameter you actually wanted included.  Check your Webmaster Tools account under Site Configuration > Settings > Parameter Handling.
  • Content that is inadequate or a duplicate of other pages. Many sites contain duplicate content. In some cases, this is inevitable and even okay. The key to controlling duplicate content issues is to tell Google which page you want it to read, and how to get there. Use your robots.txt and parameter handling to tell Google which pages to ignore, and make sure the bots can easily get into the page you want them to index. Not all pages with content problems are duplicates; sometimes the content is just inadequate for ranking.

    The goal of content is to tell the user and the search engine what that page is about. If there’s no text on the page, or very little, the search engines don’t know how to rank that page. Having a page title and a meta description isn’t enough; the content on the page has to support the keywords in your titles and descriptions or you might as well not even have them.

  • Not enough inbound links to trickle interest deep into the site. Links are the bread and butter of search engine rankings. If you don’t have a healthy number of links coming into your site, there’s not enough link “juice” to trickle down into those interior pages. Ideally, you’ve built a good number of quality links into interior pages of the site to help push that juice deep into the site. Keep in mind, the larger the site, the more quality links you need to support high rankings. While bringing links into a site is important, being careful not to bleed a ton of PageRank with off-site links is also important. Consider your website to be like a kitchen colander: the more holes it has (outbound links), the faster the juice drains out.
  • No link to that page exists on an indexed/cached page.  Having either a good contextual link or navigational link to those deep pages is key to getting the search engine bots in there.  Believe it or not, sometimes links going to deep pages are graphical or buttons, and in many cases those are not being followed by the bots. Although this sounds like a simple thing to check, depending upon the number of un-indexed URLs you’re digging through, it can take a lot of time to review.
  • Navigation that cannot be indexed. Every once in a while, I run across navigation that is built in Flash or other solutions that don’t provide indexable links to interior pages. Honestly, you can solve this by building contextual links to your interior pages, but I would probably look at an overhaul of the site to help with this problem. Good websites have indexable and contextual navigation links, in my opinion.
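
As a quick aid for the robots.txt item above, here’s a minimal sketch that tests a handful of URLs you want indexed against your live robots.txt, using Python’s standard-library robotparser; the domain and URLs are hypothetical placeholders.

```python
# Minimal sketch: flag important URLs that a leftover Disallow rule would block.
# The domain and URL list are hypothetical placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser("http://www.example.com/robots.txt")
rp.read()

must_be_crawlable = [
    "http://www.example.com/products/",
    "http://www.example.com/blog/some-post/",
]

for url in must_be_crawlable:
    if rp.can_fetch("Googlebot", url):
        print("ok:", url)
    else:
        print("BLOCKED by robots.txt:", url)
```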

Figuring out and fixing your indexing issues is definitely worth the time you invest in it, but it also takes a variety of approaches. Start with the easy steps and progress to the hardest ones to complete. I listed the steps above in order of the effort involved to check and solve, so start at the top and work your way down. Once you’ve found your issue, start watching for improvement in the metrics you baselined at the beginning of your efforts.
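
If it helps, here’s a small sketch for logging those baseline metrics to a CSV each time you re-check them; the file name and counts are placeholders you’d fill in from the steps above.

```python
# Minimal sketch: append today's baseline numbers to a CSV so improvements
# (or regressions) show up as a trend. All values below are placeholders.
import csv
import os
from datetime import date

row = {
    "date": date.today().isoformat(),
    "indexed_pages": 1480,    # from Webmaster Central or the site: query
    "top20_rankings": 210,    # from the SEMRush report
    "crawled_pages": 1630,    # from your crawler's page list
}

write_header = not os.path.exists("index_baseline.csv")
with open("index_baseline.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    if write_header:
        writer.writeheader()
    writer.writerow(row)
```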

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: Carrie Hill is the co-founder of Ignitor Digital, along with long-time colleague Mary Bowling. At Ignitor, Carrie tackles tough technical SEO roadblocks many small business owners don't even know they have. Her experience with analytics and troubleshooting helps her get to the root of issues. When not working, Carrie loves to cook for friends and family, hang out with her pretty awesome kids, and read books that have little-to-no educational value! You can also follow Carrie on Twitter, @carriehill.

  • incrediblehelp

    Is this process realistic with sites that have over 100k pages?

  • Carrie Hill

    Hi Incrediblehelp

    I’ve never done this for a site that large; I think the largest was around 25k URLs – and that site was doing well, but there were some sub-folders that were having issues, so I could break the site down even further.

    There are definitely things you can do to help with large-site indexing issues:
    *Check your robots.txt to be sure you’re not excluding something important
    *Check your parameter handling in Webmaster Tools
    *Build more incoming links to deep pages of the site (sort of a “duh” one – but still something that should be done)
    *Make sure there’s a link to every page that you want in the index that is followable by the spiders

    Large sites of 50k or 100k URLs or more don’t allow you to look at every single URL – but you should still be cognizant of your footprint and of which sections or directories of the site aren’t indexing, caching and performing well – these steps can help you fix those issues.

    Hope this helps :)
    ~Carrie

  • wp themedesk

    I have started a new website which was indexing properly, but for the past few days some links are not being indexed and others are indexed with the homepage URL in Google search results; some time back it was properly indexed with the post URL when searched in Google. May I have some suggestions for this problem? My site is http://www.wordpressthemedesk.com and it is using the Yoast WordPress plugin.

 
