Search Engines Don’t Like You? Don’t Jump To Conclusions

Todd Nemet on May 19, 2011 at 1:15 pm | Reading time: 6 minutes

One of the most frustrating things about technical problems with a site is that the ways they show up in search engines are usually unexpected or subtle. What looks like a penalty can actually be a problem introduced with a new version or new feature of a website.

Because the true causes of problems like these are usually not at all obvious, they can lead to hypotheses that border on the paranoid (“Google doesn’t like my site”) or to wild speculation (“I was put in the sandbox and then hit with Panda. I call it the Pandbox.”).

Since Google isn’t alive and doesn’t have emotions (yet), we can safely set aside (for now) any search engine anthropomorphizing and focus on finding root causes that may be lurking in the site’s technical infrastructure.

Symptoms: Fewer Pages In The Index, Drop In Long Tail Traffic

The main causes for problems with site coverage include duplicate content, allowing pages with no SEO value to be crawled, and network problems.

Duplicate content occurs when you can get to a page through multiple URLs.

Sometimes this is caused by having an entire copy of a site available on another subdomain, like https://www1.yoursite.com/, or on an IP address, like https://192.168.1.1/.

Duplicate content can also happen at the page level, when a page is available at multiple URLs like this:

Both types of duplicate content reduce the number of pages in the index because search engines are wasting their time crawling multiple copies of a website or a page.

Search engines throw away these extra copies because there is no point in including redundant pages in the index. This means that time that could have been spent crawling more pages on your site was wasted crawling extra copies of pages that won’t be used anyway.

For the example pages above, that site would have to be crawled at least five times to get each page of the site.

If you have a duplicate site, you can use a 301 to permanently redirect any visitors to the main site.
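As a minimal sketch of that redirect, the WSGI app below sends any request that arrives on a non-canonical host to the same path on the main site. The host name `www.yoursite.com` and the function names are placeholders, and in practice a rule like this often lives in the web server or load balancer configuration rather than in application code.

```python
# Hypothetical canonical host: replace with your site's real main host name.
CANONICAL_HOST = "www.yoursite.com"


def canonical_redirect(host, path, query=""):
    """Return the URL to 301-redirect to, or None if the host is already canonical."""
    hostname = host.split(":")[0].lower()  # drop any port number
    if hostname == CANONICAL_HOST:
        return None
    url = f"https://{CANONICAL_HOST}{path}"
    if query:
        url += "?" + query
    return url


def app(environ, start_response):
    """WSGI app that permanently redirects requests made to duplicate hosts."""
    target = canonical_redirect(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        # 301 tells crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"canonical host\n"]
```

A request for `/products?id=3` on `www1.yoursite.com` or on the bare IP address would be redirected to `https://www.yoursite.com/products?id=3`, while requests on the canonical host are served normally.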

Fixing duplicate content at the page level is a bit trickier.

Select one canonical URL from each set of potential duplicate URLs and make sure that each duplicate URL permanently redirects to the canonical one. If this isn’t possible – for example, due to tracking parameters like referral_id=1 above – use a link rel=canonical tag that points to the canonical URL and configure Bing and Google webmaster tools to ignore the appropriate parameters.
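One way to pick the canonical URL consistently is to compute it from whatever URL was requested. The sketch below assumes that `referral_id` is the only tracking parameter and that parameter order does not change the page content; both are assumptions to adjust for a real site, and the function names are illustrative.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that never change what the page shows.
TRACKING_PARAMS = {"referral_id"}


def canonical_url(url):
    """Map any variant of a page URL to one canonical form."""
    parts = urlsplit(url)
    # Drop tracking parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def canonical_link_tag(url):
    """Build the link rel=canonical tag to place in the page's head."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'
```

When a redirect is possible, compare the requested URL with `canonical_url(requested)` and issue a 301 if they differ. When it is not, emit `canonical_link_tag(requested)` in the page instead.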

Diagnosing Crawl Inefficiencies

Allowing pages with no value to be crawled means that the search engines are spending valuable resources crawling things like API calls, log files, or pages with an infinite number of combinations like a web calendar.

Similar to duplicate content, crawl inefficiency means that search engines are crawling useless pages, at the expense of pages that you would like crawled.

These zero-value pages aren’t going to lead to any conversions, even assuming that search engines index them or rank them well for anything.

To fix these types of problems, use the robots.txt file to exclude these types of pages. Be sure to test any changes to your robots.txt file in Google Webmaster Tools before pushing them live.
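Before testing in Google Webmaster Tools, a draft robots.txt can also be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser`; the disallowed paths and sample URLs are hypothetical. Note that this parser does simple prefix matching and may not interpret wildcard patterns the way a particular search engine does, so treat it as a first pass rather than a substitute for the search engines' own tools.

```python
from urllib.robotparser import RobotFileParser

# Draft rules; the paths are hypothetical examples of zero-value pages.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /logs/
Disallow: /calendar/
"""


def check_rules(robots_txt, urls, user_agent="Googlebot"):
    """Return {url: allowed} for each URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_rules(
    DRAFT_ROBOTS_TXT,
    [
        "https://www.yoursite.com/products/widget",
        "https://www.yoursite.com/api/v1/items",
        "https://www.yoursite.com/calendar/2011/05/",
    ],
)
for url, allowed in results.items():
    print("ALLOW" if allowed else "BLOCK", url)
```

The useful habit is to keep a list of URLs that must stay crawlable and confirm that none of them flips to blocked after a rule change.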

Networking problems can be very elusive. Most of the networking problems I have seen involve either load balancing or DNS.

Load balancing is used on larger sites to spread web requests among a number of backend servers. Sometimes it is misconfigured so that most of the crawler requests go to one backend server, which eventually slows to a crawl.
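If the access logs record which backend served each request, a skewed distribution of crawler traffic is easy to spot. The sketch below assumes a hypothetical log format in which every line contains a `backend=<name>` field along with the user-agent string; the patterns would need to be adapted to the real log format.

```python
import re
from collections import Counter

# Assumed log format: each line has a "backend=<name>" field and the
# request's user-agent string. Adjust both patterns to your own logs.
BACKEND_RE = re.compile(r"backend=(\S+)")
CRAWLER_RE = re.compile(r"Googlebot|bingbot", re.IGNORECASE)


def crawler_hits_per_backend(log_lines):
    """Count crawler requests handled by each backend server."""
    hits = Counter()
    for line in log_lines:
        if not CRAWLER_RE.search(line):
            continue  # not a crawler request
        match = BACKEND_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


def busiest_share(hits):
    """Return (backend, fraction of crawler hits) for the busiest backend."""
    total = sum(hits.values())
    if total == 0:
        return None
    backend, count = hits.most_common(1)[0]
    return backend, count / total


# Made-up sample lines in the assumed format.
SAMPLE_LOG = [
    'backend=web1 "GET /a" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    'backend=web1 "GET /b" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    'backend=web1 "GET /c" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    'backend=web2 "GET /d" 200 "Mozilla/5.0 (compatible; bingbot/2.0)"',
    'backend=web3 "GET /e" 200 "Mozilla/5.0 (Windows NT 6.1)"',
]
print(busiest_share(crawler_hits_per_backend(SAMPLE_LOG)))
```

With a healthy configuration the busiest backend's share should be close to one divided by the number of servers; a share near 1.0 suggests that crawler requests are being pinned to a single machine.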

DNS problems can make a website unnecessarily slow for first-time visitors or, in extreme cases, make it intermittently unavailable.

You can easily check your DNS configuration with an online tool like IntoDNS. Checking the load balancers or other aspects of the back end network is not so easy, so it’s probably best to ask a network engineer about any recent changes to the infrastructure.
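For the DNS side, a rough local check can complement an online tool. The sketch below resolves a host name several times with Python's standard `socket.getaddrinfo` and reports failures and the slowest lookup. The function name and the `resolver` parameter are illustrative, and because operating systems and resolvers cache answers, repeated lookups from one machine will understate what a first-time visitor experiences.

```python
import socket
import time


def time_dns_lookups(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Resolve hostname repeatedly; report failures and the slowest lookup."""
    failures = 0
    slowest = 0.0
    addresses = set()
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            infos = resolver(hostname, 443)
        except socket.gaierror:
            infos = None  # resolution failed on this attempt
        slowest = max(slowest, time.perf_counter() - start)
        if infos is None:
            failures += 1
            continue
        # Each result is (family, type, proto, canonname, sockaddr);
        # the address is the first item of sockaddr.
        addresses.update(info[4][0] for info in infos)
    return {
        "failures": failures,
        "slowest_seconds": slowest,
        "addresses": sorted(addresses),
    }
```

Calling `time_dns_lookups("www.yoursite.com")` from a few different networks and comparing the results is a cheap way to notice intermittent failures or unexpected addresses before escalating to a network engineer.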

Symptoms: Wrong Pages Ranking, Decline In Ranking

These symptoms are usually caused by duplicate copies of important pages or by search engines not being able to understand the linking structure of your site.

Duplicate content can have a negative effect on ranking because inbound links to a particular page – a very important signal for search engines – are spread out among different URLs. As a result, the search engine is only aware of the number of inbound links for the one copy of the page that it decides to keep.

Make sure that all of the intended inbound links count towards the page by fixing these duplicate URLs as described above.

Another important signal for search engines is how a page is linked within a site. For example, a page with a link from the homepage will be considered a more important page than a page that is orphaned on the site with no links.

Coding site navigation elements in Flash, Silverlight, or JavaScript can make it impossible for search engines to extract these links. As a result, they are missing key information about what pages on a site are the most important.
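To get a rough idea of what a crawler that does not execute scripts can extract, parse the raw HTML and list the plain links. The sketch below uses Python's standard `html.parser`; the navigation markup is a made-up example, and real crawlers differ in how much script they process.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from plain <a> tags, without running any script."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip links that only work by executing JavaScript.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(href)


def extract_links(html):
    """Return the crawlable links found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical navigation: one plain link and two script-only links.
NAV_HTML = """
<ul>
  <li><a href="/products">Products</a></li>
  <li><a href="javascript:void(0)" onclick="go('/pricing')">Pricing</a></li>
  <li><span onclick="location.href='/about'">About</span></li>
</ul>
"""

print(extract_links(NAV_HTML))  # ['/products']
```

Only `/products` is discovered here; the pricing and about pages would look orphaned to such a crawler even though visitors can reach them.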

Investigate Before You Make Assumptions

This is not a complete list of root causes for indexing issues and traffic loss, but it does contain the most common issues that I have seen with sites that I have been asked to review.

Other causes of similar symptoms are page speed, cache unfriendliness, internationalization issues, server misconfigurations, and security vulnerabilities. Each one is worthy of an article in itself.

I hope this provides some additional ideas of where to hunt down causes of particularly vexing problems with the way your site is performing in search.

Fortunately, it is much easier to redirect a duplicate copy of your site or fix a DNS misconfiguration than it is to influence Google’s or Bing’s algorithms.

While search engines definitely penalize some sites and it is possible for a site to get caught up in algorithm changes, make sure you have thoroughly reviewed your technical architecture before jumping to any conclusions about what search engines don’t “like” about it.
