
Understanding The SEO Challenges Of Language Detection


Bill Hunt on August 3, 2010 at 6:00 am

Last time, I reviewed how to effectively manage multilingual content segmentation by looking at ways to use directories, parameters and other methods to optimize local market content. Once we have our content sorted, the next challenge we have is how to direct users—and more importantly, search spiders (crawlers)—to this content.

The purpose of this article is not to debate the user experience, or even the philosophy, of your chosen method of matching visitors to country- or language-specific content, but rather to ensure that whatever method you choose does not hurt your search performance.

The problem with language detection and redirection

So what’s the big deal? It makes sense for a visitor from Germany to be redirected to our German content, doesn’t it? While this may be a convenience for the visitor, for a search spider it can mean banishment to the nether regions of a site, or never getting out of the English content. Most common detection methods look for signals from the visitor to decide where to send them, and search spiders don’t send the same signals as a typical human visitor. As a result, crawlers are often only “allowed” to receive the “default” language content, which in many cases is English or U.S.-centric.

Let’s review some of the more common detection and redirection processes used by sites and their negative implications on search spiders.

Dynamic detection and redirecting

With dynamic detection and redirection, a site uses either server-level actions or scripting and a decision tree to route visitors (or crawlers) to content. The detection and subsequent routing are based either on the visitor’s IP location or on a preference (such as language) set in the browser.

IP location. This is by far the most common method used on sites large and small. This method simply serves up content based on the IP location of the visitor. For example, if the IP location shows the visitor accessed the site while in Munich, Germany, the server would serve them either Germany-specific or German language versions of the site.
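To make the IP-location method concrete, here is a minimal sketch of the lookup-then-route step. It assumes a tiny hypothetical lookup table in place of a real GeoIP database; the address ranges are the reserved documentation networks from RFC 5737, and the `/de/`, `/fr/`, `/us/` site sections are illustrative, not taken from any real site.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table. A real site would query a GeoIP database; these
# are reserved documentation ranges (RFC 5737) standing in for real allocations.
IP_COUNTRY_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "de"),
    (ipaddress.ip_network("198.51.100.0/24"), "fr"),
    (ipaddress.ip_network("203.0.113.0/24"), "us"),
]

# Hypothetical mapping from country code to the matching local site section.
COUNTRY_SITE = {"de": "/de/", "fr": "/fr/", "us": "/us/"}
DEFAULT_SITE = "/us/"  # what visitors from unknown locations receive


def country_for_ip(ip: str) -> Optional[str]:
    """Return the country code whose range contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in IP_COUNTRY_TABLE:
        if addr in network:
            return country
    return None


def site_for_ip(ip: str) -> str:
    """Choose a local site section from the visitor's IP, else the default."""
    country = country_for_ip(ip)
    if country is None:
        return DEFAULT_SITE
    return COUNTRY_SITE.get(country, DEFAULT_SITE)


# A visitor whose IP falls in the stand-in "German" range is sent to /de/.
print(site_for_ip("192.0.2.10"))  # prints /de/
```

Note that the only input is the requester’s address, so anyone (or any crawler) in an unmapped or U.S. range lands on the default section regardless of which page was asked for.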

Browser language preference. This approach is very common, but it causes a lot of problems when the details are not handled properly. With browser preference, the detection script looks at the language preference the browser submits to the server when it requests the page. This is contained in the Accept-Language request header, which server-side scripts typically read from the “HTTP_ACCEPT_LANGUAGE” variable.
For example, a person in France using the French-language version of Firefox will have the default language set to French (France), so the browser sends an Accept-Language value such as “fr-fr” to the web server. In a perfect scenario, the script would detect both the French language preference and the France country designator, route the visitor to the France version of the site and render the content in French.
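The Accept-Language value is a comma-separated list of language tags, each optionally weighted with a `q` value (for example `fr-FR,fr;q=0.9,en;q=0.5`). The sketch below is a simplified parser and matcher, assuming a hypothetical set of site sections; it tries the full tag (language plus country) first and then falls back to the primary language, and it does not implement every rule in the HTTP specification.

```python
from typing import List, Optional, Tuple

# Hypothetical site sections keyed by lowercase language tag.
SITES = {
    "fr-fr": "/fr/fr/",  # France, French
    "fr": "/fr/",        # French, no specific country
    "de-de": "/de/de/",  # Germany, German
    "en-us": "/us/en/",  # United States, English
}
DEFAULT_SITE = "/us/en/"


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without a q parameter has the maximum weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def site_for_accept_language(header: Optional[str]) -> str:
    """Route to the best-matching site; a missing header means the default."""
    if not header:
        return DEFAULT_SITE
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 marks a language as not acceptable
        if tag in SITES:  # full match, e.g. "fr-fr"
            return SITES[tag]
        primary = tag.split("-")[0]  # fall back to the language alone, e.g. "fr"
        if primary in SITES:
            return SITES[primary]
    return DEFAULT_SITE


print(site_for_accept_language("fr-FR,fr;q=0.9,en;q=0.5"))  # prints /fr/fr/
```

The fallback to the primary language means a French-speaking visitor in Canada (`fr-CA`) still gets French content even without a dedicated Canadian section, while a request with no header at all drops straight to the default.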

Both of these methods are problematic for search engines, because spiders often crawl from a specific location and don’t signal a language preference in their requests. For example, if Googlebot, crawling from Mountain View, California, requested a German-language page on a site using IP detection, the web server would detect a request from a U.S. IP address and route the crawler to the U.S. version, so it might never see the German-language version. The same scenario on a site using browser detection would find no language preference and route the spider to the site’s default version, which is typically English for U.S. companies and the local language for many country-specific installations of scripts and web servers.
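The failure mode can be simulated in a few lines. This is a sketch assuming a naive, hypothetical detection rule that ignores the requested URL and sends each request to a section home page based on the first Accept-Language entry, then the IP country. Feeding it a handful of German URLs with crawler-like signals (a U.S. IP, no Accept-Language header) shows every one of them collapsing onto the default section.

```python
from typing import Iterable, Optional, Set

# Hypothetical site sections keyed by a two-letter code. For brevity the same
# keys serve as both language and country codes.
SECTIONS = {"de": "/de/", "fr": "/fr/", "us": "/us/"}
DEFAULT_SECTION = "/us/"


def detect_and_redirect(requested_path: str, ip_country: Optional[str],
                        accept_language: Optional[str]) -> str:
    """Naive detection: requested_path is deliberately ignored, as in the text."""
    if accept_language:
        # Primary subtag of the first listed language, e.g. "de-DE,de;q=0.9" -> "de".
        first = accept_language.split(",")[0].split(";")[0]
        primary = first.split("-")[0].strip().lower()
        if primary in SECTIONS:
            return SECTIONS[primary]
    if ip_country in SECTIONS:
        return SECTIONS[ip_country]
    return DEFAULT_SECTION


def pages_a_crawler_sees(requested: Iterable[str], ip_country: Optional[str],
                         accept_language: Optional[str]) -> Set[str]:
    """Collect where each requested URL actually lands for one set of signals."""
    return {detect_and_redirect(path, ip_country, accept_language)
            for path in requested}


german_pages = ["/de/", "/de/produkte/", "/de/kontakt/"]
# A crawler on a U.S. IP that sends no Accept-Language header:
print(pages_a_crawler_sees(german_pages, "us", None))  # prints {'/us/'}
```

A human in Germany with a German browser would reach `/de/`, but the crawler’s reachable set shrinks to a single U.S. page, which is exactly how whole language sections go missing from the index.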

The most reliable test to determine whether this is happening is to ask employees in local markets to visit the site and see whether they’re redirected to anything other than the main global home page. If they’re presented with a local country version, the site uses some form of dynamic detection. They should also repeat the test after switching their user agent to those of the various search engines and browsers, to cover the widest set of variables.
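Part of this check can be scripted. The sketch below, using only Python’s standard `urllib.request`, sends a single request with a chosen User-Agent and optional Accept-Language and refuses to follow redirects, so you can see whether the site answered with a 3xx and where it tried to send you. The Googlebot string is one Google has published; the URL in the usage note is a placeholder. Changing the user agent does not change your IP address, so this complements, rather than replaces, testing from each local market.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# A published Googlebot user-agent string; swap in the others you want to test.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx status and Location stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def build_probe(url: str, user_agent: str,
                accept_language: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that presents a chosen user agent and language."""
    headers = {"User-Agent": user_agent}
    if accept_language:
        headers["Accept-Language"] = accept_language
    return urllib.request.Request(url, headers=headers)


def probe(url: str, user_agent: str, accept_language: Optional[str] = None,
          timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect Location or None) for one request."""
    opener = urllib.request.build_opener(NoRedirect)
    request = build_probe(url, user_agent, accept_language)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        # 3xx (and 4xx/5xx) responses arrive here; Location is set for redirects.
        return err.code, err.headers.get("Location")
```

For example, `probe("https://www.example.com/de/", GOOGLEBOT_UA)` returning a 301 or 302 with a Location outside `/de/` means a spider asking for the German page is being sent elsewhere.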

The easiest search workaround for either detection method is to determine whether the requester is a search engine and, if so, exempt it from any redirection, giving it the page it requested. Note that I did not say redirect it, or take any other action that could be misconstrued by conspiracy theorists as cloaking; simply let the spider have the page it requested. This ensures spiders can index your local-language content.
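A minimal sketch of that exemption follows. It assumes crawlers are recognized by a few example user-agent tokens (an incomplete list you would need to maintain) and that your existing detection has already produced a target section. Crawlers get no redirect at all, so they receive exactly the URL they requested; human visitors keep the existing behavior.

```python
from typing import Optional

# Example crawler user-agent tokens; an assumed, incomplete list to maintain yourself.
CRAWLER_TOKENS = ("googlebot", "bingbot", "msnbot", "slurp")


def is_search_engine(user_agent: Optional[str]) -> bool:
    """Case-insensitive check for a known crawler token in the User-Agent."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def redirect_target(requested_path: str, user_agent: Optional[str],
                    detected_section: Optional[str]) -> Optional[str]:
    """Return where to redirect, or None to serve the requested page unchanged."""
    if is_search_engine(user_agent):
        return None  # crawlers get exactly the URL they asked for
    if detected_section and not requested_path.startswith(detected_section):
        return detected_section  # human visitors keep the existing detection
    return None
```

User-agent strings are easy to spoof, so a token match only tells you what the requester claims to be. Google documents verifying Googlebot with a reverse DNS lookup followed by a forward lookup if you need more certainty.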

Country/location site maps and selectors

If you have more than one country or language version of your site and haven’t deployed one of the dynamic redirection methods described above, then you have presumably implemented a manual solution. A country selector makes it easy for visitors to choose the country or language they want. Some multinational sites force visitors arriving at the home page to select the version of the site they want from a large list of countries or a map.

This is problematic, because search engines can’t “see” a map image or know where to click, which can prevent them from entering the site. Yes, they could follow the links into the site (good), except that in many cases these links are rendered as a JavaScript pop-up, in Flash or, more recently, with AJAX or Flex. Those methods can prevent spiders from seeing the destination URLs, because the URLs are generated by script or plugin code rather than exposed as plain HTML links. Also, make sure you don’t require a cookie to remember a user’s preference.
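One way to audit a selector is to list the links a very simple crawler could follow from the page’s HTML. The sketch below uses Python’s standard `html.parser` and keeps only `href` values on plain `<a>` tags, skipping `javascript:` pseudo-URLs and bare fragments. This is a deliberately simple model, not a description of how any particular search engine behaves; its value is showing which destinations exist only inside script, click handlers or form controls.

```python
from html.parser import HTMLParser
from typing import List


class CrawlableLinks(HTMLParser):
    """Collect href values a simple crawler could follow from plain <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # divs with onclick, <option> values, etc. are not links here
        href = dict(attrs).get("href")
        # javascript: pseudo-URLs and "#" fragments lead nowhere for a crawler.
        if href and not href.lower().startswith("javascript:") and not href.startswith("#"):
            self.links.append(href)


def crawlable_links(html: str) -> List[str]:
    """Return the followable hrefs in document order."""
    parser = CrawlableLinks()
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<a href="javascript:openCountryPopup()">Choose your country</a>
<div onclick="showMap()">Europe</div>
<a href="/de/">Deutschland</a>
"""
print(crawlable_links(page))  # prints ['/de/']
```

If most of your country destinations are missing from the output, spiders relying on plain links probably can’t reach them either.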

Using a pull-down menu at the top of the page to select a country is becoming the most common way of presenting the local content options of a global site. This approach allows an organization to leverage the primary domain globally. If visitors want to select a different country or language, they can easily pull down the list and select the one they want. Many sites set a cookie to remember the selection and direct the visitor accordingly on the next visit. Note: if country/language cookies are set, make sure you accommodate visitors and spiders that won’t accept cookies, and don’t trap them in a default version of the site.
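The cookie rule can be expressed as a small decision function. This sketch assumes a hypothetical cookie name (`site_pref`) and hypothetical section paths, and one particular policy: the remembered choice only influences the bare global home page, any explicit local URL is served as requested, and a visitor or spider without the cookie sees the global page with the selector rather than being forced into a default country.

```python
from typing import Mapping

# Hypothetical site sections and preference cookie name.
SECTIONS = {"de": "/de/", "fr": "/fr/", "us": "/us/"}
PREF_COOKIE = "site_pref"


def landing_for(requested_path: str, cookies: Mapping[str, str]) -> str:
    """Decide what to serve; the cookie only influences the bare global home page."""
    if requested_path != "/":
        return requested_path  # explicit local URL: serve it, cookie or not
    pref = cookies.get(PREF_COOKIE)
    if pref in SECTIONS:
        return SECTIONS[pref]  # returning visitor who previously chose a country
    return "/"  # no (valid) cookie: global page with the selector, not a forced default
```

Because the cookie is only consulted at `/`, a cookieless spider can still reach every local section through its own URL.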

IBM’s global portal is an excellent example, offering nearly 100 language/country options. It uses a selector and a cookie to remember your choice, and it uses relative URLs for each country site option, which makes it easy for search spiders to detect all destination pages and access all of the local market content.
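A crawl-friendly selector can start life as an ordinary list of links with relative URLs, which CSS or script can then present as a pull-down for human visitors. The sketch below renders such a list; the option labels, paths and `country-selector` class name are hypothetical, and it is not a reproduction of IBM’s markup. `html.escape` keeps labels and URLs safe to embed.

```python
from html import escape
from typing import List, Tuple

# Hypothetical options: (visible label, relative URL of that country/language site).
OPTIONS: List[Tuple[str, str]] = [
    ("Deutschland (Deutsch)", "/de/de/"),
    ("France (Français)", "/fr/fr/"),
    ("Schweiz (Deutsch)", "/ch/de/"),
    ("Suisse (Français)", "/ch/fr/"),
    ("United States (English)", "/us/en/"),
]


def render_selector(options: List[Tuple[str, str]]) -> str:
    """Render every option as a plain <a href> link that a spider can follow."""
    items = [
        f'  <li><a href="{escape(url)}">{escape(label)}</a></li>'
        for label, url in options
    ]
    return '<ul class="country-selector">\n' + "\n".join(items) + "\n</ul>"


print(render_selector(OPTIONS))
```

Since every destination is a real `href` in the HTML, a spider can discover all of the local sites without running script or accepting a cookie.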

No matter what method of language or country detection you choose, the key is to make sure that it is not only user-friendly but also accommodating to spiders. As they say, “the devil is in the details,” and you really need to test all of the error traps and redirection variables to ensure you are not sending spiders or users in the wrong direction, which would leave significant portions of your content missing from the search engines.

Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. The opinions they express are their own.
