Understanding The SEO Challenges Of Language Detection

Last time, I reviewed how to effectively manage multilingual content segmentation by looking at ways to use directories, parameters and other methods to optimize local market content. Once we have our content sorted, the next challenge we have is how to direct users—and more importantly, search spiders (crawlers)—to this content.

The purpose of this article is not to argue the user experience or even the philosophical issues of your method of matching visitors to specific country or language content, but rather to ensure that whatever means you choose, you do not negatively impact your search performance.

The problem with language detection and redirection

So what’s the big deal? It makes sense that a person from Germany gets redirected to our German content, doesn’t it? While this might be a convenience to the visitor, it could also mean banishment into the nether regions of a site, or never getting out of English content. Most of the common detection methods look for signals from the visitor to decide where to send them. Search spiders don’t send the same signals as a typical human visitor to a site. This means crawlers are often only “allowed” to receive “default” language content. In many cases, this is English or US-centric content.

Let’s review some of the more common detection and redirection processes used by sites and their negative implications on search spiders.

Dynamic detection and redirecting

With dynamic detection and redirecting, a site uses either server-level actions or scripting and a decision tree to route visitors (or crawlers) to content. The detection and subsequent routing are based either on the visitor’s IP location or on a preference (such as language) set in the browser.

IP location. This is by far the most common method used on sites large and small. This method simply serves up content based on the IP location of the visitor. For example, if the IP location shows the visitor accessed the site while in Munich, Germany, the server would serve them either Germany-specific or German language versions of the site.

Browser language preference. This approach is very common but causes a lot of problems when the details are not sorted out properly. With browser preference, the detection script looks at the visitor’s language preference, which the browser submits to the server with its request for the page. This preference travels in the Accept-Language request header, which many server environments expose to scripts as the “HTTP_ACCEPT_LANGUAGE” variable. For example, a person who lives in France using the French language version of Firefox will have the default language set to France/French, so the browser would send a value such as “fr-FR” in that header. In a perfect scenario, the script would detect both the French language preference and the France country designator and then route them to the France version of the site and render the content in French.
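As a rough illustration of browser-preference detection, here is a minimal Python sketch that ranks the tags in an Accept-Language header and maps the best match to a site section. The `SITE_VERSIONS` mapping, the `/en-us/` default and the function names are assumptions made up for the example, not part of any particular server or script.

```python
# Hypothetical mapping from language tags to site sections.
SITE_VERSIONS = {"fr-fr": "/fr-fr/", "fr": "/fr/", "de-de": "/de-de/", "de": "/de/"}
DEFAULT_VERSION = "/en-us/"  # assumed default for a US-based company


def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        tag = tag.strip().lower()
        if quality > 0 and tag != "*":
            # Sort by descending quality, then by original order.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_version(header):
    """Pick a site section for the header, or the default if nothing matches."""
    for tag in parse_accept_language(header or ""):
        if tag in SITE_VERSIONS:
            return SITE_VERSIONS[tag]
        base = tag.split("-")[0]  # fall back from "de-ch" to "de"
        if base in SITE_VERSIONS:
            return SITE_VERSIONS[base]
    return DEFAULT_VERSION
```

A request that carries no Accept-Language header at all, which is how many crawlers behave, falls straight through to `DEFAULT_VERSION` in this sketch.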

Both of these methods are problematic for search engines, because spiders often crawl from a specific location and don’t signal a language preference in their server request. For example, if Googlebot, crawling from Mountain View in California, requested a German language page on a site using IP detection, the web server would detect a request from an IP in the U.S., and the crawler would be routed to the U.S. version and potentially never see the German language version. In the same scenario on a site with browser detection, the script would not detect any language preference and would thus route the spider to the default version of the site, which is typically English for US companies and the local language version for many country installations of scripts and web servers.
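The IP half of the problem can be sketched in a few lines of Python. Everything here is hypothetical: the `GEO_TABLE` lookup uses reserved documentation address ranges as stand-ins for real German and U.S. networks, and a real site would rely on a geolocation database instead.

```python
import ipaddress

# Hypothetical geolocation table; these are reserved documentation ranges,
# used here only as stand-ins for real networks.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "DE"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
]
COUNTRY_SITES = {"DE": "/de/", "US": "/us/"}
DEFAULT_SITE = "/us/"  # assumed default version


def country_for_ip(ip_text):
    """Return the country code for an address, or None if it is unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, country in GEO_TABLE:
        if address in network:
            return country
    return None


def naive_ip_route(requested_path, ip_text):
    """Problematic routing: the requested path is ignored entirely."""
    return COUNTRY_SITES.get(country_for_ip(ip_text), DEFAULT_SITE)


# A crawler on a "U.S." address asks for the German page and is sent
# to the U.S. version instead.
served = naive_ip_route("/de/", "198.51.100.7")
```

Because `naive_ip_route` never looks at `requested_path`, a crawler that always arrives from the same location can only ever reach one version of the site.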

The most reliable test to determine if this is happening is to ask employees in local markets to visit the site and see if they’re redirected to anything other than the main global home page. If they’re presented with a local country version, then the site uses dynamic language detection. They should also repeat the test after changing their user agent to match the various search engine crawlers and browser types, so that the widest set of variables is covered.
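Part of that test can be scripted. The Python sketch below builds a small matrix of user agent and Accept-Language combinations and probes a URL without following redirects, so any redirect target is visible. The choice of agents and languages and the function names are assumptions for illustration; the crawler strings follow the publicly documented Googlebot and bingbot formats.

```python
import itertools
import urllib.error
import urllib.request

# Example user agents; extend with whichever crawlers and browsers matter to you.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}
# None means "send no Accept-Language header", as many crawlers do.
LANGUAGES = [None, "de-DE,de;q=0.9", "fr-FR,fr;q=0.9"]


def build_test_matrix():
    """Return (agent name, language, headers) for every combination."""
    cases = []
    for (name, agent), language in itertools.product(USER_AGENTS.items(), LANGUAGES):
        headers = {"User-Agent": agent}
        if language is not None:
            headers["Accept-Language"] = language
        cases.append((name, language, headers))
    return cases


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url, headers):
    """Return (status code, Location header or None) for one request."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Location")
```

Calling `probe(url, headers)` for each case returned by `build_test_matrix()` shows which combinations get redirected and where. This only varies the request headers; it does not change the IP address the request comes from, so it complements rather than replaces the checks by employees in local markets.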

The easiest search workaround for either of these detection methods is to simply determine if the requester is a search engine and exempt it from any redirection, giving it the page it wants. Note that I did not say to redirect it, or to take any other action that could be misconstrued by conspiracy theorists as cloaking, but simply to let the spider have the page it requested. This will ensure spiders can index your local language content.
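A minimal Python sketch of that exemption, assuming detection is done by matching the user agent string. The pattern list and function names are illustrative assumptions, and user agent matching is itself a simplification, since the string is supplied by the requester.

```python
import re

# Illustrative, incomplete list of crawler name fragments.
CRAWLER_PATTERN = re.compile(r"googlebot|bingbot|slurp|baiduspider|yandex", re.IGNORECASE)


def is_search_crawler(user_agent):
    """Return True if the user agent string looks like a known crawler."""
    return bool(user_agent) and bool(CRAWLER_PATTERN.search(user_agent))


def resolve_path(requested_path, user_agent, detected_path):
    """Return the path to serve.

    detected_path is whatever the site's IP or language detection suggested
    (or None). Crawlers are never redirected: they get the page they asked for.
    """
    if is_search_crawler(user_agent):
        return requested_path
    return detected_path or requested_path
```

The crawler branch returns `requested_path` unchanged, so the spider receives the same page any visitor would see at that URL; nothing different is generated for it.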

Country/location site maps and selectors

If you have more than a single country or language version of your site, and your site hasn’t deployed one of the dynamic means of redirection described above, then we would assume you’ve implemented a manual solution. A country selector makes it easy for a visitor to select the country or language of their choice. Some multinational sites force visitors to select the version of the site they want from a large list of countries or a map when they arrive at the home page.

[Image: country selection map]

This is problematic, because search engines can’t “see” the image or know where to click, which prevents them from entering the site. Yes, they could follow all of the links into the site (good), except that in many cases these links are rendered as a JavaScript popup with Flash or, more recently, using AJAX or Flex coding techniques. Those methods can prevent the spiders from seeing the destination URLs, because the links are generated by scripts in the browser instead of appearing as plain links in the HTML the server returns. Also, make sure you don’t require a cookie to remember a user’s preference.
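One quick way to approximate what a spider can follow is to list the plain links present in the raw HTML of the selector page. The Python sketch below does this with the standard library parser; it assumes a crawler that does not execute scripts, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> and <area> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            href = dict(attrs).get("href")
            # Skip script pseudo-links, which carry no destination URL.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(href)


def crawlable_links(html_text):
    """Return the destination URLs visible without running any script."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links
```

If the country pages you expect are missing from the result for your home page, the selector is probably building its links with script, and spiders that don’t run that script will not find them.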

Using a pull-down menu at the top of the page to select a country is becoming the most common option for presenting the local content options of a global site. This approach allows an organization to leverage the primary domain on a global basis. If a visitor wants to select a different country or language, they can easily pull down the list and select the one they want. Many sites will cookie the selection to direct the visitor on their next visit. Note that if country/language cookies are set, you must accommodate visitors and spiders that won’t accept cookies and not trap them in a default version of the site.
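Here is a minimal Python sketch of cookie handling that does not trap cookieless visitors or spiders. The cookie name `site_version`, the `KNOWN_VERSIONS` mapping and the rule that only the home page honors the cookie are assumptions chosen for the example.

```python
from http.cookies import CookieError, SimpleCookie

# Hypothetical mapping from stored cookie values to site sections.
KNOWN_VERSIONS = {"de": "/de/", "fr": "/fr/", "us": "/us/"}


def version_from_cookie(cookie_header, cookie_name="site_version"):
    """Return the remembered site section, or None if there is no usable cookie."""
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    try:
        cookie.load(cookie_header)
    except CookieError:
        return None  # treat a malformed header as "no preference"
    morsel = cookie.get(cookie_name)
    if morsel is None:
        return None
    return KNOWN_VERSIONS.get(morsel.value)


def path_to_serve(requested_path, cookie_header):
    """Use the cookie only as a hint on the home page; never require it."""
    if requested_path != "/":
        return requested_path  # deep links are always served as requested
    return version_from_cookie(cookie_header) or requested_path
```

A request with no cookie simply gets the page it asked for, so a spider that never stores cookies can still reach every country URL directly.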

[Image: IBM country selector]

IBM’s global portal is an excellent example, representing nearly 100 language/country options. Using a selector and a cookie to remember your choice, they have effectively integrated relative URLs for each country site option, which makes it easy for search spiders to detect all destination pages and access all of the local market content.

No matter what method of language or country detection you choose, the key is to make sure that it is not only user friendly but also accommodating to spiders. As they say, “the devil is in the details,” and you really need to test all of the error traps and redirection variables to ensure you are not sending the spiders or users in the wrong direction, which would leave significant portions of your content missing from the search engines.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.



About The Author: is currently the President of Back Azimuth Consulting and co-author of Search Engine Marketing Inc. His personal blog is whunt.com.

  • http://joblr.net Mikkel deMib Svendsen

    Good article, as always, Bill :)

    However, there is one big problem with your “engine detection” solution for automated language redirects …

    If you let Google index all language versions but automatically redirect users to the one you think is right, you may end up with very upset search users.

    Let’s say I am in Germany, as I often am in fact, and I do a search for something, which of course I do. I will usually want something in English (my German is really bad).

    So if Google got access to your English pages, I would find them. But then when I click on the link to your site, it detects that I am in Germany and forces me to the German page. That’s not what I clicked on. That’s not what is in the Google cache. To me it’s cloaking. I am not allowed to see what Google found and linked me to.

    Maybe I’ll report it. Maybe Google will deal with it. Maybe not. But it’s definitely a risk to any site owner that employs this technique :)

 
