Are Your Language Detection Methods Blocking Search Engine Crawlers?

At a recent international search marketing conference in London, the most frequent question from the audience was "How do I get my content found and indexed by global and local search engines?"

During the breaks I talked to a few people who indicated that little or none of their local-market content was being indexed by the major engines. Close examination of these sites revealed that they were all using some form of language detection. In two cases they had implemented language detection because they saw Google doing it and assumed it was the best approach.

However, if you are like me and travel a lot to other countries, you know that this assumption can lead to a big problem: just because I am physically located in a particular country doesn't necessarily mean I want to see content in its native language. Other factors come into play, such as my browser's default language and the language I use for queries.

Let's take a deeper look at dynamic language and location detection and explore some of the things you should do to make the process work better.

What is your default dynamic language response?

Browser-level language detection is the most popular method of determining a language preference. Your web server simply looks at the visitor's language preference, submitted via the Accept-Language header, and then locates and serves content matching that language code. For example, a person who downloads the French-language version of Firefox will typically have their default language preference set to "French" or "French_France." When they visit your site, the server reads the preference and automatically redirects the visitor to the French version of the site.
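To make the mechanism concrete, here is a minimal sketch of how a server might parse an Accept-Language header into an ordered list of language preferences. The function name and the sample header are illustrative, not from any particular framework; real servers also handle quality-value edge cases this sketch glosses over.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into language codes, best first.

    Browsers send values like "fr-FR,fr;q=0.8,en-US;q=0.5", where the
    optional q-value (default 1.0) ranks the visitor's preferences.
    """
    langs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        if ";q=" in piece:
            code, q = piece.split(";q=", 1)
            try:
                quality = float(q)
            except ValueError:
                quality = 0.0
        else:
            code, quality = piece, 1.0
        langs.append((code.strip(), quality))
    # Highest quality first; Python's sort is stable, so ties keep order.
    langs.sort(key=lambda item: item[1], reverse=True)
    return [code for code, _ in langs]

print(parse_accept_language("fr-FR,fr;q=0.8,en-US;q=0.5"))
# ['fr-FR', 'fr', 'en-US']
```

A server doing browser-level detection would take the first supported entry from this list and redirect the visitor accordingly.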

Using the Accept-Language header can be a good starting point for determining the language of the user, but it is often misused to "assume" their location. Determining a searcher's language preference has many advantages, such as serving local content, picking the local currency, or formatting phone numbers appropriately, but defaulting to the browser language preference without considering other factors can have catastrophic implications for your search marketing program.

The problem for search marketers is that most search engine crawlers do not send an Accept-Language header at all. Because crawlers request pages without a language preference, they are served the "default" language of the server. Do you know the default language of your server? Many web servers, especially IIS and .NET servers, serve English by default, meaning search engine crawlers will only ever see the English-language version of your site, regardless of how much content you have in other languages.
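One way to avoid this trap is to make the no-header case serve the page that was actually requested rather than forcing the default language. The sketch below assumes a hypothetical site layout with `/en/`, `/fr/` and `/de/` directories; the function and paths are illustrative.

```python
def choose_destination(accept_language_header, requested_path):
    """Decide where to send a visitor on a site split into /en/, /fr/, /de/.

    A crawler typically sends no Accept-Language header; without the
    fallback below, it would always land on the server default (English)
    no matter which language version it asked for.
    """
    supported = {"en": "/en/", "fr": "/fr/", "de": "/de/"}
    if not accept_language_header:
        # No preference sent (the crawler case): serve the page that
        # was actually requested instead of redirecting to a default.
        return requested_path
    # Take the first (highest-priority) entry, e.g. "fr-FR" from
    # "fr-FR,fr;q=0.8", and reduce it to a base language code.
    primary = accept_language_header.split(",")[0].split(";")[0]
    lang = primary.split("-")[0].strip().lower()
    return supported.get(lang, "/en/")

print(choose_destination(None, "/fr/produits.html"))  # '/fr/produits.html'
print(choose_destination("fr-FR,fr;q=0.8", "/"))      # '/fr/'
```

The key design choice is the first branch: an absent header means "no opinion," not "send me the default."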

Is your IP detection default location keeping crawlers from finding your local content?

All of the problem sites I encountered at the conference were using IP location detection scripts. Essentially, these scripts read the IP address of the site visitor and serve predetermined content based on the country and/or city where the visitor has connected to the Internet. For example, I am writing this article in Berlin, Germany. When I go to Google.com, the server detects that I am in Germany, routes me to Google.de, and presents the home page in German. As a native English speaker, I have to take steps to counter this and select Google in English to get to the content I need. This is a major problem for search engine crawlers.

The problem is that most search engine crawlers crawl from a specific country. While I have occasionally seen Google crawlers come from Zurich, they primarily crawl from the main data center in Mountain View, California. Because of these crawler locations, no matter where your site is hosted or targeted, the detection scripts will only ever route the crawlers to English or US-centric content, again making content for other languages and countries invisible to them.

Testing the defaults and making exceptions

To truly understand what your servers are doing, you need to test them so that you are confident they serve the right content in every situation, especially to the crawlers. Simply asking your IT team what is happening is never sufficient proof that the settings are right.

The most reliable test is to have co-workers or customers visit the sites from various locations, with different language settings turned on and off, to see what happens in each situation. You should check not only the default settings of your server but also any subscription services your company may deploy, such as Akamai's Global Traffic Management IP Intelligence or cyScape's CountryHawk IP detection solution. For both of these tools, as well as any other detection scripts found on the web, you need to ensure that you have added exceptions to the redirection rules that recognize the user-agent names of the major search crawlers and allow them to access the page they requested.
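A crawler exception of the kind described above can be sketched as a simple user-agent check performed before any geo-redirect fires. The token list below is illustrative, not exhaustive; real deployments should maintain it against the engines' published crawler names.

```python
# Illustrative substrings of major crawler user agents (assumption:
# extend and verify this list against each engine's documentation).
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandex")

def should_geo_redirect(user_agent):
    """Return False for known crawlers so they receive exactly the
    page they requested, bypassing IP-based redirection rules."""
    ua = (user_agent or "").lower()
    return not any(token in ua for token in CRAWLER_TOKENS)

print(should_geo_redirect("Mozilla/5.0 (compatible; Googlebot/2.1; "
                          "+http://www.google.com/bot.html)"))  # False
print(should_geo_redirect("Mozilla/5.0 (Windows; U; en-US)"))   # True
```

As the author notes later in the comments, letting a recognized spider pass through to the page it requested is not cloaking; you are serving it the same content any visitor could reach, just without the redirect.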

Finally, you should develop local language/country XML sitemaps for each local version of the site and register them with the engines, so that the crawlers have a direct way to reach the pages and index your content.
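As a sketch, a per-language sitemap for a French section might look like the following, using the standard sitemaps.org format (the domain and paths are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/fr/</loc>
  </url>
  <url>
    <loc>http://www.example.com/fr/produits.html</loc>
  </url>
</urlset>
```

Registering one such file per language/country section gives the crawlers a direct list of local URLs that no detection script can hide.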

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.



About The Author: Bill Hunt is currently the President of Back Azimuth Consulting and co-author of Search Engine Marketing, Inc.






  • Andy Atkins-Krüger

    Hear, hear, Bill – very good post!

  • Receptional

    Hi Bill,

    The post illustrates the problem, but does not really offer an effective solution. Having an XML sitemap for each local version will of course help get the pages indexed – but does little to help them rank. If we accept the premise that a page's authority is based – in significant part – on the number and quality of pages linking into it, then the solution does not help Googlebot crawl links into the page in (say) French.

    Is one way to do this to ensure that when a user has arrived at (say) the French content, the content is firstly on its own URL (and ideally in its own directory, which is easy to filter in Webmaster Tools), but that any redirection from that point is cookie-based or manual, "click"-based? Since the bots do not generally accept cookies, they will happily crawl the content and the links, regardless of the location of the bot.

    So would you go for a “best practice” option to use IP targeting to set a country preference cookie – but not to physically redirect the user unless the cookie is present?

  • Bill Hunt

    Hi Dixon,

    Thanks for the great questions – let me try to clarify. This article was less about how and why to do language detection and more about checking your defaults and variations, since they typically block the spiders.

    Yes, your way of doing it is valid, and you should have the detection script set that way: only redirect if the visitor carries a cookie with a country/language preference. However, the execution of the process was my main point. In your example, yes, the spider would not accept the language/country preference cookie and would get the content it requested. If the script is written so that a visitor with no previous preference passes through to the requested page, then we are golden. Unfortunately, in about 8 out of 10 cases the coder requires the cookie – trapping the spider as the script tries to get it to accept the cookie and/or, since it has no preference, tries to set one – and if the preference cannot be set, routing to the default language, typically English. This results in the spider getting trapped or bypassing the other-language content.

    You should always let the spider have any page it requests – it is not cloaking to simply say "if user agent = some spider, then let them have the page they requested."

    The problem is a bit more complex, since many sites are starting to use the various applets in IIS Server and other CMS tools to dynamically set currencies. I am seeing this more and more with English and Spanish sites. They build once and then dynamically swap currencies, country phone codes and other country-specific attributes based on the language or location preference. However, those are the advanced users; I am more concerned with the simple language-preference detection going on at many sites.

    No matter which option you choose, the point of my article was to follow the process through and test the defaults and results for each kind of detection. Another common problem is with Flash detection.

    I recently saw a site with five traps catching the spiders: 1) MSIE or Firefox browsers only; 2) a Flash test, where Flash was required (Google is getting better at passing this one); 3) a Flash version test, requiring the current version of Flash; 4) JavaScript; and 5) cookies. In each case the site threw an error code, blocking the spiders. Had it bounced any of these failures to an HTML rendition, the pages would have been indexed, scored and ranked rather than the spider being sent away. Note: most of these traps trip mobile browsers too.

    In a future article I will go deeper into a flow chart for language and country detection options.

