• http://searchengineland.com Andy Atkins-Krüger

    Hear, hear, Bill – very good post!

  • http://www.receptional.com Receptional

    Hi Bill,

    The post illustrates the problem, but does not really help with an effective solution. Having an XML sitemap for each local version will of course help get the pages indexed – but it does little to help them rank. If we accept the premise that a page’s authority is based – in significant part – on the number and quality of pages linking into it, then the sitemap does nothing to help Googlebot crawl the links pointing into the (say) French page.

    Is one way to do this to ensure that when a user arrives at (say) the French content, that content sits on its own URL (ideally in its own directory, which is easy to filter in Webmaster Tools), and that any redirection from that point is cookie-based or manual “click”-based? Since the bots do not generally accept cookies, they will happily crawl the content and the links, regardless of where the bot is located.

    So would you go for a “best practice” option of using IP targeting to set a country preference cookie – but not physically redirecting the user unless the cookie is present?
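
    Something like this rough sketch is what I have in mind – Python/Flask purely for illustration, and the “country_pref” cookie name, the geo-IP helper and the /fr/ directory layout are my own assumptions, not anything from your article:

        # Rough sketch: IP targeting sets a country preference cookie, but the
        # physical redirect only happens when the visitor already carries the
        # cookie. Bots generally do not return cookies, so they never get redirected.
        from flask import Flask, request, redirect, make_response

        app = Flask(__name__)

        def country_to_section(ip):
            # Placeholder for a real geo-IP lookup that maps country -> site section.
            return "fr" if ip.startswith("90.") else "en"

        @app.route("/")
        @app.route("/<lang>/<path:page>")
        def serve(lang="en", page="index"):
            pref = request.cookies.get("country_pref")
            if pref and pref != lang:
                # Only visitors who already hold the cookie are redirected.
                return redirect(f"/{pref}/{page}")

            # Everyone else – including bots – gets the page they asked for.
            # We still set the cookie from the IP lookup so a human's next
            # click can be localised.
            resp = make_response(f"content for /{lang}/{page}")
            if not pref:
                resp.set_cookie("country_pref", country_to_section(request.remote_addr))
            return resp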

  • http://searchengineland.com Bill Hunt

    Hi Dixon,

    Thanks for the great questions – let me try to clarify. This article was less about “how and why to do language detection” and more about checking your defaults and variations, since those are what typically block the spiders.

    Yes, your way of doing it is valid, and you should have the detection script behave that way if the visitor accepted a cookie with a country/language preference. However, the execution of the process was my main point. In your example, yes, the spider would not accept the language/country preference cookie and would get the content it requested. If the script is written so that a visitor with no previous preference is simply passed through to the requested page, then we are golden. Unfortunately, in about 8 out of 10 cases the coder requires the cookie – trapping the spider while trying to get it to accept the cookie and/or, since it has no preference, trying to set one – and if a preference cannot be set, routing it to the default language, typically English. This results in the spider getting trapped or bypassing the other-language content.
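
    To make that concrete, here is a minimal sketch (Python/Flask, with a hypothetical “lang_pref” cookie and /en/, /fr/ directories – none of this is from a real site) of the trap versus the fix:

        from flask import Flask, request, redirect

        app = Flask(__name__)
        SUPPORTED = {"en", "fr", "de", "es"}

        @app.before_request
        def language_detection():
            pref = request.cookies.get("lang_pref")
            current = request.path.split("/")[1]

            # The trap in roughly 8 out of 10 implementations: "no cookie" is
            # treated as "route to the default language", so a spider asking
            # for /fr/... gets bounced to /en/... and the French content is
            # never crawled:
            #
            #   if not pref:
            #       return redirect(request.path.replace(f"/{current}/", "/en/", 1))
            #
            # The fix: no stored preference simply means "pass the visitor (or
            # spider) straight through to the page it actually requested".
            if pref in SUPPORTED and current in SUPPORTED and pref != current:
                return redirect(f"/{pref}/{request.path[len(current) + 2:]}")
            return None

        @app.route("/<lang>/<path:page>")
        def localized_page(lang, page):
            return f"{lang} content for {page}"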

    You should “always” let the spider have any page it requests – and it is not cloaking to simply say “if user agent = some spider, then let them have the page they requested”.
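
    If you want to be explicit about it, a simple user-agent check like the sketch below does exactly that – the spider list is illustrative, not exhaustive, and note that the content served never changes; a matched spider just skips the redirect:

        # Illustrative only – the spider list is an example, not exhaustive.
        KNOWN_SPIDERS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandexbot")

        def is_known_spider(user_agent):
            """Return True if the User-Agent header looks like a major crawler."""
            ua = (user_agent or "").lower()
            return any(bot in ua for bot in KNOWN_SPIDERS)

        # In the detection hook above, a matched spider skips the redirect and
        # gets the URL it requested; the content is identical either way, which
        # is why this is not cloaking:
        #
        #   if is_known_spider(request.headers.get("User-Agent", "")):
        #       return None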

    The problem is a bit more complex now that many sites are starting to use the various applets in IIS and other CMS tools to dynamically set currencies. I am seeing this more and more with English and Spanish sites. They build once and then dynamically swap currencies, country phone codes and other country-specific attributes based on the language or location preference. However, those are the advanced users; I am more concerned with the simple language-preference detection going on at many sites.
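
    For what it is worth, the build-once approach usually comes down to a lookup table keyed by locale – the locales, symbols and dialling codes below are only an illustration:

        # Illustrative lookup table for the "build once, swap attributes" approach.
        LOCALE_ATTRS = {
            "en-US": {"currency": "USD", "symbol": "$", "phone_code": "+1"},
            "en-GB": {"currency": "GBP", "symbol": "£", "phone_code": "+44"},
            "es-ES": {"currency": "EUR", "symbol": "€", "phone_code": "+34"},
            "es-MX": {"currency": "MXN", "symbol": "$", "phone_code": "+52"},
        }

        def localise_price(amount, locale):
            """Render the same product price with one locale's attributes."""
            attrs = LOCALE_ATTRS.get(locale, LOCALE_ATTRS["en-US"])
            return f"{attrs['symbol']}{amount:,.2f} {attrs['currency']}"

        print(localise_price(1299.0, "es-ES"))  # €1,299.00 EUR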

    No matter which option you choose, the point of my article was to follow the process through and test the defaults and the results of the various detections. Another common problem is Flash detection.
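
    Testing this does not need to be elaborate. A script along these lines (the URLs and user-agent string are placeholders) will show whether a cookieless, crawler-like request actually reaches each language version or gets bounced to the default:

        import requests

        # Placeholders – swap in your own language URLs.
        PAGES = [
            "https://www.example.com/fr/",
            "https://www.example.com/de/",
            "https://www.example.com/es/",
        ]
        SPIDER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

        for url in PAGES:
            resp = requests.get(url, headers={"User-Agent": SPIDER_UA},
                                allow_redirects=True, timeout=10)
            # If the final URL has collapsed to the English default, the
            # detection script is trapping the spider.
            print(f"{url} -> {resp.url} (status {resp.status_code})")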

    I saw a site recently that had five traps catching the spiders: 1) MSIE or Firefox browsers only; 2) a Flash test – they required Flash (Google is getting better at passing this one); 3) a Flash version test – the visitor must have the current version of Flash; 4) JavaScript; and 5) cookies. In each case the site threw an error code, blocking the spiders. Had they bounced any of these failures to an HTML rendition, the pages would have been indexed, scored and ranked instead of the spider being sent away. Note – most of these trip up mobile browsers too.
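
    The fix for traps like these is the same in every case: when the capability check fails, fall back to a plain HTML rendition instead of throwing an error. A rough sketch of that pattern (the capability cookie and page bodies are made up for illustration):

        from flask import Flask, request

        app = Flask(__name__)

        # Illustrative placeholders for a Flash-heavy page and its plain-HTML twin.
        RICH_PAGE = "<object>Flash version of the page</object>"
        PLAIN_PAGE = "<h1>Plain HTML rendition</h1><p>Same copy, same links.</p>"

        @app.route("/product/<name>")
        def product(name):
            # Clients that have declared Flash support (say, via a capability
            # cookie set by a small detector script) get the rich page; spiders,
            # mobile browsers and cookieless visitors get indexable HTML instead
            # of an error code.
            if request.cookies.get("has_flash") == "1":
                return RICH_PAGE
            return PLAIN_PAGE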

    In a future article I will go deeper into a flow chart for the language and country detection options.