Google’s New Multilingual Markup Signals New Issues Of Concern For Global SEOs

Last Monday, Google announced that it had released “new markup for multilingual content” (see the Webmaster Tools blog post here). Even for those of us who work in the field of looking after global websites, this produced relatively unexciting headlines along the lines of “Google Launches New Multilingual Markup: Wow”. Big yawn.

In fact, digging deeper into the announcement produces new worries and potential new solutions for international SEOs.

For instance, whilst it may not have been Google’s intention, the company is presenting this as a “stronger signal than canonicals” and gives scenarios for its use that many of us, me included, did not even know existed as potential danger areas.

Google Announces Multilingual Markup Scheme

The Two Big Elephants Of Global SEO

So, let’s try and walk through this in a logical way so we can all grasp what’s going on. Firstly, there are two big related issues which have plagued international SEO for years, namely:

  • Dealing With Duplication
  • Correctly Geo-Targeting A Site

In addition, Google has been under pressure for some time from large global corporates concerned that the needs of the Google algorithm were escalating their global website costs, translation costs in particular.

So there are also two further translation issues:

  • Costs Associated With Creating Specific Country Translations
  • Dealing With Multi-Language User Generated Content

Dealing With Duplication

I’ll take each of these in turn to clarify what’s involved, starting with “Duplication”. Duplicated content is a problem because Google’s algorithm naturally throws out all copies apart from the “Best” or “Most Original”.

This is fine if you’re working on content for one country only, but if you need to show the same content for countries which all speak the same language then it starts to become challenging.

It’s important to note that, to date, this issue has applied only to copies of content that were in the same language. I’ll explain later why this distinction is important.

The main issue with duplication is actually that marketers need to show each user the content for their own country at the moment that user finds the site via Google.

Showing them the wrong country’s content could give them incorrect contact details or pricing, which could reduce conversions or amount to poor customer service.

Correctly Geo-Targeting A Site

Correctly geo-targeting a site means that when a user searches for an organization within a particular Google local domain, the site shows up and has not been filtered out because Google thinks it relates to a different country.

This is particularly important in the “Pages From” and “Pages In {Language}” user filters on the left of the page. If a user searches for you via “Pages From Norway”, for instance, and you have a Norwegian operation, you definitely do want your site to show up in the rankings.

Costs Associated With Creating Specific Country Translations

Large corporations invest billions of dollars in translations, and often not very productively. Two solutions can help with the cost: using the same “World Language Content” multiple times, or adopting machine translation techniques.

For example, it’s very common for global sites to translate their content into Spanish only once and to deliver that same Spanish to all countries needing that language. As there are at least 20 countries speaking Spanish, this could mean 20 copies of the same content on the same site at the same time. Yep, we’re duplicating.

To avoid the duplication, we might use local domains (I can confirm that does help), or we might translate a fresh copy for each country or, more commonly, for the major ones we consider significant.

Dealing With Multi-Language User Generated Content

If your site is a forum or Q&A-style site that operates internationally, you face the problem that your users are generating content in one language that you might wish to share with users in other languages.

Almost all organizations facing this problem opt to use auto-generated translation techniques, but unfortunately this has begun to fall foul of the Panda algorithm, which searches out poor-quality content using machine learning and pattern techniques. Frequently, auto-translated content looks like really bad spam!

The Solution Google Is Offering

With the new markup, Google is putting forward a different way of solving these problems. At a later point, I will look at the broader issues of combining this approach with Webmaster Central geo-targeting and the use of local domains, but for now we’ll stick to the markup option.

Back in February 2009, Google first launched the canonical markup tag, which is also supported by both Bing and Yahoo. Its main purpose was to add code to the page telling the search engine which URL is the “canonical” one. In other words, it indicates which copy of the page is the most important and should be shown to users.

Later, the canonical tag was given cross-domain capabilities which extended its reach into the multilingual world — but there it suddenly ran into some significant limitations.

If you used the rel=canonical tag to solve duplication issues, you had to choose which one was the “Top” URL which meant you could be showing UK content in Australia or Argentinian pages to the Spanish.
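To make that limitation concrete, here is a minimal Python sketch that prints the canonical element each regional copy would carry. The example.com URLs and the helper name canonical_link are assumptions made purely for illustration; only the rel=canonical element itself comes from the discussion above.

```python
from html import escape


def canonical_link(master_url: str) -> str:
    """Return a <link rel="canonical"> element pointing at the chosen master URL."""
    return f'<link rel="canonical" href="{escape(master_url, quote=True)}" />'


# Hypothetical regional copies of the same English page.
regional_copies = {
    "UK": "http://www.example.com/uk/widgets.html",
    "Australia": "http://www.example.com/au/widgets.html",
}

# Someone has to pick a single "Top" URL; here the UK copy is chosen.
master = regional_copies["UK"]

# Every copy carries the same element, so only the UK URL is put forward.
head_tags = {country: canonical_link(master) for country in regional_copies}
for country, tag in head_tags.items():
    print(country, tag)
```

Because the Australian page points at the UK URL, nothing in this markup says which copy belongs to which country.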

Targeting The Right Part Of Google Involves Understanding Google Geographic Filters

Where The New Tag Comes In

Let’s imagine a scenario in which you’ve designated, using rel=canonical, a certain page as the “master content”.

What you’ve now done is “Deduplicated” it! In other words, you’ve given an indication to Google that a particular piece of content is necessarily duplicated and you’ve indicated which is the original.

What you’ve not done is indicate which version of that now-known-to-be-duplicate content should be shown where.

The rel="alternate" hreflang annotation, with values such as hreflang="en-AU" or hreflang="en-GB", enables you to say, “This is for Australia, this is for the UK!” We can also assume that without hreflang, the top content (perhaps the UK version) would be linked to by both sets of Google results.

So to recap, rel=canonical deduplicates and rel=alternate hreflang denotes the geo-targeting. Simple.
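As a rough illustration of the geo-targeting half of that recap, the Python sketch below builds the rel="alternate" hreflang elements for two English-language versions of one page. The example.com URLs and the function name hreflang_links are hypothetical, and the language-region codes en-GB and en-AU are assumed values chosen to match the UK and Australia example.

```python
from html import escape


def hreflang_links(alternates: dict[str, str]) -> list[str]:
    """Build one rel="alternate" hreflang element per language-region version."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(code, quote=True), escape(url, quote=True)
        )
        for code, url in sorted(alternates.items())
    ]


# Hypothetical UK and Australian versions of the same page.
alternates = {
    "en-GB": "http://www.example.com/uk/widgets.html",
    "en-AU": "http://www.example.com/au/widgets.html",
}

links = hreflang_links(alternates)
for tag in links:
    print(tag)
```

This sketch emits the same full set of elements for every version of the page; whether a page should also list itself is a detail worth confirming against Google’s help pages.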

Not Quite So Simple

Don’t forget, we also have Webmaster Central geo-targeting settings in the background, and they’re useful because you can geo-target a whole site, folder or sub-domain to a particular country.

With the “rel=tag thingies”, you have to specify the settings on a per URL basis which involves considerably more effort and cost than the use of global settings at Webmaster Central.

However, rel="alternate" hreflang has the advantage that it can be deployed alongside ccTLDs or local domains. This makes eminent sense and stops people thinking they have to use a dot com to target countries when in fact ccTLDs achieve better results.
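To give a feel for the per-URL effort when the annotations sit alongside ccTLDs, here is a small Python sketch that generates them for every page across two local domains. The domains, page paths and the helper name annotations_for are all invented for illustration, and the sketch assumes each path exists on every site.

```python
from html import escape

# Hypothetical ccTLD sites, keyed by an assumed hreflang value.
SITES = {
    "en-AU": "http://www.example.com.au",
    "en-GB": "http://www.example.co.uk",
}


def annotations_for(path: str) -> list[str]:
    """Build the rel="alternate" hreflang elements for one page path."""
    return [
        f'<link rel="alternate" hreflang="{code}" '
        f'href="{escape(base + path, quote=True)}" />'
        for code, base in sorted(SITES.items())
    ]


# Hypothetical page paths, assumed to exist on every site.
paths = ["/", "/pricing.html", "/contact.html"]
per_url = {path: annotations_for(path) for path in paths}

# The number of elements grows with pages multiplied by country versions.
total = sum(len(tags) for tags in per_url.values())
print(total)
```

Even this toy site needs six elements for three pages and two countries, which is why a single site-wide setting at Webmaster Central is so much cheaper to maintain.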

Where Site Content Is “Fully Translated?”

In the rel="alternate" announcement and Webmaster help pages, Google gives as an example scenario the designation of targeting for a German and an English URL. The question is why? A German translation of an English text is by definition not a duplicate, unless you translate it back and compare again. So why would Google show this example?

The use case Google describes is, “Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French”. So it is clear that Google would like us to use this tag to denote content even if it’s not in the same language.

Other commentators have concluded that Google is telling us that translated content “Can Be Duplicate”. I doubt this because I cannot see what the purpose might be from Google’s point of view.

Studying The Non-Duplicate Use Cases

There are other cases where indicating which language a page’s content is deemed to be in could be useful:

  • To indicate dynamic machine translation
  • To identify content in English to be translated

Machine translation is known to raise red flags with visiting Panda crawlers because of their preference for grammatically correct, properly flowing natural language, in the same way that real-life pandas are very fussy about eating natural bamboo shoots and nothing else!

Designating your content as machine translated and linking it to one original source could be used in Panda as a means of giving additional authority to the content even if it doesn’t really flow very naturally at all and would normally be discarded. (Less politely said, “Your content is rubbish dude but we’ll let it pass…”)

Equally, some websites, such as multi-country forums, hold content in English and translate it dynamically. That content may be seen by the crawler as another copy of English to be discarded, unless it is denoted as “German”, which is the language it would be displayed in once a user had “demanded” the content to load.

Using rel="alternate" hreflang could help Google understand this process algorithmically.

Canonical Involves Guesswork — Not Any More

The result? I’m already thinking that this opens up so many options that we’d better extend the hour’s training we currently provide on geo-targeting to a full day, as the potential variations have now expanded exponentially! By the way, if anyone has any updates, do let me know via the comments!


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.


About the author

Andy Atkins-Krüger
Contributor
Andy Atkins-Krüger founded Webcertain, the multi-language international search marketing services business that runs the International Search Summit alongside SMX in Europe and also includes an in-house business that specializes in supporting internal agencies within big groups with their specialist language needs.
