How Machine Translation Has A Habit Of Mangling Multilingual SEO
Recent discussions force me to return to the subject of translation versus SEO — particularly machine translation — as it seems this old topic has not yet gone away. For multinational sites, maintaining your site can be an expensive affair, and the cost savings of machine translation seem outstandingly attractive.
But as my mother always says to me, “If something looks too good to be true, it usually is.” Machine translation is a good case in point.
Often abbreviated to “MT”, machine translation involves using computers to do the work which human translators would normally do. It is not entirely the same as “computer aided translation”, where similar technologies are used to assist humans, although there is a significant overlap between the two.
Bring Back Humans, All Is Forgiven
Usually, human intervention is still needed on a “machine translated” text. Equally, in computer-aided translation, strings of text, suggestions and dictionaries are all on hand for the human translator, automated to a greater or lesser degree.
Monoglots, that is those people who speak one language only, often don’t appreciate that translation involves changing a lot of things other than simply the “words”. They also expect it to be possible to click a button and have “English word” magically converted to “French word”.
There is not always a one-to-one relationship of words. In Swahili, for instance, there are said to be 21 different ways to express “to walk” that in English would be rendered with just the one!
German Speakers Leave The Verb To The Very End
Word order, sentence length and tenses can all be handled differently. Whereas the verb in English typically appears early in the sentence, in German it will often wander off to the end, where it might be the very last item in a very long statement.
All of this means that using computers for translation is challenging — but organizations such as Google have been demonstrating the power of using the “corpus” of the collected Web combined with statistical analysis to generate a newer form of machine translation — a form which has advantages over the older string replacement approach.
However, for SEO, machine translation remains a demon and here’s why. Keywords are very special creations of the human mind — I once nicknamed them “abbreviated thoughts” and have found myself using that description many times over the years as the easiest way to explain their different nature.
Multiple Ways Of Expressing The Same Idea
There are two problems for keywords (and therefore SEO) caused by machine translation. The first is that there may be multiple correct ways of expressing an idea, and the “machine” opts for a version which is less popular or which may not even be used as a keyword at all.
The second is that for reasons which I will explain, the corpus doesn’t contain the keyword and therefore the machine translation doesn’t even hold it in its memory as a potential option.
The background is that the published Web content inside Google’s index, which its translation tools then draw on, contains equivalent expressions that are used as keywords in one language but not in the other. Again, the translation is correct, but the keyword mapping is not.
“Car insurance” when expressed in French on websites often uses the term “Assurance automobile”. There is also more of a spread in French between use of the word “voiture” and “automobile”. It is rare for English websites to talk about “automotive insurance”, for instance — but the same is not true in French.
A Different Approach Needed In Different Languages
Additionally, as you can see from the Google keyword research below, there are over half a million searches using the term “auto assurance” where “automobile” is abbreviated by the searchers to “auto”.
Adding on the “voiture” phrases, you would need a very different approach from an SEO perspective in French than you would in English.
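To make the keyword-mapping point concrete, here is a minimal sketch in Python of what a human SEO effectively does: take every correct translation of a phrase and prefer the one that searchers actually type. The function name `pick_keyword` and the volume figures are my own placeholders for illustration, not real keyword data and not the output of any Google tool. In practice you would feed in figures from your own keyword research.

```python
from typing import Mapping, Sequence


def pick_keyword(candidates: Sequence[str], search_volume: Mapping[str, int]) -> str:
    """Return the candidate translation with the highest search volume.

    Candidates missing from search_volume count as zero searches.
    On a tie, the earliest candidate in the list wins.
    """
    if not candidates:
        raise ValueError("need at least one candidate translation")
    return max(candidates, key=lambda phrase: search_volume.get(phrase, 0))


# Placeholder figures for illustration only, NOT real keyword data.
example_volume = {
    "assurance automobile": 300,
    "assurance voiture": 200,
    "auto assurance": 500,
}

# All three are acceptable French renderings of "car insurance";
# a machine translation would typically return just one of them.
candidates = ["assurance automobile", "assurance voiture", "auto assurance"]

best = pick_keyword(candidates, example_volume)
print(best)
```

The translation step and the keyword step are deliberately kept separate here: the candidates can all be linguistically correct, and it is only the search data that decides which one belongs on the page.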
Another risk of automated translation, which Matt Cutts has alluded to in this video, is that machine-generated translation may actually trigger the Panda filters, which look for automated spammy content on the Web. He didn’t say whether this included the use of Google Translate!
My advice is to:
- Avoid using machine translation for marketing content.
- Translate only so much content as will produce a good ROI — and don’t translate simply for the sake of it.
- Ensure that some human SEO intervention takes place during the SEO-Translation process.
- Don’t use translators for SEO; use them for what they’re expert at, which is translation.