The Project From Hell: How you should not organize your localization

19 comments, last by Yomar 14 years, 11 months ago
Hi all! I'm back with another lengthy write-up about game localization, as seen from the translator's perspective. Any suggestions, comments and remarks are welcome! You can find the article right here: http://www.loekalization.com/projectfromhell.html It's long and complicated, but hopefully offers a few interesting insights into how you should not tackle the translation of your games.

Don't localize, Loekalize!

Mmmh... the silence is deafening. My last article generated a lot more response. Are developers no longer interested in the do's and don'ts of localization, or is the article so boring that you gave up after two lines already?

Please teach me. I want to learn.


On the contrary, I'm very interested in the subject and I'm very keen to read it. I just haven't had time to read it so far because I'm too busy working on the game right now. It's bookmarked and on top of my to do list for the weekend though. Your time writing it is very much appreciated.
Thank you sybixsus! I was becoming a bit insecure. I'll try and be more patient :)


I think you will find that the lack of interest is due to the fact that the article targets an area of the business that Gamedev's readership has no contact with. Most members here are indie developers, few of whom do localised versions of their games, and where they do, it is mainly into key European languages.
Dan Marchant - Business Development Consultant
www.obscure.co.uk
Ah, maybe I should add that the site's title (translations from Japanese and English to Dutch) is a bit misleading when it comes to that. The story and many other tips on the site apply to localization/translation software and games in general, and definitely not only localization to Dutch.

It's like certain programming paradigms: they don't only apply to C++, but also to PHP, Python, etc.

Thank you for noting that.


Interesting rant/article. Maybe a bit too heavy on the rant side, although I can understand your frustration. Knowing a little bit about Japanese, I was always interested in the potential issues when localizing to/from CJK text, so reading about those was enlightening (although I admit I am more interested in technical implementation from the game's perspective). I would've liked to read more about the whole computer-assisted translation software, since that seems very useful for consistency - very often in games and other media I see subtitles that are inconsistent in their terminology. And sometimes the spoken text does not match the written text at all, like in the Russian version of STALKER, where the Russian voice track is completely different from what's written in Russian on the screen - as if they were working from two different scripts that were translated independently of each other.

You really translate a Japanese game to Dutch based on the English translation? I'm surprised that this is standard, as so many original nuances are lost whenever you work based on a translation of an original text. I'm not exactly expecting Shakespeare level from game text, but it's still somewhat disappointing.
Thank you lightbringer! Actually, there is plenty of information about CAT tools available on the internet. You can even download a fully functional 30-day demo version of my favourite CAT tool from http://www.atril.com/

The Déjà Vu X Workgroup Getting Started Guide under Documentation contains a fully-fledged tutorial to get you started. You can import dozens of files in dozens of formats, translate them and export the result.

Also...
http://en.wikipedia.org/wiki/Computer-assisted_translation
and...
http://en.wikipedia.org/wiki/Translation_memory
contain quite a bit of information about CAT tools in general.

As mentioned, these tools remember every single sentence ever translated, including who translated it and when. You can store this information in databases (translation memories) that can be linked to different projects. You can also stack multiple databases and prioritize them, so that the tool searches certain databases before others. Last but not least, you can set the fuzziness of the matches the tool generates, that is, how similar a stored string must be to the one you are currently translating before it is offered as a match.
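To make the idea of stacked, prioritized memories with a fuzziness setting a bit more concrete, here is a rough Python sketch. It is not how any particular CAT tool works: the function names, the 0.75 threshold and the sample strings are all assumptions for illustration, and Python's difflib ratio merely stands in for a real tool's fuzzy score.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0-1.0 score (a stand-in for a CAT tool's fuzzy score)."""
    return SequenceMatcher(None, a, b).ratio()


def lookup(source, memories, min_score=0.75):
    """Search the memories in priority order.

    `memories` is a list of (name, {source: translation}) pairs, highest
    priority first. The first memory holding a match at or above
    `min_score` wins; returns (translation, score, name) or None.
    """
    for name, memory in memories:
        best = None
        for known_source, translation in memory.items():
            score = similarity(source, known_source)
            if score >= min_score and (best is None or score > best[1]):
                best = (translation, score, name)
        if best is not None:
            return best
    return None


# Hypothetical memories: a project-specific one stacked on a general one.
project_tm = {"Save the file.": "Sla het bestand op."}
general_tm = {
    "Save the file.": "Bestand opslaan.",
    "Delete the file.": "Verwijder het bestand.",
}
memories = [("project", project_tm), ("general", general_tm)]

exact = lookup("Save the file.", memories)     # found in the project memory
fuzzy = lookup("Delete the files.", memories)  # close match in the general memory
missing = lookup("Quit to desktop", memories)  # nothing similar enough
```

Lowering `min_score` yields more (but less similar) suggestions, which is roughly what the fuzziness setting does.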

Now if you have a certain file translated to say French using CAT tools, you can easily update it to a second version, which you can feed to your translators as soon as they are done with version 1. The CAT software will automatically match all strings that didn't change, and generate fuzzy matches (if possible) for strings that did. Mostly you'll get a 75% discount on strings that didn't change. In some cases you may even be able to negotiate a 100% discount. This way there is no further need to keep track of which strings were changed when, as the CAT tool will automatically detect this.
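As a sketch of what happens when version 2 of a file is run against the memory built from version 1, the snippet below labels every string as an exact match, a fuzzy match or new text. The threshold, the function name and the French sample strings are assumptions for illustration; real tools use their own scoring.

```python
from difflib import SequenceMatcher


def classify(new_strings, memory, fuzzy_threshold=0.75):
    """Label each string of the new version as 'exact', 'fuzzy' or 'new'."""
    report = {}
    for text in new_strings:
        if text in memory:
            report[text] = "exact"
            continue
        # Best similarity against anything already translated.
        best = max(
            (SequenceMatcher(None, text, known).ratio() for known in memory),
            default=0.0,
        )
        report[text] = "fuzzy" if best >= fuzzy_threshold else "new"
    return report


# Hypothetical memory produced while translating version 1.
v1_memory = {
    "Press any key to continue.": "Appuyez sur une touche pour continuer.",
    "Game over": "Partie terminée",
}
v2_strings = [
    "Press any key to continue.",     # unchanged since version 1
    "Press any button to continue.",  # slightly edited
    "Select your character",          # brand new
]
report = classify(v2_strings, v1_memory)
```

Nobody has to track by hand which strings changed: the comparison against the memory does that.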

Advanced CAT tools even have functionality for detecting context, to avoid false positives. For example, the word "space" in this context:

Earth
Venus
Sun
Space

...will be translated very differently than the "space" in:

Escape
Backspace
Delete
Space

Advanced CAT tools can detect this and distinguish between exact matches (the string is the same) and guaranteed matches (the string *and* the x strings around it are the same), where x is a parameter you can set in your project.
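The difference between an exact and a guaranteed match can be sketched in a few lines of Python. This is only an illustration under assumed names (`context_of`, `build_memory`, `match`) and with sample Dutch translations; actual tools store and compare context in their own ways.

```python
def context_of(strings, index, x):
    """Return the x strings before and after strings[index] (fewer at the edges)."""
    before = tuple(strings[max(0, index - x):index])
    after = tuple(strings[index + 1:index + 1 + x])
    return before, after


def build_memory(strings, translations, x):
    """Store every translated string together with its surrounding context."""
    return [
        (source, context_of(strings, i, x), translations[i])
        for i, source in enumerate(strings)
    ]


def match(strings, index, memory, x):
    """Return (kind, translation); kind is 'guaranteed', 'exact' or 'none'."""
    source = strings[index]
    context = context_of(strings, index, x)
    exact = None
    for known, known_context, translation in memory:
        if known != source:
            continue
        if known_context == context:
            return "guaranteed", translation
        if exact is None:
            exact = translation
    return ("exact", exact) if exact is not None else ("none", None)


planets = ["Earth", "Venus", "Sun", "Space"]
planets_nl = ["Aarde", "Venus", "Zon", "Ruimte"]
memory = build_memory(planets, planets_nl, x=1)

keys = ["Escape", "Backspace", "Delete", "Space"]
same_list = match(planets, 3, memory, x=1)  # same string, same neighbours
other_list = match(keys, 3, memory, x=1)    # same string, different neighbours
```

Here "Space" in the keyboard list only comes back as an exact match, which is the cue for the translator to recheck it instead of accepting the outer-space translation blindly.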

Often there is no direct need for developers to invest in software like this, which can be quite pricey: every serious translator has a CAT tool these days. If you consult the translator in question, he or she can tell you how your text can be made more CAT-friendly if needed, though if you use common formats like XML, Word or Excel and don't unnecessarily split strings right in the middle (so-called concatenation), you should be safe. That, and keep text separated from code as much as possible.
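To illustrate the two points about concatenation and keeping text out of the code, here is a small Python sketch. The string table, the key name and the sample sentences are made up for the example; in a real project the table would live in an external file such as XML or Excel.

```python
# Hypothetical string table, kept apart from the game logic.
STRINGS = {
    "en": {"items_found": "You found {count} items in {place}."},
    "nl": {"items_found": "Je hebt in {place} {count} voorwerpen gevonden."},
}


def concatenated(count, place):
    """Anti-pattern: the sentence is split into fragments inside the code,
    so English word order is baked in and no fragment is translatable alone."""
    return "You found " + str(count) + " items in " + place + "."


def localized(lang, key, **values):
    """Look up one complete sentence and fill in its named placeholders."""
    return STRINGS[lang][key].format(**values)


english = localized("en", "items_found", count=3, place="the cave")
dutch = localized("nl", "items_found", count=3, place="de grot")
```

Because the translator sees the whole sentence with named placeholders, the Dutch version is free to move the place in front of the count, something the concatenated version cannot accommodate.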

If you have any specific questions about CAT tools, please do not hesitate to ask!

[Edited by - Yomar on May 6, 2009 8:41:32 PM]


Thanks for all the info! I see a bunch of open-source cross-platform CAT tools on one of those lists. With support for Java properties/bundles, no less. I've been writing property files for multiple languages for most of my projects by hand so far, so this will be quite useful. Kinda hard to imagine that it never occurred to me that there are tools out there to help with managing these and keeping them consistent. (Although I imagine the FLOSS tools will be quite a bit more limited - I'll have to look into it once it's no longer the middle of the night).
Yes, Java Properties are a pretty common format in the localization business. Whatever tool you choose, make sure it is compatible with the main tools on the market, as most translators will stick to their own tool only, to be able to get maximum leverage from their existing translation memories (previous translations done for other clients).

That said, since Java Properties are so common, you probably won't need CAT tools yourself, unless you are planning to do part of the translation or want full control over the process and keep your translators in check. The latter, however, can also be done by asking your translators to send you their newest translation memories in a format supported by your tool whenever they deliver a new batch. Because translators sometimes change their minds about earlier translations in the course of a project (more context can put them in a different light), it's best to ask them to resend the entire memory instead of just incremental updates.

You can then use that memory to "pretranslate" (as it's called) your next batch and compare the number of matches generated with the number of matches reported by the translator's tool. Every tool counts differently, so there will be small discrepancies, but as long as they don't become too big (with 20% being the absolute limit)*, you will know that your translator is an honest man. That said, many translators will be willing to follow your tool's word counts anyway (even if they use another tool), as in practice, the differences are not that big. Constantly checking word counts is a real hassle for everyone involved.
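The comparison of match counts is simple arithmetic, sketched below in Python. How the 20% is measured is not spelled out above, so this example assumes it is the difference relative to the larger of the two counts; the function names and the sample numbers are hypothetical as well.

```python
def discrepancy(own_count, reported_count):
    """Relative difference between two match counts (assumption: measured
    against the larger of the two)."""
    largest = max(own_count, reported_count)
    if largest == 0:
        return 0.0
    return abs(own_count - reported_count) / largest


def within_limit(own_count, reported_count, limit=0.20):
    """True while the two tools agree closely enough to stop worrying."""
    return discrepancy(own_count, reported_count) <= limit


# Your tool pretranslates 450 strings, the translator's tool reports 500.
small_gap = within_limit(450, 500)  # 10% apart
large_gap = within_limit(300, 500)  # 40% apart
```

A result outside the limit is a reason to look at the segmentation rules first, not proof of anything by itself.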

Make sure you give your translator the opportunity to recheck exact matches, as sometimes translations change depending on context (and sometimes translators simply get brilliant ideas and want to improve existing translations).

Currently the most popular formats for exchanging memories between different CAT tools are TMX and Trados TXT. Things are changing fast though: pay attention to XLIFF, which is becoming more and more popular.
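For those curious what a TMX memory looks like on the inside, here is a minimal Python sketch that writes and reads one with the standard library. It only covers the bare structure (translation units holding one segment per language); the header values, function names and sample pair are assumptions, and files exported by real tools carry a lot more metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang="en", target_lang="nl"):
    """Build a minimal TMX document from (source, translation) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, translation in pairs:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, translation)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(document, source_lang="en", target_lang="nl"):
    """Return {source: translation} for units that contain both languages."""
    memory = {}
    for unit in ET.fromstring(document).iter("tu"):
        segments = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in unit.iter("tuv")}
        if source_lang in segments and target_lang in segments:
            memory[segments[source_lang]] = segments[target_lang]
    return memory


document = build_tmx([("Save", "Opslaan")])
memory = read_tmx(document)
```

A dictionary read this way can be used directly to pretranslate the next batch.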

===

*If the discrepancy becomes bigger, there's still a chance that your tool uses different segmentation rules. Some tools consider...

Save: Saves the file.

...as one string, while others consider it as two strings:

Save:
Saves the file.

Advanced tools have segmentation rules that can be set by the user. The closer your tool's segmentation rules are to those of the translator's tool, the fewer discrepancies you should get.
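The effect of different segmentation rules on the "Save:" example can be shown with a small Python sketch. The rule used here (break after sentence-ending punctuation, optionally after a colon as well) and the function name are simplifying assumptions; real tools have far more elaborate, user-configurable rules.

```python
import re

# Break after ., ! or ? followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
# Same rule, but a colon also ends a segment.
SENTENCE_OR_COLON_BREAK = re.compile(r"(?<=[.!?:])\s+")


def segment(text, break_after_colon=False):
    """Split text into translatable segments according to the chosen rule."""
    pattern = SENTENCE_OR_COLON_BREAK if break_after_colon else SENTENCE_BREAK
    return [part for part in pattern.split(text) if part]


one_string = segment("Save: Saves the file.")
two_strings = segment("Save: Saves the file.", break_after_colon=True)
```

The same text counts as one string under the first rule and as two under the second, which is exactly the kind of difference that inflates discrepancies between tools.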

