About Yomar


  1. Hello all!   It's been some time, but I'm back with a new article, this time about why crowdsourcing the translation and localization of your game is a really bad idea.   The idea of crowdsourcing is indeed a very attractive one. Localization can be expensive if carried out the wrong way, and by gathering an enthusiastic crowd that would love to translate your game for free, you can save quite some money. To make crowdsourcing even easier, several crowdsourcing platforms have been launched: you register, you upload your app and off you go.   Now, there are several reasons why crowdsourcing can fail before you have even started...   You can find the whole article here: http://www.loekalization.com/crowdsourcing.html   Suggestions and feedback are always welcome, and I'm always up for a good discussion.   Bring it on!
  2. This can indeed be a problem, but it varies a lot from case to case. There are a few important things: 1. Due to differences in word order, there should always be room for text before and after the variable, even if the sentence starts or ends with it. Take "%s says: Hi": there may be languages that put something before %s, so you should never offer just "says: Hi" for translation. Take "%s says: Hi %s": there may be languages that put something after the last %s, so again you should never offer just "says: Hi" for translation. 2. Gender issues can often be solved by including the article ("the"/"a") and any adjectives in the item name itself. This way, "You have found %s" combined with "the red rod" or "a blue wand" will automatically be rendered correctly, no matter which language they are translated into. Please note that in this case, there should also be a post-string for languages that put something after %s. Strings like "You have found the blue %s" are just asking for trouble. If you want to be 100% safe (I can imagine there are languages in which the gender of the item even influences the verb phrase "have found"), concatenation should be avoided altogether, but I understand this is not always feasible. Anyway, be prepared and make sure your code is flexible enough to change these things if a translator tells you that the format of the strings is causing problems in his/her language. Last but not least, if you use concatenation, make very clear which strings belong together. Strings that stand alone as sentences are often translated differently than strings that are part of concatenated sentences. Your statement about Japanese is correct: Japan has a very hierarchical society, and the status of and relationship between the speaker and the listener influence the whole language. Therefore, it should always be made clear who says what to whom. For a programmer, especially a monolingual programmer, it is very hard to predict which kinds of strings will cause issues and which will not.
Therefore, these kinds of things are often handled on the fly. These 3 tips should really help you: 1. Limit concatenation as much as possible. 2. If you use concatenation, make very clear how the strings are built up and be prepared to make a few changes where needed. 3. Provide context (often context is provided automatically by sorting the strings in a logical order, but I've seen several resources that sorted dialogue strings alphabetically, just for coolness, which is really not a good idea). [Edited by - Yomar on May 18, 2009 10:06:48 AM]
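To make tip 1 a bit more concrete, here is a minimal Python sketch of the "whole sentence plus placeholder" approach. The string table, its keys, the helper names and the Dutch translations are all made up for illustration; the point is only that the translator gets the full sentence and can move the placeholder wherever the language needs it.

```python
# Hypothetical string tables: keys and translations are illustrative only.
STRINGS = {
    "en": {
        "found_item": "You have found {item}.",
        "item.red_rod": "the red rod",
        "item.blue_wand": "a blue wand",
    },
    "nl": {
        # Dutch puts the participle after the item, so the translator needs
        # room for text behind the placeholder.
        "found_item": "Je hebt {item} gevonden.",
        "item.red_rod": "de rode staf",
        "item.blue_wand": "een blauwe toverstok",
    },
}


def render(lang, key, **params):
    """Look up a whole-sentence template and fill in its named placeholders."""
    return STRINGS[lang][key].format(**params)


def found_item(lang, item_key):
    """Build the 'You have found ...' message for one item."""
    # The item name carries its own article and adjective, so agreement is
    # decided by the translator and not by the code.
    item = STRINGS[lang][item_key]
    return render(lang, "found_item", item=item)


print(found_item("en", "item.blue_wand"))
print(found_item("nl", "item.red_rod"))
```

Named placeholders like `{item}` also tell the translator what is being inserted, which a bare %s does not.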
  3. Sure! Like: <name lang="en">Magic cane</name> <desc lang="en">This will give you 5 extra magic.</desc> <meta lang="en">This is not the cane that old people use to walk, but a cane from a plant.</meta> The meta tag provides extra (and much-needed) context for shorter strings. By the way, one of the questions you will get most often is "Noun or verb?" Whether the word Copy is a noun (the copy) or a verb (to copy) makes a huge difference in many languages: <name lang="en">Copy</name> <meta lang="en">Noun!</meta> Also, I recommend handing each translator a full copy of the game with cheat codes. The more context you give them, the better the translation will be. Most game translators will play the game for free if it gives them more context.
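As a rough sketch of how such meta tags could travel with the strings, the Python below reads a small resource file and attaches each item's note to every translatable string. The `<items>` and `<item id="...">` wrappers and the function name are assumptions made for this example; only `<name>`, `<desc>` and `<meta>` come from the post above.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource layout: the <items>/<item id="..."> wrappers are
# assumptions, the inner tags follow the example in the post.
RESOURCE = """
<items>
  <item id="magic_cane">
    <name lang="en">Magic cane</name>
    <desc lang="en">This will give you 5 extra magic.</desc>
    <meta lang="en">This is not the cane that old people use to walk, but a cane from a plant.</meta>
  </item>
  <item id="copy">
    <name lang="en">Copy</name>
    <meta lang="en">Noun!</meta>
  </item>
</items>
"""


def export_for_translation(xml_text, lang="en"):
    """Return one row per translatable string, with the item's meta note as context."""
    rows = []
    for item in ET.fromstring(xml_text).findall("item"):
        meta = item.find(f"meta[@lang='{lang}']")
        note = meta.text if meta is not None else ""
        for tag in ("name", "desc"):
            node = item.find(f"{tag}[@lang='{lang}']")
            if node is not None:
                rows.append({
                    "id": f"{item.get('id')}.{tag}",
                    "source": node.text,
                    "note": note,
                })
    return rows


rows = export_for_translation(RESOURCE)
for row in rows:
    print(row)
```

The rows can then be written to whatever format your translator prefers, with the note in a comment column.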
  4. That's definitely a good way! I'd implement an extra meta tag for clarification about certain strings, so that you'll never need to answer the same question twice.
  5. Dear JDXSolutions Ltd, That's exactly what I did. During the first phase of the project, it was on top of my list. When I found out they were bringing in other translators, it got a much lower priority, so that they had to bring in even more translators. During this phase, I worked on many other projects. During the third phase, when they promised to once more let me become the sole translator, I put the project on top of my list again - not because I wanted to list it on my resume (I still doubt whether that will be possible), but to create goodwill for possible future projects.
  6. Yes, Java Properties are a pretty common format in the localization business. Whatever tool you choose, make sure it is compatible with the main tools on the market, as most translators will stick to their own tool only, to be able to get maximum leverage from their existing translation memories (previous translations done for other clients). That said, since Java Properties are so common, you probably won't need CAT tools yourself, unless you are planning to do part of the translation or want full control over the process and keep your translators in check. The latter however can also be done by asking your translators to send you their newest translation memories in a format supported by your tool whenever they deliver a new batch. As during the translation, translators sometimes change their mind about previous translations (more context that puts previous translations in a different light), it's best to ask them to resend the entire memory instead of just incremental updates. You can then use that memory to "pretranslate" (as it's called) your next batch and compare the number of matches generated with the number of matches reported by the translator's tool. Every tool counts differently, so there will be small discrepancies, but as long as they don't become too big (with 20% being the absolute limit)*, you will know that your translator is an honest man. That said, many translators will be willing to follow your tool's word counts anyway (even if they use another tool), as in practice, the differences are not that big. Constantly checking word counts is a real hassle for everyone involved. Make sure you give your translator the opportunity to recheck exact matches, as sometimes translations change depending on context (and sometimes translators simply get brilliant ideas and want to improve existing translations). Currently the most popular formats for exchanging memories between different CAT tools are TMX and Trados TXT. 
Things are changing fast though: keep an eye on XLIFF, which is becoming more and more popular. === *If the discrepancy becomes bigger, there's still a chance that your tool uses different segmentation rules. Some tools consider "Save: Saves the file." as one string, while others consider it as two strings: "Save:" and "Saves the file." Advanced tools have segmentation rules that can be set by the user. The closer your tool's segmentation rules are to those of the translator's tool, the fewer discrepancies you should get.
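The segmentation footnote can be illustrated with a small Python sketch. The function names are hypothetical, and the way the discrepancy percentage is computed (relative to the larger count) is my own assumption; real CAT tools use far more elaborate segmentation rules.

```python
import re


def segment(text, split_on_colon=False):
    """Split text after sentence-ending punctuation; optionally also after a colon."""
    pattern = r"(?<=[.!?:])\s+" if split_on_colon else r"(?<=[.!?])\s+"
    return [s for s in re.split(pattern, text.strip()) if s]


def matched_words(segments, memory):
    """Count the words in segments that have an exact match in the memory."""
    return sum(len(s.split()) for s in segments if s in memory)


def discrepancy(own_count, reported_count):
    """Relative difference between two counts (assumed: relative to the larger one)."""
    largest = max(own_count, reported_count)
    return 0.0 if largest == 0 else abs(own_count - reported_count) / largest


text = "Save: Saves the file."

# The translator's tool splits after the colon and stored two segments.
memory = set(segment(text, split_on_colon=True))

ours = matched_words(segment(text), memory)                          # one long segment
theirs = matched_words(segment(text, split_on_colon=True), memory)   # two short segments

print(ours, theirs, discrepancy(ours, theirs) > 0.20)
```

Here the same text yields no matches in one tool and a full match in the other, purely because of the colon rule, so a large discrepancy does not have to mean a dishonest translator.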
  7. Thank you lightbringer! Actually, there is plenty of information about CAT tools available on the internet. You can even download a fully functional 30-day demo version of my favourite CAT tool from http://www.atril.com/ The Déjà Vu X Workgroup Getting Started Guide under Documentation contains a fully-fledged tutorial to get you started. You can import dozens of files in dozens of formats, translate them and export the result. Also... http://en.wikipedia.org/wiki/Computer-assisted_translation and... http://en.wikipedia.org/wiki/Translation_memory contain quite a bit of information about CAT tools in general. As said, these tools remember every single sentence ever translated, including information about who translated the sentence in question and when it was translated. You can store this information in databases that can be linked to different projects. You can also stack multiple databases and prioritize them if you want the tool to look in certain databases first before searching other databases. Last but not least, you can set the fuzziness of the matches generated by the tool, to get matches that are more or less similar to the string you are currently translating. Now if you have a certain file translated into, say, French using CAT tools, you can easily update it to a second version, which you can feed to your translators as soon as they are done with version 1. The CAT software will automatically match all strings that didn't change, and generate fuzzy matches (if possible) for strings that did. Mostly you'll get a 75% discount on strings that didn't change. In some cases you may even be able to negotiate a 100% discount. This way there is no further need to keep track of which strings were changed when, as the CAT tool will automatically detect this. Advanced CAT tools even have functionality for detecting context, to avoid false positives.
For example, the word "space" in this list of strings: Earth / Venus / Sun / Space ...will be translated very differently than the "space" in: Escape / Backspace / Delete / Space. Advanced CAT tools can detect this and distinguish between exact matches (the string is the same) and guaranteed matches (the string *and* the x strings around it are the same), whereby x is a parameter which you can set in your project. Often there is no direct need for developers to invest in software like this, which can be quite pricey: every serious translator has a CAT tool these days. If you consult the translator in question, he or she can tell you how your text can be made more CAT-friendly if needed, though if you use common formats like XML, Word or Excel and don't unnecessarily split strings right in the middle (so-called concatenation), you should be safe. That, and keep text separated from code as much as possible. If you have any specific questions about CAT tools, please do not hesitate to ask! [Edited by - Yomar on May 6, 2009 8:41:32 PM]
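For the curious, here is a toy Python sketch of the two ideas above: fuzzy matches and exact versus guaranteed matches. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real CAT tool's matching algorithm, and the function names, the 0.8 threshold and the sample translation are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source, memory, threshold=0.8):
    """Return (stored_source, translation, score) for the closest entry, or None."""
    best = None
    for stored, translation in memory.items():
        score = SequenceMatcher(None, source, stored).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored, translation, score)
    return best


def match_type(strings, index, old_strings, x=1):
    """Classify strings[index] against an older file: 'guaranteed', 'exact' or 'none'."""
    target = strings[index]
    window = strings[max(0, index - x): index + x + 1]
    found = False
    for i, old in enumerate(old_strings):
        if old != target:
            continue
        found = True
        # Guaranteed: the string and the x strings around it are the same.
        if old_strings[max(0, i - x): i + x + 1] == window:
            return "guaranteed"
    return "exact" if found else "none"


# Illustrative memory with a single Dutch translation.
memory = {"Save the file.": "Sla het bestand op."}
print(best_fuzzy_match("Save the files.", memory))

planets = ["Earth", "Venus", "Sun", "Space"]
keys = ["Escape", "Backspace", "Delete", "Space"]
print(match_type(keys, 3, planets))     # same string, different neighbours
print(match_type(planets, 3, planets))  # same string, same neighbours
```

The "Space" in the keyboard list is only an exact match against the planet list, which is precisely the case where a human should recheck the translation.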
  8. Ah, maybe I should add that the site's title (translations from Japanese and English to Dutch) is a bit misleading when it comes to that. The story and many other tips on the site apply to the localization/translation of software and games in general, and definitely not only to localization into Dutch. It's like certain programming paradigms: they don't apply only to C++, but also to PHP, Python, etc. Thank you for noting that.
  9. Thank you sybixsus! I was becoming a bit insecure. I'll try and be more patient :)
  10. Mmmh... the silence is deafening. My last article generated a lot more response. Are developers no longer interested in the do's and don'ts of localization, or is the article so boring that you gave up after two lines already? Please teach me. I want to learn.
  11. Hi all! I'm back with another lengthy write-up about game localization, as seen from the translator's perspective. Any suggestions, comments and remarks are welcome! You can find the article right here: http://www.loekalization.com/projectfromhell.html It's long and complicated, but hopefully offers a few interesting insights into how you should not tackle the translation of your games.
  12. Anon Mike: I liked how you phrased the best solution for space constraints (not hardcoding the UI), so I have adopted that :) Anonymous Poster: Fantastic! Babelfish had not even crossed my mind yet, but yes, there have been game developers who used it. The results were of course disastrous. Your example is totally hilarious and I have incorporated it (using a different text). CAT (Computer-Assisted Translation) tools work on a sentence level. They do nothing but store every sentence pair you have ever translated. Even then, if the software encounters the same sentence twice, the context will still need to be checked by a human translator (once more, think of the heading called "space", which sometimes refers to the space bar and sometimes to the universe, resulting in different translations). MT (Machine Translation) tools go one step further and try to work on a word level. They translate by applying grammar rules to dictionaries. This technology is still in its infancy and I sincerely doubt we will ever see it work. The main problem is, once more, context. If the word "get" can be translated in 10 different ways depending on the context (get yourself a dictionary and note that this is not an exception), the software needs to actually understand the text to know which translation fits best. Now realize that the software encounters this problem for virtually every word in a sentence, and the possibilities are endless. To "understand" a text, you need knowledge. Knowledge about the world and how things work. Even Google does not contain all knowledge of the world. Even humans don't, and that's exactly why translators should stick to the fields they specialize in. Give me a text about games and the translation will rock; give me a text about biotechnology and the result will be disastrous. "I'm glad we made it" can mean different things. If it's said in this context: "Wow, that bomb worked really well. I'm glad we made it."
...made should be taken literally. However, in this context: "I almost thought they would catch us. What a chase! I'm glad we made it." ...made means something entirely different. Try to teach a computer that difference for every English word, and you will understand that it's going to take a really long time before MT will give satisfactory results. And I haven't even touched on slang and idioms yet :D "You're flossin' your fly threads, getting ready to hit up the bash, when it hits you-- you have no idea what any of these words mean. Before you bump your gums, take our quiz to check your knowledge of the latest slang. Ya feel us?" MT has only been successful in very well-defined, narrow fields, and even then it only works if the authoring process takes into account that the text will be processed by an MT tool later on. Think user manuals written in a very robotic way with lots of short, simple sentences where a spoon is just that: a spoon. For games, MT is definitely not suitable. And in the game industry, we are not going to see the day on which MT will actually work.
  13. Anonymous Poster: "the biggest mistake is thinking that you can worry about localization later" I needed 10 pages, and you just managed to summarize it in 12 words. Yes, that is indeed what everything boils down to. Well summarized! :) Leoptimus: The official difference between localization and translation is indeed that: not only translating what the text says, but also adapting the text to the local market. The problem these days is that many agencies and freelancers think "localization" sounds cooler than "translation" and just use the word for everything that is related to translation. You're right when you say that "localization" should be carried out in the purest sense of the word.
  14. Hi Ezbez, Well, 3 days can be enough for a manual. It simply depends on how long the manual is. The average manual has about 6,000 words. Such a manual can definitely be translated within 3 days (assume 2,000 words per day for an ideal delivery date, and add a few working days just in case). The number of words in a UI varies greatly. Shooters might only have 200, story-driven shooters something like 12,000, simple point-and-click adventures something like 20,000 and RPGs/MMORPGs something like 500,000 or even more. Like I said, the average manual contains only 6,000 words or so. Manuals for complicated games like Sim City, however, can easily contain 50,000 words. So it's important to regularly count the number of words in your UI/manual to keep an overview. That way you can kick off the translation when the time comes. And note that the manual/UI does not need to be finished yet. Good agencies/translators use software that can spot all changes in future updates (without charging you twice). Assume 2,000 words per working day + 3 working days for scheduling. This is rather conservative, but you'd better stay on the safe side. If you know in advance that you don't have that much time available, contact the agency/freelancer beforehand to see whether it's possible to reserve more words per day. Make very clear that you want as few translators in the team as possible. The more you use, the more inconsistent the result. Using one translator doing 6,000 words a day (more is not humanly possible, at least not for longer periods at a stretch) is better than using three translators doing 2,000 words a day (because they have plenty of other projects on their plate already). You want to make the team as small as possible, while raising the capacity as much as possible. In this case, the agreement is merely tentative. The translator knows that something is coming up, but not exactly when. S/he can take this into account when planning other projects.
If you want a 100% guarantee, then you can actually reserve the translator. In this case, you guarantee that the translation will start on a specific day. However, be careful with this: people will hold off other projects for this and suffer damages if the source text doesn't arrive on time. Personally I tell clients in advance that if they actually want to reserve me, I will charge even if the text does not arrive. That's a total waste of money, so if you use this approach, you need to be really sure when the text will be ready. Larger agencies using a pool of translators can easily cope with this and plan translations on the fly, but the result is that they will pick translators who are available at that moment (not necessarily the best translators in their team). I.e. there's a price to pay: quality. The best method is the first method: count your words regularly and kick off the translation in time. The second best option is kicking off the translation even if the text is not finished yet. Updates can always be translated on the fly.
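The rule of thumb from this post (2,000 words per working day plus 3 working days for scheduling) is easy to put into a small Python helper. The function name and parameter names are mine; the default numbers are the conservative ones mentioned above.

```python
import math


def working_days_needed(word_count, words_per_day=2000, scheduling_days=3):
    """Estimate working days until delivery: translation days plus a scheduling buffer."""
    translation_days = math.ceil(word_count / words_per_day)
    return translation_days + scheduling_days


# Word counts taken from the examples in the post.
for label, words in [("average manual", 6000),
                     ("story-driven shooter UI", 12000),
                     ("complicated manual", 50000)]:
    print(label, working_days_needed(words))

# One translator who has reserved 6,000 words a day for you.
print(working_days_needed(6000, words_per_day=6000))
```

Rerun it with your own word counts whenever the UI or manual grows, so you know when to kick off the translation.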
  15. True Yapposai! It isn't really an issue for localizations into Dutch, but if you're localizing your game into Japanese or Chinese, this can have tremendous consequences. It's 2006, and I think that by now ASCII should be forbidden and totally replaced by Unicode, even if your game is only in English. You never know what you will do in the future, and Unicode offers support for all languages. Of course, you will still need to integrate different font sets for (what we consider) exotic languages, but that's a lot easier than rewriting your code from scratch. I'm going to add this one, even though it doesn't really apply to my language combinations.
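A quick Python illustration of why ASCII-only text handling breaks as soon as Japanese or Chinese comes in, while a Unicode encoding such as UTF-8 copes with everything. The sample strings are illustrative translations, not taken from any real game, and the helper name is hypothetical.

```python
def fits_in_ascii(text):
    """Return True if every character of text can be stored as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Illustrative translations of one item name.
samples = {
    "en": "Magic cane",
    "nl": "Toverstok",
    "ja": "魔法の杖",
    "zh": "魔杖",
}

for lang, text in samples.items():
    utf8_bytes = text.encode("utf-8")
    # UTF-8 round-trips every language without loss.
    assert utf8_bytes.decode("utf-8") == text
    print(lang, fits_in_ascii(text), len(text), len(utf8_bytes))
```

Note that the number of bytes is not the number of characters for the Japanese and Chinese strings, which is another reason to avoid code that assumes one byte per character.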