Localization and text support

Started by
7 comments, last by xelanoimis 14 years, 3 months ago

Hi, I'm looking for advice on localizing a game into more languages and on text support. The main question is whether I can use a font class that keeps all the characters in a texture. In fact I already have this working for ASCII texts (up to 256 characters). I think languages like English, German, Italian, etc. should work with the ASCII set of 256 characters. Right?

But what about Japanese? Obviously there are more glyphs in it. But how many are relevant for a game (one with lots of dialogs)? How many glyphs do the fonts from other games have? If I am to use more than 256 characters I should look into Unicode texts, but obviously I need my font to support whatever is written in there.

The other idea is to use a font engine, like FreeType, or SDL, or just the TrueType fonts from Windows GDI (though that's not portable). If so, what is the common practice with these libraries? I'm afraid that rendering glyph by glyph for a page of text each frame may have a negative impact on the frame rate. Should I optimize it by rendering the whole page (or text items) into textures and use them as sprites?

How do you support this in your projects? Thanks!

Quote:Original post by xelanoimis

I think languages like English, German, Italian, etc. should work with the ASCII set of 256 characters. Right?

Nope.

Quote:But what about Japanese? Obviously there are more glyphs in it. But how many are relevant for a game (one with lots of dialogs)? How many glyphs do the fonts from other games have?
You need all of the glyphs that make up the alphabet (or its equivalent).

It simply isn't worth optimizing in this respect, and it isn't possible if the user can enter custom input. Imagine the English alphabet missing the letter x.

Quote:If I am to use more than 256 characters I should look into Unicode texts

Yes, Unicode, for example encoded as UTF-8.

Are you aware that some languages do not read left-to-right, but top-to-bottom or right-to-left? And that the UI needs to be designed differently for them?

Quote:How do you support this in your projects?


Outsourced localization service, located in target country. In most cases it simply isn't worth the effort though. Unless you have a specific deal with effectively guaranteed target market that will cover the localization and QA costs, it simply isn't viable.
Localization is a huge bag of beans to deal with. I worked on a project that did what you are doing, where all characters were stored in textures. We used Unicode text, and we had to run all localized text files through a generator that ensured all characters were present and accounted for. Depending on the language/font, we loaded different texture sets and accompanying binary data. We had no text chat though, only voice chat.
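
The generator step described above can be sketched roughly as follows. This is only an illustration, not the tool used on that project: the function names, the sample German strings and the ASCII-only atlas are all made up for the example.

```python
def required_codepoints(strings):
    """Collect every distinct code point used by the localized strings."""
    needed = set()
    for text in strings:
        needed.update(ord(ch) for ch in text)
    return needed


def missing_glyphs(strings, font_codepoints):
    """Return the sorted code points the strings need but the font lacks."""
    return sorted(required_codepoints(strings) - set(font_codepoints))


# Hypothetical data: a German string table and an atlas that only covers
# printable ASCII.
german_strings = ["Größe ändern", "Spiel beenden"]
ascii_atlas = range(0x20, 0x7F)

for cp in missing_glyphs(german_strings, ascii_atlas):
    print(f"missing U+{cp:04X} {chr(cp)}")
```

Running a check like this over every localized text file before baking the textures catches missing glyphs at build time instead of in QA.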

As hinted, this is just the beginning though. Some languages, such as German, tend to have much longer words, so after localization was done we had to go through all of the windows where text could appear and display the longest available text to make sure it fit properly. In the end we either reduced font sizes, adjusted window sizes, or asked for alternate translations where possible.

Right-to-left or top-to-bottom languages may require completely different UI layouts altogether.

Some languages use different grammar rules, so a sentence like "Sally went to the market" may come out ordered like "The market Sally went to." When the whole string is localized this is fine and is handled by the localizer, but when any string parsing is involved it doesn't work as well without specific support.

PARSE_CODE("Sally went to %s where she bought %s", <location>, <food item>)

The localizers cannot write "Sally bought %s at the %s" if they need to, as it will result in "Sally bought the market at the banana". Parsing in this case needs to be more like the C# style "Sally went to {0} where she bought {1}" so that it can be localized into "Sally bought {1} at the {0}".
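
As a small illustration of the positional-placeholder idea, here is a sketch using Python's str.format, which uses the same {0}/{1} style. The string table, its keys and the render helper are hypothetical.

```python
# Hypothetical string table: the same message in source order and in a
# reordered "translation" that mentions the item before the location.
templates = {
    "source": "Sally went to {0} where she bought {1}",
    "reordered": "Sally bought {1} at {0}",
}


def render(template_key, location, item):
    """Fill a template; {0} is always the location and {1} the item."""
    return templates[template_key].format(location, item)


print(render("source", "the market", "a banana"))
print(render("reordered", "the market", "a banana"))
```

The calling code always passes the arguments in the same order; only the translated template decides where each one appears.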

If people can type in other languages in the game, you cannot use prebaked textures at all, as you have no idea what characters will appear. You may also need to support an Input Method Editor (IME) in chat boxes, text boxes, and anywhere else text is typed. An IME lets a user enter multiple keystrokes that are then composed into a single character.

Fun all around :)
// Full Sail graduate with a passion for games// This post in no way indicates my being awake when writing it
Thanks for the reply!

The game will only display messages in dialog boxes or maybe as subtitles. Each text can be fully localized through a text table. And players don't type custom texts. So I don't see much problem with this, except that all characters from the texts must be found in the font of that language (ideally on a single texture).

TO dclyde:
How many languages did you support in the app you worked on? And did you have to use more than one texture per font? What was the "largest" language and how many characters did it have in the texture(s) it used?

TO Antheus:
Does a language like German or French use more than 256 glyphs?

As for Japanese, I'm sure not all glyphs are required to cover common stories. I suppose there is a limited set, at least for computer text. But does anyone know how many?


Quote:Original post by xelanoimis
TO dclyde:
How many languages did you support in the app you worked on? And did you have to use more than one texture per font? What was the "largest" language and how many characters did it have in the texture(s) it used?


Number of languages: I don't remember... I think we supported almost all of the languages you can set a 360 to. Might have been short a language or two.

We didn't separate the textures per language. Since we always used Unicode text, we just supported the entire range of values (although condensed to the ones actually used). We added another texture whenever we ran out of room on a texture for another character (limiting textures to some predetermined size). Glyph-based languages tend to use clumped ranges of values, so you would only load the relevant textures (although you might have a few unnecessary characters loaded because they were neighbors).

In the end it came down to whether our textures came out smaller than always shipping an entire TTF file and generating what we used at runtime. At a certain point the textures will be larger than the TTF file.

Couldn't tell you what the largest language was, I do not remember. Let's assume Chinese or Japanese, but even with those two you have options in what you support: Traditional vs. Simplified Chinese, and I think Japanese has a couple of sets of glyphs.
Quote:Original post by xelanoimis
...I'm looking for advice on localizing...


In my last C++ project, I used a combination of FreeType 2 for Unicode font support (UTF-8 is the way to go), my own variant of GNU gettext for localization, and a tricky management class handling the glyph stuff in the background.

In short (much like dclyde's way), the management class has an internal texture of free space to work with. Every time a new glyph is needed and is not already present on that internal texture, it gets drawn to it by FreeType 2 functions. The management class remembers the last time each glyph was used (think of a least-recently-used cache), and if the texture runs out of space for new glyphs, the ones that haven't been used for a long time get overwritten by the new ones. By the way, the management class could handle texture space dynamically, e.g. using a second texture if enough graphics memory is available.
Now every time we want to draw a text, we iterate through each character and ask our management class for the texture region that holds that glyph. If we handle character requests sensibly (e.g. don't reopen the TTF file for each new character), then a lookup becomes really fast, meaning O(1), so don't hesitate to do this on each render call (think about optimizing/caching requests once this becomes the bottleneck). We could even cache a whole string on a new texture, because a string (consisting of many characters) doesn't change over consecutive render calls. Only when the text gets changed or resized (which doesn't happen many times per second) do we have to ask our management class again. (But create another class for this whole-string caching, don't clutter your glyph management class. [grin])

This way it is even possible to display Japanese, Arabic and English words in one sentence at the same time, because we hold only the relevant characters in graphics memory. If your texture is too small (too many different characters per render call), then there are too many requests to the FreeType functions. Count them each second, and that way you know when to allocate more texture space for the glyph management if you assigned too little in the first place.
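
The glyph management idea can be sketched as a small least-recently-used cache. This is not the poster's class: the GlyphCache name, the slot model (one fixed-size texture cell per glyph) and the rasterize callback, which stands in for the FreeType drawing step, are all assumptions made for illustration.

```python
from collections import OrderedDict


class GlyphCache:
    """Fixed number of texture slots; the least recently used glyph is evicted."""

    def __init__(self, slot_count, rasterize):
        self.free_slots = list(range(slot_count))
        self.slots = OrderedDict()  # (char, size) -> slot, oldest first
        self.rasterize = rasterize  # callback that draws a glyph into a slot
        self.misses = 0             # how often the font renderer was needed

    def get(self, char, size):
        """Return the slot holding the glyph, drawing it first if needed."""
        key = (char, size)
        if key in self.slots:
            self.slots.move_to_end(key)  # mark as most recently used
            return self.slots[key]
        self.misses += 1
        if self.free_slots:
            slot = self.free_slots.pop(0)
        else:
            _, slot = self.slots.popitem(last=False)  # evict the oldest glyph
        self.rasterize(char, size, slot)
        self.slots[key] = slot
        return slot


# Hypothetical usage: record draw requests instead of calling a font library.
drawn = []
cache = GlyphCache(2, lambda char, size, slot: drawn.append((char, slot)))
for ch in "abac":
    cache.get(ch, 16)
print(cache.misses, drawn)
```

Watching the misses counter over time is one way to apply the advice above about counting renderer requests to decide whether the cache needs more texture space.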

The LTR and RTL (left-to-right and right-to-left) drawing of strings is done by another class, which handles the render calls using information from the glyph management class (stepping left or right after each drawn character, or drawing glyph #2 above glyph #1 without stepping; in many languages some characters merge into a new third glyph when they follow each other).

Use a gettext variant for the localization part (no recompile necessary); each string in your source files gets wrapped in _(), e.g. "hello world" becomes _("hello world").

Conclusion:
Don't create a texture with all possible characters at the start of the program. Create the needed glyphs dynamically if you want to support many different languages at once.

P.S.: Different sizes of the same character are also possible with the approach described above. Think about a text size parameter that the users of your program can set if they want. (Don't do this by stretching your texture; stretched glyphs won't look as good as the ones drawn by FreeType's font renderer.)
You should hide the implementation details behind your glyph management class. Just offer a further parameter to your 'GlyphManager->getChar(utf8_char, size)' function.
Hi,

Quote:But what about Japanese? Obviously there are more glyphs in it. But how many are relevant for a game (one with lots of dialogs)? How many glyphs do the fonts from other games have?


I think at least the whole set of Hiragana and Katakana is required, plus a subset of Kanji (I believe it is ~4k characters for Kanji alone, but not every character is used frequently). That far exceeds 256, because Hiragana and Katakana alone already take ~180 glyphs (no Kanji included at this point), based on the Unicode standard. You could reduce that a bit, because this number also includes glyphs for characters with a modifier symbol (ten-ten and maru). But that eventually causes some difficulties, so I don't recommend separating the glyphs for characters and modifiers.
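
The kana figure can be checked roughly by counting the assigned code points in the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks. The sketch below uses Python's unicodedata module; the helper name is invented, and the count includes punctuation and combining marks that live in those blocks, so it is an upper bound on what a font texture strictly needs.

```python
import unicodedata


def assigned_in_block(first, last):
    """Count code points in [first, last] that have a character name."""
    return sum(
        1
        for cp in range(first, last + 1)
        if unicodedata.name(chr(cp), None) is not None
    )


hiragana = assigned_in_block(0x3040, 0x309F)
katakana = assigned_in_block(0x30A0, 0x30FF)
print(hiragana, katakana, hiragana + katakana)
```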

Anyway, as stated above, you should outsource the localization to native speakers. They know what is required for that language. For example, I'm Thai, and we Thai don't like a tone mark (we call it Wannayook) floating too high above a character, which happens when there is no above-character vowel. Though it is not wrong, it looks bad to us. This is something that most foreigners don't know when they work on Thai localization/internationalization.

You could also rely on the OS for text display (e.g. generating a text texture with GDI/DirectWrite). But again, this would be OS-dependent and may give inconsistent results between OSs. For example, the Thai text problem described above is not present in Windows XP, but it is in Windows Vista/7 (I think the people working on the text rendering do not know about it, and the people behind the OpenType standard do not understand it either). It can also become a performance bottleneck.

In short, you should use character codes of at least 16 bits instead of 8 bits if you plan for localization. I'm not saying that you need to go with UTF-16; UTF-8 will do just fine.

http://9tawan.net/en/

Thanks again for the support.

I won't need 2 languages at once. The language can be selectable from main menu or from launcher. Probably some languages will be available only in localized releases.

I am not so far into the project yet, but I thought I'd check it out and see how I can support it. Of course the localization (if any) will be done by professionals, but I want to have a way of displaying what they give.

From what you wrote, I think the idea of having a texture (or more) with all the glyphs is a good one, at least with exotic languages. I will probably use a library (like FreeType, or Windows GDI directly) to render the text messages into textures when needed (text will not change per frame). This should also gain some fps (not so many small quads to draw each frame).

I'll just see more about it when I get there. Until then, I can set my own Font class to work as a renderer of text into texture, and building the engine like that I can easily switch to another font renderer.

Alex
Quote:Original post by xelanoimis
Does a language like german or french use more than 256 glyphs?
No, but getting German, French, and Swedish into 256 glyphs at the same time may become daunting.
German adds 7 "reasonable" glyphs (about a dozen if you count silly stuff like § too) to "normal" ASCII. French adds about 30 of these. Spanish has the same ones as French, plus 2 more. Nordic languages add another 5 or 6.
If you want Greek or Cyrillic, you're already somewhat in trouble. Plus, fitting it all into 256 slots isn't the only problem; you need a usable way to encode it all unambiguously, too.

Generally, if you do "mostly US and Europe" then you'll do best to just use UTF-8 and forget about all those problems. Map whatever character code you get to the respective quad in your font texture, or pass it to FreeType or whatever, and it'll be good.
UTF-8 will just work perfectly well with almost all legacy "byte" functions (strlen being the only real exception, as it will return the byte count, which is not necessarily the character count), there's no new stuff to learn, and it will not consume noticeably more memory.
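
The strlen caveat is easy to demonstrate. The sketch below is in Python, where len() on the encoded bytes plays the role of strlen; the utf8_length helper is a hypothetical name for the usual trick of counting only the bytes that are not UTF-8 continuation bytes (bit pattern 10xxxxxx).

```python
def utf8_length(data: bytes) -> int:
    """Count code points by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "Größe"
encoded = text.encode("utf-8")

print(len(encoded))          # byte count, which is what strlen would report
print(utf8_length(encoded))  # number of code points in the string
```

The same loop translates directly to C over a char buffer, which is handy when a layout routine needs the number of characters and not the number of bytes.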

On the other hand, if you need Asian languages, you're pretty much screwed. Not only do they have a *lot* more characters and you'll have a much harder time finding glyphs, UTF-8 is also less compact for those (typically three bytes per character where UTF-16 needs two), so you may want to use something else (UTF-16, for example). Also, I've been told that some Asian languages (Japanese in particular) may be a real bitch to fit into limited-size fields of, for example, a GUI.
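
To put numbers on the size difference between the encodings, here is a quick comparison. The sample strings and the encoded_sizes helper are arbitrary choices for the example; "utf-16-le" is used so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in ("hello", "こんにちは"):
    utf8_bytes, utf16_bytes = encoded_sizes(sample)
    print(sample, utf8_bytes, utf16_bytes)
```

For kana and common kanji, UTF-8 takes three bytes per character against two for UTF-16, while for ASCII text the ratio is reversed, so the better choice depends on what the string tables mostly contain.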
Many "non-standard" languages (that is, anything that isn't Latin or Germanic) have some really weird rules for plurals or cases too, so unless you can hire someone who is native in a particular language, you will probably end up with an abysmal localisation which in my opinion is worse than no localisation at all. It also makes the necessary description language for your texts more complicated.

As to "how do you deal with it", I'm only ever considering English and Latin or Germanic languages, and screw the rest, as everything else adds a lot of pain and very likely costs more than it could possibly return. This may be or may not be a good plan, but it's one that is affordable at least.
Thanks for the info!
It's good to know that other European languages can each fit in a texture.

As I said, I wasn't planning to fit more than one language into one texture anyway. Also, I wasn't planning to understand what is written in other languages :) Native professionals will handle it if it comes down to it. I just wanted to make sure I can use whatever input they provide.

This topic is closed to new replies.