Using UTF-8 in Games

Published December 09, 2013
[font=arial]

After working on my GUI system, I started exploring UTF-8 more seriously for my game. I'm currently living in Korea, and that has shown me how important it is to give people games in their own language. The kids I teach are crazy about video games. StarCraft, for example, was at one time so popular in Korea that it spawned sponsored professional players, and matches can even be seen on game-specific cable television networks. There is a large international demand for games, and developers should build their games with this in mind.

[/font]

[font=arial]

I've been looking into UTF-8 lately. I've written an article on it, and I've even made a video presentation based on that article. If you want source code, I'm currently writing a series on my other blog, Writing an STL-Style UTF-8 String Class. It will be a five-part series that I should finish by Friday. If it gets a nice reception, I'll post a summary article here on GameDev. Even with all of this information, plus the countless articles on the Internet, how can someone actually use UTF-8 in a game?

[/font]

[font=arial]

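UTF-8 is a Unicode encoding. One good thing about it is that it is a variable-width encoding built from 8-bit code units: each code point takes one to four bytes. This means that games developed using 7-bit ASCII (stored in the 8-bit "char" data type in C and C++) will not need to alter the file formats that store their text. This is because 7-bit ASCII is a subset of UTF-8, and UTF-8 never uses a zero byte except for U+0000, so it works with null-terminated C-style strings. The standard "char"-based string functions, including search, concatenation, and comparison, still work, as long as you remember that they operate on bytes rather than characters: strlen, for example, returns the number of bytes, not the number of code points.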

[/font]
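
As a rough illustration of the point above, the sketch below shows that byte-oriented functions such as strlen and strstr keep working on UTF-8 text, while counting characters needs one extra step. The helper names utf8_length and utf8_contains are hypothetical and exist only for this example, and the code assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts code points in a null-terminated UTF-8
// string by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t utf8_length(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;  // this byte starts a new code point
        }
    }
    return count;
}

// Hypothetical helper: a plain byte-wise search. This is safe for valid
// UTF-8 because lead bytes and continuation bytes use disjoint ranges,
// so a match can never begin in the middle of an encoded character.
bool utf8_contains(const char* haystack, const char* needle) {
    assert(haystack != nullptr && needle != nullptr);
    return std::strstr(haystack, needle) != nullptr;
}
```

For the string "한글 ok" (two Hangul syllables, a space, and two ASCII letters), strlen reports 9 bytes while utf8_length reports 5 code points.
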
[font=arial]

To use UTF-8 with a GUI, the text can be stored in UTF-8 and decoded during rendering. This is done by iterating through the string and decoding each byte sequence into its code point. For GUI elements that render text without further processing, this works well. Edit boxes are a little trickier since they require insertions and deletions anywhere in the text field. This can be done with a good UTF-8 string class, but it is better to keep the text in a fixed-width form, such as UTF-32, while editing and convert to UTF-8 later. This makes sense because an edit control will probably have its own text buffer anyway, which is later synchronized with the application.

[/font]
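
One way to picture the two approaches above: decode the UTF-8 bytes into a fixed-width array of code points, render or edit that array, and encode back to UTF-8 when done. The following is a minimal sketch under a few assumptions: decode_utf8 and encode_utf8 are hypothetical helper names, malformed bytes are replaced with U+FFFD, and overlong forms and surrogate code points are not rejected, which a production decoder should do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Malformed input yields U+FFFD; overlong forms are NOT detected.
std::u32string decode_utf8(const std::string& text) {
    const char32_t kReplacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes expected after the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= text.size() ||
                (static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or not a continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(text[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : kReplacement);
    }
    return out;
}

// Hypothetical helper: encodes UTF-32 code points back into UTF-8 bytes.
std::string encode_utf8(const std::u32string& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        assert(cp <= 0x10FFFF);  // largest valid Unicode code point
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With the text held as a std::u32string, inserting a character at the caret is an ordinary insert at an index, and the buffer only needs to be passed through encode_utf8 when it is synchronized with the application.
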

[font=arial]

Another thing UTF-8 is good for is sending data. Even when a game already uses wchar_t (16 bits on Windows), UTF-8 may still be needed when communicating with other programs over a network. Many network libraries and servers require requests to be sent as JSON or XML encoded in UTF-8, so even if the program stores its text internally as wchar_t, conversion to UTF-8 may still be needed. UTF-8 is well suited to this because it is a sequence of bytes and is therefore endian-independent. One example is the RakNet master server implementation by Jenkins Software. To use it, applications send data as JSON over HTTP. User names, chat messages, and room details can all be encoded in UTF-8.

[/font]
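
The conversion mentioned above might look something like the following sketch. It assumes the internal text is UTF-16, as wchar_t text is on Windows, and uses std::u16string so the example stays portable. The name utf16_to_utf8 is hypothetical, unpaired surrogates are replaced with U+FFFD, and nothing here is taken from RakNet or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 text to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const char32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }
        assert(cp <= 0x10FFFF);  // holds for any UTF-16 input

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The returned std::string is a plain byte sequence, so it can be written into a JSON or XML payload without any byte-order handling. JSON still requires quotes, backslashes, and control characters to be escaped separately.
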

[font=arial]

I'll continue to post about my progress, and as I get more information I'll post it here as well.

[/font]

Comments

Krohm

Very interesting. I'm personally surprised you're going to work on edit support; finding the insertion point (potentially) requires consistent support for Unicode layout. Just thinking about it makes my head hurt.

Personally I use UTF8 for serialization purposes only.

December 09, 2013 07:12 AM
Squared'D

You're right. No one is going to make a living trying to edit a UTF-8 string. Just referencing a character by index is an O(n) operation, compared with O(1) for a fixed-size type. The most useful methods in this class, I think, will be string operations like copy, find, rfind, find_first_of, find_last_of, substr, and compare. I've already written the basic UTF-8 utility functions that I'll need to put together the string class. Some parts of the class may be a little overkill, but it's a fun little project.

December 09, 2013 01:59 PM
snisarenko
Never mind, I re-read the edit portion of the article
December 11, 2013 12:52 AM
Squared'D
I definitely wouldn't try to continuously edit a UTF-8 string. I'm providing the capability in my utf8string class solely because I want it to be as much like std::string as possible. That's the goal of my little exercise, but I'm also providing standalone functions that don't have the STL overhead.
December 11, 2013 02:29 AM
Squared'D

BTW, the first version of the utf8string class is available at Writing an STL-Style UTF-8 String. In the next post, I'll finish the class, and I'll probably upload it to SourceForge.

December 11, 2013 03:04 AM
NightCreature83

Editing UTF is easier if you do it in UTF-32, as all chars are fixed width; then convert to UTF-8 for storage and vice versa. It costs more, but you aren't editing a string often in a game. And for mobile or console games this stuff is even less of a hassle because they supply edit boxes in the OS, which you should just invoke and get the string from :)

December 12, 2013 08:29 AM
Squared'D

Editing UTF is easier if you do it in UTF-32, as all chars are fixed width; then convert to UTF-8 for storage and vice versa. It costs more, but you aren't editing a string often in a game. And for mobile or console games this stuff is even less of a hassle because they supply edit boxes in the OS, which you should just invoke and get the string from :)

I agree. This is what I do too.

BTW, here's the completed string class. It has all of the std::string methods minus some of the overloads. I'm still working on the class and hope to have it on SourceForge or something similar within a week or two. It was a fun little side project for me. I feel that I know the STL much more intimately now.

http://squaredprogramming.blogspot.kr/2013/12/writing-stl-style-utf-8-string-class5.html

December 13, 2013 12:29 PM
codenine75a

Try windows IME.

December 17, 2013 02:23 PM
Squared'D

Try windows IME.


Windows IME is used for entering text whereas UTF-8 is an encoding for transmitting and storing text. The two can be used together.

December 18, 2013 01:35 AM
Squared'D

I've uploaded the code to GitHub. You can check there for the latest updates.

http://squaredprogramming.blogspot.kr/2013/12/utf8stringonGithub.html

December 20, 2013 12:32 AM