Juliean

Game engine unicode and multibyte support


Hello,

 

In my game engine, I want to support both Unicode and multibyte character sets, based on what the user has selected. Up until now, I've been using std::wstring and wchar_t directly everywhere. So the idea I came up with was to have my own String.h file instead, which would look something like this:

#pragma once
#include <string>

namespace acl
{

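#ifdef UNICODE	// defined by Visual Studio's "Use Unicode Character Set" option
	typedef std::wstring String;
	typedef wchar_t char_t;
#else			// "Use Multi-Byte Character Set" or "Not Set"
	typedef std::string String;
	typedef char char_t;
#endif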

}

Then I would use the new typedefs accordingly. Does that sound reasonable? Since this would be a huge change throughout the whole project, I'd like to hear some opinions before applying it. Maybe someone even has a better solution?


Unicode does not imply wide characters, and wide characters do not imply Unicode. Unicode is, roughly speaking, a map of code points to characters. A code point, however, can be encoded in many different ways. For example, UTF-8 encodes each code point as a variable number (one to four) of 8-bit units, so your std::string can be used to store Unicode text in UTF-8. So neither of your code paths is any more Unicode than the other; the switch just selects between possible encodings of the code points.
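To make the "variable number of 8-bit units" concrete, here is a minimal sketch of encoding a single code point into a std::string as UTF-8. The function name appendUtf8 is made up for illustration, and the sketch assumes the input is already a valid code point (not a surrogate, not above U+10FFFF); production code would validate first.

```cpp
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
// Assumption: `cp` is a valid Unicode scalar value (<= U+10FFFF and not in
// the surrogate range U+D800..U+DFFF); no validation is done here.
void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

For example, é (U+00E9) becomes the two bytes C3 A9 and € (U+20AC) becomes E2 82 AC, yet both end up in an ordinary std::string.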

 

In my limited experience with proper text handling, my suggestion is to avoid mixing encodings within your application as much as possible! Stick with a single encoding inside your application, but provide interfaces that allow the user to transport text to and from your library using multiple encodings. For example, you can keep your internal strings as a sequence of 32-bit code points (basically typedef std::vector<unsigned int> MyString) and design your code around that. But feel free to provide conversion functions that let the user supply, or read back, strings as UTF-8, 7-bit ASCII, extended ASCII, UTF-16 or whatever encoding you wish to support. Unless you have a good reason to, though, don't mix encodings within your own code.
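A rough sketch of the "single internal encoding plus boundary conversions" idea, assuming C++11: std::u32string stands in for the std::vector<unsigned int> mentioned above, and decodeUtf8 (a hypothetical name) is one of the conversion functions a library might expose. Replacing malformed input with U+FFFD instead of rejecting it is just one reasonable policy among several.

```cpp
#include <cstddef>
#include <string>

// Internal representation: one char32_t per code point.
using CodePointString = std::u32string;

// Converts UTF-8 text supplied by the user into the internal representation.
// Any byte that does not start a well-formed sequence (bad lead byte, missing
// or bad continuation byte, overlong form, surrogate, value above U+10FFFF)
// is replaced by U+FFFD and decoding resumes at the next byte.
CodePointString decodeUtf8(const std::string& in)
{
    // Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences.
    static const char32_t minValue[] = {0, 0x80, 0x800, 0x10000};
    CodePointString out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra;   // number of continuation bytes expected
        char32_t cp;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out += U'\uFFFD'; ++i; continue; }   // invalid lead byte

        bool ok = i + extra < in.size();            // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;     // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok && (cp < minValue[extra] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;                             // overlong, too large, or surrogate

        if (ok) { out += cp; i += extra + 1; }
        else    { out += U'\uFFFD'; ++i; }
    }
    return out;
}
```

Everything inside the engine then works on whole code points; the read-back direction would be a matching encoder that runs the same bit layout in reverse.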


Also, wchar_t is compiler-, implementation- and platform-specific. It may be 32 bits wide, 16 bits wide, or even (although improbably) 8 bits wide; in practice it is 16 bits with Visual C++ on Windows and 32 bits with GCC and Clang on Linux. If you are targeting the 2011 standard (C++11), use the char16_t or char32_t types when you need a defined size.
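The difference in guarantees can be checked at compile time. This sketch relies only on standard C++11 rules: char16_t and char32_t have the same size as std::uint_least16_t and std::uint_least32_t, while the width of wchar_t is left to the implementation. The constant names are illustrative.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Guaranteed by the standard: same size as uint_least16_t / uint_least32_t.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "char16_t width");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "char32_t width");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t can hold any code point");

// Not guaranteed: wchar_t's width depends on the compiler and platform,
// so code that cares has to ask.
constexpr std::size_t wcharBits = sizeof(wchar_t) * CHAR_BIT;

// A code point outside the Basic Multilingual Plane needs two UTF-16 code
// units (a surrogate pair) but only one UTF-32 code unit.
// sizeof includes the terminating null, hence the "- 1".
constexpr std::size_t utf16Units = sizeof(u"\U0001F600") / sizeof(char16_t) - 1;
constexpr std::size_t utf32Units = sizeof(U"\U0001F600") / sizeof(char32_t) - 1;
```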


The engine doesn't need it.  It needs to work with keys that map to displayable stuff.

 

The UI should have a class called LocalizedString.  You can create a localized string by passing in keys and looking it up in the localization database.  Internally the LocalizedString object might have a UTF8, extended ascii, 16-bit, 32-bit, or whatever other value-to-glyph representation you need; that should be isolated from everybody in the code, passing around LocalizedString objects instead.

 

+1, very solid advice. I have seen quite a few abominations in the past, where strings of various kinds were passed around between the engine, the GUI and other systems. Sooner or later, somebody will break the system by introducing hard-coded text, text that cannot be properly localized, etc.

 

It really is best to have a single class for localized text (such as the LocalizedString proposed by frob) and ban all other strings from the engine completely. 99% of the time you do not need strings at runtime; the only exception is localized text.
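A minimal sketch of the LocalizedString idea, assuming a simple in-memory key-to-text table. LocalizationDatabase, set, find and utf8ForRendering are invented names for illustration; storing translations as UTF-8 is just one possible internal choice, and it stays hidden behind the class.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical localization table: key -> translated text for the active
// language. Stored as UTF-8 here; callers never depend on that choice.
class LocalizationDatabase
{
public:
    void set(std::string key, std::string utf8Text)
    {
        table_[std::move(key)] = std::move(utf8Text);
    }

    // Returns nullptr if the key has no translation.
    const std::string* find(const std::string& key) const
    {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, std::string> table_;
};

// The only type that carries user-visible text. It can only be built from a
// key, so display text cannot be hard-coded at the call site.
class LocalizedString
{
public:
    LocalizedString(const LocalizationDatabase& db, const std::string& key)
    {
        const std::string* text = db.find(key);
        // Make missing translations visible instead of silently blank.
        text_ = text ? *text : "[missing: " + key + "]";
    }

    // Intended only for the text renderer; the encoding is an implementation
    // detail of this class (hypothetical accessor name).
    const std::string& utf8ForRendering() const { return text_; }

private:
    std::string text_;
};
```

Because the constructor takes a key rather than display text, a hard-coded string passed in by mistake shows up as "[missing: ...]" during testing instead of slipping through.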


Thanks all, that's quite a different and more in-depth answer than I expected, but it sounds interesting indeed.

 

But just to be entirely sure: you are suggesting to use this LocalizedString class throughout all of my engine, including things like looking up GUI widgets by name, and in my XML parser and XML node classes to access child nodes? If so, can somebody point me to any reference/tutorial/guide on how to implement such a class? I'm a little weak on manual text encoding/handling, so I think I need at least some sort of intro to the whole topic... Thanks!



But just to be entirely sure: you are suggesting to use this LocalizedString class throughout all of my engine, including things like looking up GUI widgets by name, and in my XML parser and XML node classes to access child nodes?

No. Only strings that will be displayed to the user need to be localized.

 

Internal lookup strings should always be in whatever your engine's native string encoding is (typically either plain ASCII or UTF-8).
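For example, internal widget lookup can key on plain std::string holding ASCII or UTF-8, completely separate from any localized text. Widget and WidgetRegistry are hypothetical stand-ins for whatever GUI classes the engine actually has.

```cpp
#include <string>
#include <unordered_map>

struct Widget            // hypothetical stand-in for a GUI widget class
{
    std::string name;    // internal identifier, never shown to the user
};

// Internal names are plain std::string in the engine's native encoding
// (ASCII/UTF-8). Lookup compares bytes, so no wide-char conversion or
// locale handling is involved.
class WidgetRegistry
{
public:
    Widget& add(const std::string& name)
    {
        return widgets_[name] = Widget{name};
    }

    // Returns nullptr if no widget with that name exists.
    Widget* find(const std::string& name)
    {
        auto it = widgets_.find(name);
        return it == widgets_.end() ? nullptr : &it->second;
    }

private:
    // unordered_map keeps references to its elements valid across rehashes,
    // so the reference returned by add() stays usable.
    std::unordered_map<std::string, Widget> widgets_;
};
```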



Internal lookup strings should always be in whatever your engine's native string encoding is (typically either plain ASCII or UTF-8).

 

OK, in that respect I guess I should have rephrased my question, since I was originally more interested in those than in the others (though I'm still going to apply the idea of localized strings; it seems like a decent addition). My original question arose partly from an incident where I handed part of my library (the Windows application wrapper) to a fellow student, who had to rewrite the whole thing because they were using the ASCII setting in Visual Studio and in the rest of their application, while I was using UTF-8. All in all, I thought letting the user/IDE choose was beneficial, so you'd rather advise me to go with either one of these settings? UTF-8, that is, in my case...
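If a library settles on UTF-8 internally, one way to stop depending on the project's character-set setting is to call the wide (W-suffixed) Win32 functions explicitly and convert only at that boundary; on Windows this conversion is commonly done with MultiByteToWideChar(CP_UTF8, ...). The portable sketch below shows just the final step, turning already-decoded code points into the UTF-16 code units those functions expect. toUtf16 is a hypothetical helper name, and the input is assumed to contain only valid scalar values.

```cpp
#include <string>

// Encodes code points as UTF-16, the encoding wide Win32 functions expect.
// Assumption: every value is a valid scalar value (<= U+10FFFF and not a
// surrogate); no validation is performed.
std::u16string toUtf16(const std::u32string& codePoints)
{
    std::u16string out;
    out.reserve(codePoints.size());
    for (char32_t cp : codePoints) {
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);                     // BMP: one unit
        } else {
            cp -= 0x10000;                                        // 20 bits remain
            out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        }
    }
    return out;
}
```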
