roos

Unicode strings



Hi,

I was looking for a way to represent Unicode strings in my game so it can be localized. At first I was just using raw wchar_t* strings, but those are very messy to manipulate, so I changed to std::wstring. However, I've heard that wstring isn't supported on all platforms :(

So, does anyone know if something similar to wstring already exists that I can plug into my code? It'd save me some time if I didn't have to write my own wstring class.

Thanks a lot,
roos

std::wstring is available on standards-compliant C++ compilers. On the downside std::wstring may not be the same on all compilers. For example, on some compilers std::wstring is a string of 16-bit characters interpreted as UCS-2. On another compiler it may be a string of 32-bit characters interpreted as UTF-32BE. It's very difficult to write portable Unicode-aware applications with just the Standard C++ library.
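A quick way to see the difference on a given compiler is to ask it how wide wchar_t is and how many units it needs for a character outside the Basic Multilingual Plane. This is only a sketch; the function names wide_char_bits and units_for_non_bmp are made up for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on the compiler building this file.
// Commonly 16 with MSVC on Windows and 32 with GCC/Clang on Linux and macOS.
inline std::size_t wide_char_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Number of wchar_t units the compiler uses to store U+1F600, a code point
// outside the Basic Multilingual Plane. A 16-bit wchar_t needs a surrogate
// pair (two units); a 32-bit wchar_t needs a single unit.
inline std::size_t units_for_non_bmp() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}
```

If the two functions give different answers on two of your target compilers, then anything that indexes into a wstring or writes its raw contents to disk will behave differently on those targets.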

Personally for my Unicode needs I use IBM's ICU library. It provides a UnicodeString class, with all the trimmings, including normalization.

Quote:
On the downside std::wstring may not be the same on all compilers.


There is a simple workaround: just make your own wstring type, i.e.:
typedef std::basic_string<int> wstring32;

As long as int is 32 bits, your strings will be UCS-4.

Hope that helps.

That's less than half a solution. First off, even given that int is 32 bits, it still leaves endianness undefined. Such a string would be either UTF-32LE or UTF-32BE depending on the underlying architecture, assuming your architecture is ASCII-based to begin with. If it's EBCDIC, then you need to do special processing to make sure that your string class is UCS-4 encoded.

Not to mention that a UCS-4 encoded string requires special handling just to initialize it from in-code string constants. That is to say, you can use neither narrow character strings ("") nor wide character strings (L"") portably, since on platforms with 16-bit wchar_t neither can initialize the wstring32 class directly.

Furthermore, despite using a standard library string container, you have practically no standard library support. For example, since char, wchar_t and int are separate types even if they share the same size and signedness, you would need to implement facets such as std::ctype<int> manually, as an implementation is only guaranteed to provide specializations for char and wchar_t.
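To make the byte-order point concrete: the in-memory layout of a 32-bit unit depends on the machine, so anything written to disk or sent over the wire has to pick an order explicitly. Here is a minimal sketch that always emits UTF-32BE by shifting the bytes out one at a time. The name to_utf32be is hypothetical, and the code assumes a C++11 compiler for <cstdint> and range-based for.

```cpp
#include <cstdint>
#include <vector>

// Serialize code points as UTF-32BE: most significant byte first,
// regardless of the byte order of the machine running the code.
inline std::vector<unsigned char> to_utf32be(
        const std::vector<std::uint32_t>& code_points) {
    std::vector<unsigned char> out;
    out.reserve(code_points.size() * 4);
    for (std::uint32_t cp : code_points) {
        out.push_back(static_cast<unsigned char>((cp >> 24) & 0xFF));
        out.push_back(static_cast<unsigned char>((cp >> 16) & 0xFF));
        out.push_back(static_cast<unsigned char>((cp >> 8) & 0xFF));
        out.push_back(static_cast<unsigned char>(cp & 0xFF));
    }
    return out;
}
```

Because the shifts operate on values rather than on memory, the output is the same on x86 and PowerPC. A memcpy of the underlying buffer would not be.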

Basically, using a basic_string specialization other than std::string or std::wstring (modulo allocators) guarantees no more portability than std::string or std::wstring.
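For anyone reading this with a newer compiler: C++11 later added char32_t, U"" literals and std::u32string (that is, std::basic_string<char32_t>), which covers the literal-initialization problem, and unlike int it comes with a std::char_traits specialization the standard actually requires. A small sketch, assuming C++11 or later; greeting and code_point_count are illustrative names only.

```cpp
#include <cstddef>
#include <string>

// std::u32string is std::basic_string<char32_t> (C++11 and later).
// char32_t is a distinct type at least 32 bits wide, and U"" literals
// initialize it directly, which basic_string<int> cannot do.
inline std::u32string greeting() {
    return U"h\u00E9llo \U0001F600";  // 7 code points
}

// Each element of a std::u32string built from a U"" literal holds
// exactly one code point.
inline std::size_t code_point_count(const std::u32string& s) {
    return s.size();
}
```

This doesn't change the rest of the argument: byte order on disk and locale facet support still need separate handling.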

Actually, I would extend the statement further: despite giving a uniform interface, the use of the standard library's localization features is inherently non-portable.

Thanks for the help guys!

Hmm, one thing though: so you're saying std::string is just as bad for portability as std::wstring? Because I was thinking "oh, I'll just use std::string for C-style strings and then something else for wide strings", but if you're saying wstring is as portable as string, then we can probably live with a little bit of non-portability, since std::string is pretty darn portable afaik.

Thanks again,
roos

Last I checked, std::string is just mapped to std::wstring if _UNICODE (or UNICODE? __UNICODE?) is defined. Otherwise it maps to std::astring

So most likely, it'll be the exact same thing.

The question is how portable do you want it to be?
Want it to work on all localized versions of Windows? (wstring should work then. I'm guessing it'd work with Linux on x86 as well)

Or do you want it to run on other architectures? PowerPC? A MIPS CPU? Alpha? Various embedded stuff? Then you'll have to start worrying about all the messy stuff SiCrane mentioned.
Decide how much portability and localization (those two aren't really the same) you need, and then find a solution that offers that.

Quote:
Original post by Spoonbender
Last I checked, std::string is just mapped to std::wstring if _UNICODE (or UNICODE? __UNICODE?) is defined. Otherwise it maps to std::astring


It shouldn't be. std::string should be std::basic_string<char>, regardless of any macro you may have defined.
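This is easy to check at compile time. The sketch below defines UNICODE and _UNICODE on purpose before including <string>; on Windows those macros switch TCHAR and the A/W API names in the Windows and C runtime headers, which may be what the earlier post was thinking of, but they have no effect on the standard library typedefs. The helper name string_is_narrow is made up, and <type_traits> assumes C++11.

```cpp
// Defined deliberately to show that they do not affect <string>.
#define UNICODE
#define _UNICODE

#include <string>
#include <type_traits>

// Both typedefs are fixed by the standard, whatever macros are defined.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is always basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is always basic_string<wchar_t>");

// Runtime view of the same fact: the element type of std::string is char.
inline bool string_is_narrow() {
    return std::is_same<std::string::value_type, char>::value;
}
```

If either static_assert fired, the file would not compile at all, so a successful build is already the answer.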

Hmm, well at the moment we're targeting Windows, Linux and MacOS X, so maybe wstring is alright after all :D

As soon as you add OS X to your list of target platforms you have to worry about byte ordering. If you want to store or read strings from disk, then you have to worry about whether your data is UCS-2, UTF-16LE or UTF-16BE. On Windows and x86 Linux you're little-endian, and on PowerPC Mac OS X you're big-endian. So you'll still need to worry about some things, especially if you try using std::wfstream, where you'll probably need to imbue a std::codecvt to get file I/O to work halfway right.
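One way to sidestep wfstream for this is to read the file as plain bytes and assemble the 16-bit units yourself, using the byte order mark to decide the order. A rough sketch follows, assuming a C++11 compiler for std::u16string; decode_utf16 is a hypothetical helper, and treating BOM-less input as big-endian is a choice made for this example, not something the thread specifies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-16 bytes into 16-bit code units, using the byte order mark
// (U+FEFF) to pick the byte order. Falls back to big-endian when no BOM is
// present (an assumption of this sketch). Surrogate pairs are passed through
// as two units, and a trailing odd byte is ignored.
inline std::u16string decode_utf16(const std::vector<unsigned char>& bytes) {
    bool big_endian = true;
    std::size_t i = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            big_endian = true;   // BOM stored as FE FF: UTF-16BE
            i = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            big_endian = false;  // BOM stored as FF FE: UTF-16LE
            i = 2;
        }
    }
    std::u16string out;
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = big_endian ? bytes[i] : bytes[i + 1];
        const unsigned lo = big_endian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

The same file then decodes to the same code units on a little-endian PC and a big-endian Mac, since nothing here depends on how the host stores integers.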
