bjogio

unicode consideration


Hi game developers, I want to modify the C++ project I'm currently developing to switch from ANSI to Unicode. What is the best way to handle this process? One step at a time. First problems: 1) What macro should I use to handle "bla" vs. L"bla"? I've seen Microsoft's _T(" "), but I don't know if it is the most elegant way. 2) Do you use typedefs for the standard containers, such as:
#ifdef UNICODE
typedef std::wstringstream tstringstream;
#else
typedef std::stringstream tstringstream;
#endif

3) What macro should I use for char and char strings? Is a typedef like "tchar" better? 4) What encoding? I need my *.ini files (on Windows systems) to handle Unicode. I've seen that Notepad can open the "Unicode", "Unicode big endian" and "UTF-8" encodings, so I think it would be a good idea to choose from these three. Which is the most widely used? Can you give me a link to a signature list for all encodings? I'm talking about the first 4 bytes that in theory identify the encoding used. P.S. Thanks to whoever loses time answering my stupid question.

What is your target platform?
Under Windows, just using _T() around your string literals and TCHAR in your declarations can be enough to get it to compile for either.

Windows uses little-endian UTF-16 (Notepad just calls it "Unicode"). Files saved as big-endian you will have to convert by swapping the bytes of each wide character, using ((WideChar & 0xFF) << 8) | ((WideChar >> 8) & 0xFF) in your code (where WideChar is the wide character you read), and UTF-8 you will have to convert with MultiByteToWideChar(CP_UTF8, ...).
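
As a rough sketch of the file signatures asked about in the first post: the byte order mark (BOM) at the start of a file can be checked as below, together with the 16-bit byte swap for big-endian files. The names Encoding, detectBom and swapBytes are made up for this example and do not come from any library. A file without a BOM simply reports Unknown, so this is a hint rather than a reliable detector.

```cpp
#include <cstddef>
#include <cstdint>

// Encodings that can be recognised from a byte order mark (BOM).
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect up to the first four bytes of a buffer and report the BOM, if any.
// UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
inline Encoding detectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00)
        return Encoding::Utf32LE;
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF)
        return Encoding::Utf32BE;
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Encoding::Utf16LE;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Encoding::Utf16BE;
    return Encoding::Unknown;
}

// Swap the two bytes of one UTF-16 code unit (big-endian <-> little-endian).
inline std::uint16_t swapBytes(std::uint16_t unit)
{
    return static_cast<std::uint16_t>(((unit & 0xFF) << 8) | ((unit >> 8) & 0xFF));
}
```

One known ambiguity: a UTF-16 little-endian file whose first character after the BOM is U+0000 looks the same as a UTF-32 little-endian BOM. That is rare in text files, but worth knowing about.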

UTF-8 is nice. You can store your text in normal strings, and, at least here in Linux with GCC, can be used with no extra work. Liek magik.

(There's the obvious problems of not being able to trust the string's length and inserting/removing characters with normal string functions can break them, but I won't tell if you won't.)
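
To illustrate the length caveat: std::string::size() counts bytes, not characters. A minimal sketch of counting code points instead, assuming the string already holds valid UTF-8 (utf8Length is a name invented for this example):

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx. Assumes valid UTF-8.
inline std::size_t utf8Length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string whose bytes spell "naïve" in UTF-8, size() reports 6 while utf8Length reports 5, because the "ï" takes two bytes. Note that this counts code points, which is still not the same as what a user sees as characters once combining marks are involved.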

Hmm, I think Microsoft's use of little-endian is due to the fact that (as far as I've seen) the recent NT series uses it internally. I've decided to go with:
typedef tchar (char,wchar_t)
typedef tstring (string, wstring)
#define T() ((),L())
Why does the standard use L" " before a wide literal? Does it mean Long? Wouldn't S" " (Short) have been better?
p.s. thanks
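
Spelled out, the shorthand above could look something like the sketch below. All the names (tchar, tstring, tstringstream, TSTR, greet) are hypothetical. The macro is called TSTR rather than T here because a single-letter T macro would clash with the very common template parameter name T in any header included afterwards. On Windows, <tchar.h> already provides TCHAR and _T() for the same job.

```cpp
#include <sstream>
#include <string>

// Pick the character type and the literal prefix from one switch.
#ifdef UNICODE
typedef wchar_t tchar;
#define TSTR_IMPL(x) L##x
#else
typedef char tchar;
#define TSTR_IMPL(x) x
#endif
// Two-level macro so that a macro passed as the argument is expanded
// before the L prefix is pasted onto it.
#define TSTR(x) TSTR_IMPL(x)

// The standard string and stream templates take the character type directly,
// so no further #ifdef is needed for them.
typedef std::basic_string<tchar> tstring;
typedef std::basic_stringstream<tchar> tstringstream;

// Small usage example that compiles unchanged in both configurations.
inline tstring greet(const tstring& name)
{
    tstringstream out;
    out << TSTR("Hello, ") << name << TSTR("!");
    return out.str();
}
```

Building on std::basic_string<tchar> keeps the conditional code down to the character type and the literal macro.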

Quote:
Original post by smart_idiot
UTF-8 is nice. You can store your text in normal strings, and, at least here in Linux with GCC, can be used with no extra work. Liek magik.

(There's the obvious problems of not being able to trust the string's length and inserting/removing characters with normal string functions can break them, but I won't tell if you won't.)


And the problem of not all international characters being represented, of some character varying from font to font, and so on and so on... ;)

But sure, if you can guarantee that your program will never be used by anyone outside the US/UK, UTF8 works fine. ;)

Quote:
Original post by Spoonbender
Quote:
Original post by smart_idiot
UTF-8 is nice. You can store your text in normal strings, and, at least here in Linux with GCC, can be used with no extra work. Liek magik.

(There's the obvious problems of not being able to trust the string's length and inserting/removing characters with normal string functions can break them, but I won't tell if you won't.)


And the problem of not all international characters being represented, of some character varying from font to font, and so on and so on... ;)


What are you on about? UTF-8 is a Unicode encoding, which allows for representing any Unicode code point (including those outside the BMP).

UTF-8 is a Unicode encoding using 8-bit code units. It maps directly to ASCII for values 0 to 127, and it does some tricks to represent code points beyond that range, using multi-byte sequences. A Unicode code point fits in 21 bits (the maximum is U+10FFFF), so a UTF-8 character can have up to 3 extra bytes (for a total of 4 bytes) to represent other characters. It does have the advantage of not being affected by endian issues.
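
The "tricks" are just bit packing. A minimal sketch of encoding one code point, with appendUtf8 as an invented name for this example:

```cpp
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Returns false for values that are not valid Unicode scalar values
// (above U+10FFFF or in the surrogate range U+D800..U+DFFF).
inline bool appendUtf8(std::uint32_t cp, std::string& out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC, while plain 'A' (U+0041) stays a single byte.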

There is also UTF-16 (both big and little endian) and UTF-32 (also both big and little endian) that fall under the standard.

Unicode Website

You can even download the official unicode standard book there.

UTF-8 isn't a codepage, if that is what you were thinking. Characters have variable length, ranging from 1 to 4 bytes (the original design allowed up to 6, but RFC 3629 restricts it to 4). It can hold every possible Unicode character.

I wrote some iterator adaptors for dealing with UTF-8 strings, to make life easier. Here is some code from my drawText function as an example:


UTF8::InputAdaptor<std::string::const_iterator> pos(string.begin(), string.end());
const UTF8::InputAdaptor<std::string::const_iterator> end(string.end());

for(; pos != end; ++pos)
{
    const Font::Glyph &glyph(font.getGlyph(*pos, font_height));

    if(x+glyph.width >= 0)
        drawGlyph<bpp>(glyph, x, y, p, c);

    if((x += glyph.width) >= width)
        break;
}



Note that my adaptor can optionally take two extra iterators; one for one past the end, and one for the beginning. This is protection from malformed strings so it doesn't end up trying to read a multi-byte character and skipping over the beginning or end, which of course would be bad.
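
The adaptor itself is not shown in the post, so the following is not its implementation. It is only a sketch of the kind of bounds checking described, decoding one code point without ever reading past the end iterator. The name decodeUtf8 and its signature are assumptions made for this example.

```cpp
#include <cstdint>
#include <string>

// Decode one code point starting at `it` without reading past `end`.
// On success, advances `it` past the sequence and returns true.
// On malformed or truncated input, leaves `it` unchanged and returns false.
inline bool decodeUtf8(std::string::const_iterator& it,
                       std::string::const_iterator end,
                       std::uint32_t& cp)
{
    if (it == end) return false;
    const unsigned char lead = static_cast<unsigned char>(*it);
    int extra;
    std::uint32_t value;
    if (lead < 0x80)                { extra = 0; value = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; value = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; }
    else return false;  // stray continuation byte or invalid lead byte

    std::string::const_iterator p = it + 1;
    for (int i = 0; i < extra; ++i, ++p) {
        if (p == end) return false;             // truncated sequence
        const unsigned char c = static_cast<unsigned char>(*p);
        if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
        value = (value << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogates, and values above U+10FFFF.
    static const std::uint32_t minimum[] = {0x0, 0x80, 0x800, 0x10000};
    if (value < minimum[extra] || value > 0x10FFFF ||
        (value >= 0xD800 && value <= 0xDFFF))
        return false;
    it = p;
    cp = value;
    return true;
}
```

Leaving the iterator untouched on failure lets the caller decide how to recover, for example by skipping one byte and substituting U+FFFD.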
