DarkSlayer

UNICODE and string confusion...

This topic is 4904 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I want to move away from the ANSI/ASCII char stuff.

1: I want to use Unicode so I can start getting used to a more international concept, but I am already confused by UTF-8, UTF-16, UTF-32, big endian and so on.

2: Should I use Win32 or C++ functions? Is <tchar.h> functional enough for Unicode tasks, or do those functions only move 8 or 16 bits of data without caring about the content?

3: std::string or my own object? There are so few examples out there using TCHAR and std::string... so few examples of anything...

4: When I compile using the latest MS Platform SDK, I get an error when I use sprintf for formatting. So safety is nice, but should I use the MS strsafe library?

Maybe I should just make my own string object, but what should I include in it? I should be able to feed it ASCII or Unicode chars, and it should be able to return ASCII or Unicode chars. The internal workings could very well be just Unicode. But should I use UTF-16 or UTF-32? UTF-32 seems nice as an internal format, but it would demand a lot of conversion overhead all the time. I guess UTF-16 is native for Windows... big or little endian?

How does Linux deal with Unicode? How do I deal with buffer overflows and such?

It's really simple.

Use Unicode only if you intend to make your program multilingual or allow support for multiple languages in the future. If your program will only use English, now and forever, then don't use Unicode.

Maybe my fault...

I WILL USE UNICODE - period.

Should I base it on <tchar.h>, or on the Windows TCHAR?

I noticed that you have to define
#define UNICODE // for the Windows headers
#define _UNICODE // for <tchar.h> and the C runtime

Any comments on using those two different libraries?
Any comments on using little or big endian?

Both are Windows concepts, and neither is portable to other platforms. The TCHAR stuff is Windows-specific. The theory is portable, but not the types.

wchar_t is 16 bits on Windows and 32 bits on OS X, for instance. gcc will let you compile with the -fshort-wchar flag, but then you can't use the C++ standard library on OS X, as it is compiled to use a 4-byte wchar_t.
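To see which case a given compiler falls into, a small check like the one below can help. This is only a sketch: the helper names wchar_bits and wchar_holds_any_code_point are made up for illustration and are not part of any library.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on whatever platform this is compiled for.
// Typically 16 with MSVC on Windows and 32 with gcc on Linux and OS X.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True when a single wchar_t can hold any Unicode code point
// (the largest is U+10FFFF, which needs 21 bits).
constexpr bool wchar_holds_any_code_point() {
    return wchar_bits() >= 21;
}
```

Code that depends on one particular width can put these in a static_assert so that a port to another platform fails at compile time instead of misbehaving at run time.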


If all you need is Windows support, then use the TCHAR defs. Tens of thousands of programs have been written using them to successfully support Unicode on the PC.

Cheers
Chris

I would use the Win32 functions. They support UTF-16, so use a 16-bit character type (WCHAR, which is wchar_t on Windows) instead of char. You could easily write your own string class if you use the Win32 functions. Just remember to use reference counting so that you can return a string from a function.

Ex:
    // "string" here is your own wide-character string class
    string Function0()
    {
        string ret = L"pizza";
        return ret;
    }

TCHAR is for apps that want to do both Unicode and non-Unicode depending on how they're compiled. TCHAR is really just an alias for char or WCHAR depending on your settings.

If you want to be Unicode on Win32 then use WCHAR. WCHAR is a 16-bit little-endian type. You still need to define UNICODE and _UNICODE so that you end up calling Unicode versions of the API. In case you don't know, for any Win32 API that takes character data, say TextOut, there are really two versions - TextOutA and TextOutW. TextOut itself is just a #define that selects which of the 'A' and 'W' versions to use.
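The mechanism is easy to imitate in portable code. The sketch below does not use the real Windows headers: DrawLabelA, DrawLabelW, MY_UNICODE, MY_TEXT and my_tchar are all hypothetical names standing in for TextOutA, TextOutW, UNICODE, TEXT() and TCHAR, just to show how one generic name gets mapped at compile time.

```cpp
#include <string>

// Hypothetical stand-ins for a Win32-style pair of entry points.
// Neither is a real Windows function; each just reports which one ran.
inline std::string DrawLabelA(const char*)    { return "narrow (A) version"; }
inline std::string DrawLabelW(const wchar_t*) { return "wide (W) version"; }

// MY_UNICODE plays the role of UNICODE/_UNICODE in this sketch.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;        // plays the role of TCHAR
#define MY_TEXT(s) L##s          // plays the role of TEXT() / _T()
#define DrawLabel  DrawLabelW    // generic name maps to the W version
#else
typedef char my_tchar;
#define MY_TEXT(s) s
#define DrawLabel  DrawLabelA    // generic name maps to the A version
#endif

// Written once against the generic names; which version it calls
// is decided by the preprocessor.
inline std::string draw_greeting() {
    const my_tchar* text = MY_TEXT("hello");
    return DrawLabel(text);
}
```

Removing the #define MY_UNICODE line switches draw_greeting() over to the narrow version without touching its body, which is the whole point of the TCHAR approach.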

When using Unicode, all of the standard C/C++ runtime functions like sprintf have wide-character equivalents (swprintf in this case). MSDN will tell you what the wide version is; it usually has basically the same name with a 'w' in it someplace.
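As a rough illustration, here is the standard swprintf used with a fixed-size buffer. It assumes a compiler that provides the standard signature, which takes the buffer size as its second argument; format_score is a made-up helper name.

```cpp
#include <cwchar>
#include <string>

// Formats "<name> scored <points> points" into a fixed-size wide buffer.
// The standard swprintf is told the buffer size, so it cannot write past
// the end; it returns a negative value if the text does not fit.
inline std::wstring format_score(const wchar_t* name, int points) {
    wchar_t buffer[64];
    int written = std::swprintf(buffer, sizeof(buffer) / sizeof(buffer[0]),
                                L"%ls scored %d points", name, points);
    if (written < 0) {
        return std::wstring();   // formatting failed or buffer too small
    }
    return std::wstring(buffer, static_cast<std::size_t>(written));
}
```

Because the size is always passed, an over-long name gives an empty result here instead of a buffer overflow, which speaks to the safety question in the original post.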

You can't really manipulate UTF-8 directly in Win32. Pretty much none of the APIs will take it, as it's not a real code page in the Windows view of things. You can convert UTF-8 data using MultiByteToWideChar() with the CP_UTF8 code page.
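On Windows the conversion itself is done by MultiByteToWideChar with CP_UTF8. To show what such a conversion involves, here is a portable hand-rolled sketch, not the Win32 implementation. The function name utf8_to_utf16 is made up, and the code assumes the input is well-formed UTF-8.

```cpp
#include <cstdint>
#include <string>

// Converts UTF-8 to UTF-16. Assumes well-formed input; a production
// converter must also detect and reject malformed sequences.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        if (i + len > in.size()) break;   // truncated sequence: stop
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Above U+FFFF: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

In a real Windows program the API call is the better choice, since it already handles the validation this sketch skips.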

Quote:
Original post by raydog
It's really simple.

If your program will only use English, now and forever, then don't use Unicode.


How do you know someone who doesn't have an English version of Windows won't use your program?

Make your programs as simple as possible and no simpler.

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

It's important to note that Windows uses UTF-16, and a single 16-bit unit can't hold every Unicode character. At one time it was thought it would :). This really isn't Microsoft's fault, as they did their Unicode work back in '91, way before Mac and Linux were thinking about this stuff.

Mac and Linux use UCS-4 (UTF-32) for wchar_t. This encoding handles every possible Unicode character in a single unit.

With UTF-16 you'll need to handle surrogates if you want to handle all characters.
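For anyone wondering what handling surrogates means in practice, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the encoder assumes it is given a valid code point (at most U+10FFFF and not itself in the surrogate range).

```cpp
#include <cstdint>
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// Code points up to U+FFFF take one unit; anything above takes two.
inline std::u16string encode_utf16(std::uint32_t code_point) {
    std::u16string out;
    if (code_point < 0x10000) {
        out.push_back(static_cast<char16_t>(code_point));
    } else {
        std::uint32_t v = code_point - 0x10000;   // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}

// Combines a high/low surrogate pair back into the code point.
inline std::uint32_t decode_surrogate_pair(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

The practical consequence is that the length of a UTF-16 string in 16-bit units is not necessarily its length in characters, so code that indexes or truncates by unit can split a pair in half.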

Cheers
Chris

You don't need to define a custom string type. The standard library string is a class template: std::string is just a type alias for an instantiation of the real type, std::basic_string. Two of its type parameters are the character type and the character traits, and there is also an alias of basic_string for the wide character type wchar_t, std::wstring.
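A short sketch of what that means in code, reusing the "pizza" example from earlier in the thread. The helper names favourite_food and count_units are made up for illustration.

```cpp
#include <string>
#include <type_traits>

// std::string and std::wstring are aliases of the same class template.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Returning by value works out of the box; no hand-written
// reference counting is needed.
inline std::wstring favourite_food() {
    std::wstring ret = L"pizza";
    return ret;
}

// One function template can serve every character type.
template <typename CharT>
std::size_t count_units(const std::basic_string<CharT>& s) {
    return s.size();   // number of code units, not user-visible characters
}
```

Note that size() counts code units of the character type, so on a platform with a 16-bit wchar_t a character outside the 16-bit range counts as two.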

You can also do localization/internationalization with the standard IOStreams library. Look up locales, facets and character traits; in particular, the facets std::codecvt and std::codecvt_byname perform code conversion between the internal and external representations specified by the character encoding schemes.

You need to decide what you're going to use for your internal representation (character type and character encoding scheme), and you need to know what character type and character encoding scheme are used for the external representation.

Quote:
Original post by petewood
How do you know someone who doesn't have an english version of windows won't use your program?


I'm not saying someone from Japan, for example, can't use my English-only program on their version of Windows.
They can. The text strings just won't be in Japanese.

What I'm saying is that I'm not going to waste any time and translate every English string into
10 different languages. If I was selling my program internationally, then perhaps I would, but
I'm not, so it really doesn't matter.
