Using unicode character set, adding 'L' before strings, some stuff I don't understand
The DirectX examples that come with the SDK are set to use the Unicode character set, and trying to compile them under any other setting results in errors about conversions (such as 'LPSTR' to 'LPCWSTR' or 'const char [29]' to 'LPCWSTR'). To solve it I have to add an L before string literals, so "string" becomes L"string".
Why is this so?
My second question is: how can I convert an LPSTR to an LPCWSTR? Also, LPSTR and LPCWSTR are typedefs, so what do they stand for exactly?
The 'L' tells the compiler that you want a string literal of type wchar_t*, as opposed to one of type char*. Why an 'L'? I've got no idea; perhaps it's meant to signify long characters. Whatever the reason, getting it wrong means a world of hurt, since there's no implicit conversion between wchar_t strings and char strings. IMHO string literals should default to either wchar_t or char according to a compiler option, but alas it's not so.
As for what the typedefs and some other related macros stand for:
LPSTR = Long Pointer to a STRing (char*)
LPCSTR = Long Pointer to a Constant STRing (const char*)
LPWSTR = Long Pointer to a Wide STRing (wchar_t*)
LPCWSTR = Long Pointer to a Constant Wide STRing (const wchar_t*)
LPTSTR = Long Pointer to a TCHAR STRing (TCHAR*)
LPCTSTR = Long Pointer to a Constant TCHAR STRing (const TCHAR*)
TCHAR = either char or wchar_t, depending on whether UNICODE is defined
TEXT("My String") = either "My String" or L"My String", depending on whether UNICODE is defined
There are two types of character sets, "Multi-Byte" and "Unicode". The difference between the two is that "Multi-Byte" builds on your standard day-to-day 8-bit ASCII format, while "Unicode" is an industry standard that allows text/symbols from all of the world's writing systems to be displayed. Before Unicode this was impossible, due to the 255-value limit of the ASCII bit patterns and the operating systems of the day.
LPSTR, LPCWSTR, and so on are Hungarian-notation typedefs. LPSTR stands for a pointer to a multi-byte string, and LPWSTR stands for a pointer to a Unicode (wide) string.
There are some macros that may be of use to you in the header file <tchar.h>, such as the _T macro, which expands a string literal to the proper format based on the compiler's character set.
You can read up and learn more about these topics here:
Unicode
ASCII
Hungarian Notation
MSDN Unicode Character Sets
Quote: Original post by Shadow Wolf
There are two types of character sets, "Multi-Byte" and "Unicode". The difference between the two is that "Multi-Byte" builds on your standard day-to-day 8-bit ASCII format, while "Unicode" is an industry standard that allows text/symbols from all of the world's writing systems to be displayed. Before Unicode this was impossible, due to the 255-value limit of the ASCII bit patterns and the operating systems of the day.
Multibyte (or UTF-8) allows that too. It just achieves it a bit more awkwardly.
Multibyte, as the name implies, uses multiple bytes to represent special characters. All 128 ASCII characters are represented by a single byte (so your ASCII strings will look *exactly* the same in UTF-8 encoding), and everything else uses two or more bytes, each with a value above 127, to keep it separate from the ASCII chars.
The ASCII compatibility is nice in some situations, but this scheme also makes it very hard to figure out the number of characters in a string and other operations. (You need to iterate through the string to tell where each character begins and ends)
Whilst you might have come across this via the DXSDK it's not really a DirectX-specific issue; I'm going to move this to 'General Programming' where it's more suited [smile]
Cheers,
Jack
Quote: how can I convert an LPSTR to a LPCWSTR?
Have a look into MultiByteToWideChar() (its opposite function is linked at the bottom of that page).
Cheers,
Jack
Thanks for explaining, everyone. I was pretty confused as to why they have separate character sets.
Quote: Original post by Shadow Wolf
The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format
Not at all.
Multi-Byte normally refers to the extensions to ASCII designed to cover huge character sets by using variable-length encoding.
On the Windows platform you may encounter Big5 for Chinese and Shift-JIS for Japanese.
If you do not care about Win98, you don't have to use multi-byte strings in your application.
Quote: Original post by Shadow Wolf
Before "Unicode", this was impossible due to the 255 ASCII bit pattern limit and operating system.
Well, those multi-byte encodings were developed to do exactly that, before Unicode was even born.
Quote: Original post by Spoonbender
Multibyte (or UTF-8) allows that too.
Even if some functions that process multi-byte strings can deal with UTF-8, it is not fully supported; UTF-8 cannot be set as the system locale.
Quote: Original post by Spoonbender
The ASCII compatibility is nice in some situations, but this scheme also makes it very hard to figure out the number of characters in a string and other operations. (You need to iterate through the string to tell where each character begins and ends)
…or you may store the string's length in characters along with the string.
Quote: Original post by Serge K
Quote: Original post by Shadow Wolf
The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format
Not at all. Multi-Byte normally refers to the extensions to ASCII designed to cover huge character sets by using variable-length encoding. On the Windows platform you may encounter Big5 for Chinese and Shift-JIS for Japanese. If you do not care about Win98, you don't have to use multi-byte strings in your application.
I think he may have just been referring to what it's called in Visual Studio. The only two choices there are "Multi-byte character set" and "Unicode", and in that case the MBCS setting does just give you regular ASCII chars (I've never done anything with extended characters, so I'll defer to your knowledge on that). But thanks for clearing that up; I'd always wondered why it was called a multi-byte character set when a char is one byte.
This topic is closed to new replies.