Using unicode character set, adding 'L' before strings, some stuff I don't understand

This topic is 3857 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

The DirectX examples that come with the SDK are set to use the Unicode character set, and compiling them with any other setting produces conversion errors (such as 'LPSTR' to 'LPCWSTR' or 'const char [29]' to 'LPCWSTR'). To fix them I have to add an L before strings, so "string" becomes L"string". Why is this so? My second question is how can I convert an LPSTR * to a LPCWSTR? Also, LPSTR and LPCWSTR are typedefs, so what do they stand for exactly?

The 'L' tells the compiler that you want a wide string literal (an array of const wchar_t, usable as a const wchar_t*), as opposed to a narrow string literal (usable as a const char*). Why an 'L'? I've got no idea; perhaps it's meant to signify long characters. Whatever the reason, getting it wrong means a world of hurt, since there's no automatic conversion between wchar_t strings and char strings. IMHO string literals should default to either wchar_t or char according to a compiler option, but alas it's not so.

As for what the typedefs and some other related macros stand for:

LPSTR = Long Pointer to a STRing (char*)
LPCSTR = Long Pointer to a Constant STRing (const char*)
LPWSTR = Long Pointer to a Wide STRing (wchar_t*)
LPCWSTR = Long Pointer to a Constant Wide STRing (const wchar_t*)
LPTSTR = Long Pointer to a TCHAR STRing (TCHAR*)
LPCTSTR = Long Pointer to a Constant TCHAR STRing (const TCHAR*)
TCHAR = either char or wchar_t, depending on whether UNICODE is defined

TEXT("My String") = either "My String" or L"My String", depending on whether UNICODE is defined
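
To make the list above concrete, here is a minimal sketch of how a TCHAR/TEXT-style switch can work. The names MY_UNICODE, my_tchar, MY_TEXT, my_tcslen and window_title are hypothetical stand-ins invented for this example so that it builds without the Windows headers; the real definitions in the SDK may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <type_traits>

// Hypothetical stand-ins for UNICODE / TCHAR / TEXT (assumed names, not the SDK's).
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s   // pastes an L in front of the literal: "x" becomes L"x"
#else
typedef char my_tchar;
#define MY_TEXT(s) s      // leaves the literal narrow
#endif

// A plain literal is an array of const char; the L prefix makes it const wchar_t.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "narrow literal");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide literal");

// Length in characters of a my_tchar string, whichever type my_tchar is.
std::size_t my_tcslen(const my_tchar* s) {
    assert(s != nullptr);
#ifdef MY_UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}

// The same source line yields a narrow or a wide literal depending on the switch.
const my_tchar* window_title() { return MY_TEXT("My String"); }
```

Building the same file with and without MY_UNICODE defined changes the type of window_title() but not the source, which is the point of writing TEXT("...") instead of hard-coding the L.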

There are two types of character sets "Multi Byte" and "Unicode". The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format, and "Unicode" which is an industry standard that allows text/symbols from all world language systems to be displayed. Before "Unicode", this was impossible due to the 255 ASCII bit pattern limit and operating system.

LPSTR, LPCWSTR and so on are Hungarian Notation typedefs. LPSTR is a pointer to a multi-byte (char) string and LPWSTR is a pointer to a Unicode (wchar_t) string.

There are some macros in the header file <tchar.h> that may be of use to you,
such as the _T macro, which turns a string literal into the proper form for
the character set the project is compiled with.

You can read up and learn more about these topics here:
Unicode
ASCII
Hungarian Notation
MSDN Unicode Character Sets

Quote:
Original post by Shadow Wolf
There are two types of character sets "Multi Byte" and "Unicode". The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format, and "Unicode" which is an industry standard that allows text/symbols from all world language systems to be displayed. Before "Unicode", this was impossible due to the 255 ASCII bit pattern limit and operating system.

Multibyte (or UTF-8) allows that too. It just achieves it a bit more awkwardly.
Multibyte, as the name implies, uses multiple bytes to represent special characters. All 128 ASCII characters (values 0 to 127) are represented by a single byte (so your ASCII strings will look *exactly* the same in UTF-8 encoding), and everything else uses two or more bytes, each with a value above 127, to keep it separate from the ASCII chars.
The ASCII compatibility is nice in some situations, but this scheme also makes it very hard to figure out the number of characters in a string and other operations. (You need to iterate through the string to tell where each character begins and ends)
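
As a rough illustration of that iteration, here is a sketch that counts the characters (code points) in a UTF-8 string. The helper name utf8_length is made up for this example, and it assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to be valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes have the bit pattern 10xxxxxx.
        // Every other byte (ASCII or a lead byte) starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the three-byte sequence for the euro sign counts as one character here, while s.size() would report three. Note that this counts code points, which is not always the same as what a user would see as one character (combining marks, for instance).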

Whilst you might have come across this via the DXSDK it's not really a DirectX-specific issue; I'm going to move this to 'General Programming' where it's more suited [smile]

Quote:
how can I convert an LPSTR * to a LPCWSTR?
Have a look into MultiByteToWideChar() (its opposite function, WideCharToMultiByte(), is linked at the bottom of that page).
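
A minimal sketch of that conversion, assuming the input is a null-terminated string in the system's ANSI code page (CP_ACP). The helper name widen is hypothetical. Outside Windows the sketch falls back to the standard mbstowcs so it still builds; that fallback is my own substitution, not something from the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a narrow, null-terminated string to a wide string.
// Returns an empty string if the conversion fails.
std::wstring widen(const char* narrow) {
    assert(narrow != nullptr);
#ifdef _WIN32
    // First call with no output buffer: returns the number of wchar_t needed,
    // including the terminator (-1 means "input is null-terminated").
    int needed = MultiByteToWideChar(CP_ACP, 0, narrow, -1, nullptr, 0);
    if (needed <= 0) return std::wstring();
    std::vector<wchar_t> buffer(static_cast<std::size_t>(needed));
    // Second call fills the buffer.
    if (MultiByteToWideChar(CP_ACP, 0, narrow, -1, buffer.data(), needed) == 0)
        return std::wstring();
    return std::wstring(buffer.data());
#else
    // Fallback (assumption): mbstowcs with a null destination reports the
    // required length, excluding the terminator, on POSIX systems.
    std::size_t needed = std::mbstowcs(nullptr, narrow, 0);
    if (needed == static_cast<std::size_t>(-1)) return std::wstring();
    std::vector<wchar_t> buffer(needed + 1);
    std::mbstowcs(buffer.data(), narrow, needed + 1);
    return std::wstring(buffer.data());
#endif
}
```

The c_str() of the returned std::wstring is a const wchar_t*, which is what LPCWSTR is a typedef for, so it can be passed to a function expecting an LPCWSTR for as long as the std::wstring is alive.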

Cheers,
Jack

Quote:
Original post by Shadow Wolf
The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format
Not at all.
Multi-Byte normally refers to the extensions to ASCII designed to cover huge character sets by using variable-length encoding.
On Windows you may encounter Big5 for Chinese and Shift JIS for Japanese.
If you do not care about Win98, you don't have to use Multi-Byte strings in your application.

Quote:
Original post by Shadow Wolf
Before "Unicode", this was impossible due to the 255 ASCII bit pattern limit and operating system.
Well, those multi-byte encodings were developed exactly to do that before Unicode was even born.

Quote:
Original post by Spoonbender
Multibyte (or UTF-8) allows that too.
Even if some functions that process multi-byte strings can deal with UTF-8, it's not fully supported: UTF-8 cannot be set as the system locale.

Quote:
Original post by Spoonbender
The ASCII compatibility is nice in some situations, but this scheme also makes it very hard to figure out the number of characters in a string and other operations. (You need to iterate through the string to tell where each character begins and ends)
…or you may store with the string its length in characters.

Quote:
Original post by Serge K
Quote:
Original post by Shadow Wolf
The difference between the two is that "Multi-Byte" is your standard day-to-day ASCII 255 bit patterned format
Not at all.
Multi-Byte normally refers to the extensions to ASCII designed to cover huge character sets by using variable-length encoding.
On Windows you may encounter Big5 for Chinese and Shift JIS for Japanese.
If you do not care about Win98, you don't have to use Multi-Byte strings in your application.

I think he may have just been referring to what it's called in Visual Studio. The only two choices are "Multi-byte character set" and "Unicode", and in this case the MBCS setting does just give you regular ASCII chars (I've never done anything with any extended characters, so I'll defer to your knowledge on that). But thanks for clearing that up. I'd always wondered why it was called a multi-byte character set when a char was one byte.

