C++: Compiling String Literals as UTF-8?
Is there any way to get Microsoft Visual C++ 2005 Express (beta 1) to compile string literals using the UTF-8 code page?
For example,
const char* Temp = "は-ha";
gives me a compile error saying that "は" (ha in hiragana) does not exist in the current code page. How would I change the code page the compiler uses so that string literals are encoded as UTF-8?
For the most part I use UTF-16 (wchar_t, L"Somestring"), but because the Windows API function GetProcAddress uses UTF-8, I need UTF-8 here. It seems wasteful to call WideCharToMultiByte at run time on a string I already know at compile time, and extremely annoying to type the gobbledygook that is raw UTF-8 directly into the source code. Is there a way to get the compiler to do it?
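For reference, the run-time conversion being avoided here is mechanical. Below is a minimal, portable sketch of a UTF-16 to UTF-8 encoder, roughly what WideCharToMultiByte(CP_UTF8, ...) does. The function name utf16_to_utf8 is hypothetical, and it uses C++11's std::u16string rather than wchar_t, so it would not compile on the VC++ 2005 beta discussed in this thread. Replacing unpaired surrogates with U+FFFD is an assumption, not necessarily what the Windows API does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8 bytes.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute the replacement char
        }
        if (cp < 0x80) {            // 1 byte: identical to ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: covers hiragana/katakana
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: supplementary planes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"\u306F-ha") yields the bytes E3 81 AF followed by "-ha", i.e. は is three bytes in UTF-8 while the ASCII part is unchanged.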
The only way I can think to do it is to figure out the byte sequence for the character you want, and build the character array manually (i.e., not with a string literal).
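A minimal sketch of that suggestion for the "は-ha" example, assuming the UTF-8 bytes of は (U+306F) are E3 81 AF. The array name kHaName is hypothetical. Because the array is not initialized from a string literal, the terminating '\0' has to be written explicitly.

```cpp
// "は-ha" spelled out byte by byte, without a string literal.
// は is U+306F, which UTF-8 encodes as E3 81 AF.
const char kHaName[] = {
    '\xE3', '\x81', '\xAF', // は
    '-', 'h', 'a',          // plain ASCII: same bytes in ANSI and UTF-8
    '\0'                    // no literal, so the terminator is added by hand
};
```

The array can then be passed wherever a const char* is expected, for instance as the name argument to GetProcAddress.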
I would recommend just putting the string data into an actual data file instead (and parsing it into several strings at startup).
Zahlman: GetProcAddress retrieves functions and variables from a DLL. The strings that name them are mostly going to be compile-time constants. In fact, it would be a very Bad Thing(tm) if the user could somehow change one of those strings: it would either break the program or make every plug-in in the application call a different DLL function than intended. Of course, a sufficiently motivated modder could probably still do it with a hex editor, but since much of this program is modifiable, anything in an external file is fair game for modding, and these names are definitely not meant to be modified.
Zipster: I can input the UTF-8 code as a string literal, but it's gibberish and unreadable. I was hoping the compiler had a way of doing it that kept the source readable (well, readable to those of us who understand the language the code is written in).
Well, if you know the byte values, you could do a "\xDE\xAD\xC0\xDE\x42" kind of thing.
I don't know of any way to keep the string readable =-/
Quote:, but because the Windows API function GetProcAddress uses UTF-8,
This is incorrect; I'm not sure who told you this, but you don't need to worry about converting to UTF-8.
You could also export by ordinal and then call GetProcAddress with the ordinal if you can't use an undecorated name.
Cheers
Chris
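Looking a function up by ordinal works because of how GetProcAddress interprets lpProcName: an ordinal is passed in the low-order word with the high-order word zero, and anything else is treated as a pointer to a name. The portable sketch below only models that convention; the helper names are hypothetical, and on Windows the MAKEINTRESOURCEA and IS_INTRESOURCE macros play roughly these roles. It assumes real name strings never live in the first 64 KB of the address space, which holds on mainstream Windows and Linux systems.

```cpp
#include <cstdint>

// Hypothetical helper: smuggle a 16-bit ordinal through a const char*,
// leaving the high-order bits zero.
inline const char* ordinal_as_proc_name(std::uint16_t ordinal)
{
    return reinterpret_cast<const char*>(static_cast<std::uintptr_t>(ordinal));
}

// Hypothetical helper: a value whose bits above the low 16 are all zero is
// an ordinal; anything else is taken to be a pointer to a name string.
inline bool is_ordinal(const char* proc_name)
{
    return (reinterpret_cast<std::uintptr_t>(proc_name) >> 16) == 0;
}
```

On Windows one would write something like GetProcAddress(hModule, MAKEINTRESOURCEA(1)) to fetch the function exported with ordinal 1, which sidesteps the question of how the name is encoded.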
Quote:Original post by chollida1
Quote:, but because the Windows API function GetProcAddress uses UTF-8,
This is incorrect, I'm not sure who told you this but you don't need to worry about converting to UTF-8.
Actually, you are incorrect; it does take UTF-8. See for yourself: take a string that can't be represented in ANSI, such as "私はファンクションです" (Japanese), and make it the name of an exported function. Taken directly from the DLL's .exp file, the name is stored as a run of bytes that looks like gibberish when viewed as ANSI, and those bytes are UTF-8. (Try it: MultiByteToWideChar(CP_UTF8, 0, "\xE7\xA7\x81\xE3\x81\xAF\xE3\x83\x95\xE3\x82\xA1\xE3\x83\xB3\xE3\x82\xAF\xE3\x82\xB7\xE3\x83\xA7\xE3\x83\xB3\xE3\x81\xA7\xE3\x81\x99", -1, WideCharBuff, 13) will give you the correct characters. Or you could put the bytes in a .htm file and use View->Encoding->More->Unicode (UTF-8).) It just happens that UTF-8 and ANSI are identical for plain ASCII characters, which UTF-8 encodes as a single byte each. So if you're using Roman characters, you're fine with just "blah". If you're using characters from any other script (Japanese, Hebrew, whatever), you need to convert to UTF-8. Otherwise, how would you suggest I convert my UTF-16 strings to char strings so I can pass them to GetProcAddress? (Again, as I stated in the OP, GetProcAddress(HMODULE, LPCSTR) is a single function, and LPCSTR is const char*; it is NOT a macro that maps to GetProcAddressA and GetProcAddressW.)
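To check the claim without Windows, here is a minimal portable UTF-8 decoder. The name utf8_to_utf32 is hypothetical, and it assumes well-formed input (no validation of continuation bytes or overlong forms). Fed the bytes from the .exp file, it should return the code points of 私はファンクションです; since all of these characters are in the Basic Multilingual Plane, the values are the same UTF-16 units MultiByteToWideChar would write into WideCharBuff.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Assumes well-formed input; malformed sequences are not detected.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        // The lead byte tells how many continuation bytes follow.
        int extra = b >= 0xF0 ? 3 : b >= 0xE0 ? 2 : b >= 0xC0 ? 1 : 0;
        // Keep the payload bits of the lead byte (0x1F, 0x0F or 0x07).
        char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (int k = 1; k <= extra && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out += cp;
        i += 1 + extra;
    }
    return out;
}
```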
Odd that MSDN doesn't specify this; it says it takes an ANSI string. My mistake :) My advice still stands: export by ordinal and the problem goes away :)
Again, I'm sorry for my mistake if that's the case!!
Cheers
Chris
Quote:Original post by chollida1
Odd that msdn doesn't specify this, it says it takes an ansi string.
Indeed, that is odd. My copy of the Platform SDK simply doesn't specify the type of string; it just says:
Quote:
lpProcName
[in] Pointer to a null-terminated string that specifies the function or variable name, or the function's ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero.
That is why I looked at the .exp file to see what was really exported. If you make a DLL that exports names written in a non-Roman script, compile it, rename the .exp file to .htm, open it in Internet Explorer, and set the encoding to UTF-8, all the function names are listed correctly. I've also done the conversions with WideCharToMultiByte(CP_UTF8, ...), and it worked flawlessly.