C++: Compiling String Literals as UTF-8?

13 comments, last by chollida1 19 years, 1 month ago
Is there any way to get Microsoft Visual C++ 2005 Express (beta 1) to compile string literals using the UTF-8 code page? For example, char* Temp = "は-ha"; gives me a compile error saying that "は" (ha in hiragana) does not exist in the current code page. So how would I change the code page it compiles with so that string literals use UTF-8?

For the most part I use UTF-16 (wchar_t, L"Somestring"), but because the Windows API function GetProcAddress uses UTF-8, I need to use UTF-8. It seems extremely inefficient to have to call WideCharToMultiByte on a string that I know at compile time (and extremely annoying to have to put the gobbledygook that is UTF-8 directly into the source code). Is there a way to get the compiler to do it?
----Erzengel des Lichtes光の大天使Archangel of LightEverything has a use. You must know that use, and when to properly use the effects.♀≈♂?
The only way I can think of to do it is to figure out the byte sequence for the character you want and build the character array manually (i.e., not with a string literal).
I would recommend just putting the string data into an actual data file instead (and parsing it into several strings at startup).
Zahlman: GetProcAddress is for getting functions and variables from a DLL. The strings that are the names are mostly going to be compile-time constants. In fact, it would be a very Bad Thing(tm) if the user could somehow change the string constant (it would either break the program, or have every plug-in in the application use a different function in the DLLs than it was supposed to). Now of course, a sufficiently motivated modder could probably do it with a hex editor, but since a lot of this program is modifiable, anything out in a file is fair game for modding, and this is definitely not to be modified.
Zipster: I can input the UTF-8 code as a string literal, but it's gibberish and unreadable. I was hoping the compiler had a way of doing it that maintained readability (well, readability to those of us who understand the language the code is written in).
Well, if you know the byte values, you could do a "\xDE\xAD\xC0\xDE\x42" kind of thing.
I don't know of any way to keep the string readable =-/
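To make the hex-escape suggestion concrete for the string in the original question: the UTF-8 encoding of "は" (U+306F) is the three bytes E3 81 AF. The following is a minimal sketch only; kHaUtf8 and ha_byte_length are illustrative names that do not come from the thread.

```cpp
#include <cstddef>
#include <cstring>

// UTF-8 encoding of U+306F (hiragana "ha") is the three bytes E3 81 AF.
// The literal is split before "-ha" for readability; the compiler joins
// adjacent literals. Splitting becomes mandatory when the next character
// is a hex digit (e.g. "\xAF" "a"), because a \x escape consumes every
// hex digit that follows it.
const char kHaUtf8[] = "\xE3\x81\xAF" "-ha";

// Number of bytes (not characters) before the terminating NUL.
std::size_t ha_byte_length() {
    return std::strlen(kHaUtf8);
}
```

The readability problem remains, of course; a comment next to the literal giving the intended text is about the best this approach can offer.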
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Nope, you can't. The compiler knows nothing of string encodings.
--God has paid us the intolerable compliment of loving us, in the deepest, most tragic, most inexorable sense.- C.S. Lewis
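For readers on newer toolchains: C++11, which postdates this thread, added u8 string literals that ask the compiler to encode a literal as UTF-8. The sketch below assumes a C++11-or-later compiler; kHa and ha_length_in_bytes are illustrative names. It spells the character as the universal character name \u306F so that the result does not depend on how the source file is saved. Later MSVC releases also added a /utf-8 switch that treats both the source and execution character sets as UTF-8, which would let the character be typed directly.

```cpp
#include <cstddef>
#include <cstring>

// \u306F is the universal character name for hiragana "ha" (U+306F).
// The u8 prefix asks the compiler to encode the literal as UTF-8.
// The reinterpret_cast keeps this compiling under C++20, where u8
// literals have type const char8_t[] instead of const char[].
const char* const kHa = reinterpret_cast<const char*>(u8"\u306F-ha");

// Byte length of the encoded literal, excluding the terminating NUL.
std::size_t ha_length_in_bytes() {
    return std::strlen(kHa);
}
```

None of this was available in Visual C++ 2005, so it does not change the answers given in the thread for that compiler.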
Quote:, but because the Windows API function GetProcAddress uses UTF-8,


This is incorrect. I'm not sure who told you this, but you don't need to worry about converting to UTF-8.


You could also export by ordinal and then call GetProcAddress by ordinal if you can't use an undecorated name.

Cheers
Chris
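As a rough illustration of the by-ordinal route: GetProcAddress accepts an ordinal in place of a name when the value sits in the low-order word of the pointer and the high-order word is zero, which is what the Windows MAKEINTRESOURCEA macro produces. The sketch below is portable and does not include windows.h; ordinal_as_proc_name is a hypothetical helper, and the ordinal value 1 in the comment is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper mirroring what MAKEINTRESOURCEA does on Windows:
// the ordinal goes in the low-order word and the high-order word stays zero.
inline const char* ordinal_as_proc_name(std::uint16_t ordinal) {
    return reinterpret_cast<const char*>(static_cast<std::uintptr_t>(ordinal));
}

// On Windows the result would be passed straight to GetProcAddress, e.g.
//   FARPROC fn = GetProcAddress(module, ordinal_as_proc_name(1));
// where 1 is an assumed ordinal assigned to the export in the DLL's .def file.
```

Because no name string is involved at all, the question of which encoding the name uses never comes up with this approach.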
Quote:Original post by chollida1
Quote:, but because the Windows API function GetProcAddress uses UTF-8,


This is incorrect. I'm not sure who told you this, but you don't need to worry about converting to UTF-8.


You could also export by ordinal and then call GetProcAddress by ordinal if you can't use an undecorated name.

Cheers
Chris

Actually, you are incorrect: it does take UTF-8. See for yourself. Take a string that you can't write in ANSI, "私はファンクションです" (Japanese), and make it the name of an exported function. Taken directly from the DLL's exp file: "灘ã¯ãƒ•ã‚¡ãƒ³ã‚¯ã‚·ãƒ˜ãƒ³ã˜ã™", which is UTF-8. (Try it: MultiByteToWideChar(CP_UTF8, 0, "灘ã¯ãƒ•ã‚¡ãƒ³ã‚¯ã‚·ãƒ˜ãƒ³ã˜ã™", -1, WideCharBuff, 13) will give you the correct characters. Or you could make a .htm file and use view->encoding->more->Unicode (UTF-8).)

It just happens that UTF-8 and ANSI are the same when the UTF-8 is single-byte. So if you're using roman characters, you're fine with just "blah". If you're using characters from any other alphabet (Japanese, Hebrew, whatever), you need to convert to UTF-8. Or how would you suggest I convert my UTF-16 characters into single chars so I can pass them to GetProcAddress? (Again, as I stated in the OP, GetProcAddress(HMODULE, LPCSTR) is a function and LPCSTR is const char*; it is NOT a macro that expands to GetProcAddressA and GetProcAddressW.)
Odd that MSDN doesn't specify this; it says it takes an ANSI string. My mistake :). My advice still stands: export by ordinal and the problem goes away :)

Again, I'm sorry for my mistake if that's the case!!

Cheers
Chris
Quote:Original post by chollida1
Odd that msdn doesn't specify this, it says it takes an ansi string.


Indeed, that is odd. In my Platform SDK it simply doesn't specify the type of string; it just says:
Quote:
lpProcName
[in] Pointer to a null-terminated string that specifies the function or variable name, or the function's ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero.

That is why I looked at the exp file to see what it really exported. If you make a DLL with non-roman alphabets exported, compile, rename the exp file to an htm, open it in internet explorer, and set the encoding to UTF-8, all functions are listed correctly. I've also done the conversions with WideCharToMultiByte(CP_UTF8,...), and it worked flawlessly.
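For readers who want to see which bytes such a conversion produces, here is a portable sketch of UTF-16 to UTF-8 encoding. It is a stand-in written for illustration, not the implementation behind WideCharToMultiByte(CP_UTF8, ...); utf16_to_utf8 is a hypothetical name, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (a choice of this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the low surrogate that follows.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                std::uint32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone low surrogate
        }
        if (cp < 0x80) {
            // ASCII range: one byte, identical to the ANSI code pages.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The first branch is why plain roman names such as "blah" need no conversion: below U+0080 the UTF-8 bytes are the same as the ASCII ones.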

This topic is closed to new replies.
