Erzengeldeslichtes

C++: Compiling String Literals as UTF-8?


Is there any way to get Microsoft Visual C++ 2005 Express (beta 1) to compile string literals using the UTF-8 code page? For example, char* Temp = "は-ha"; gives me a compile error saying that "は" (ha in hiragana) does not exist in the current code page. How would I change the code page it compiles with so that string literals use UTF-8? For the most part I use UTF-16 (wchar_t, L"Somestring"), but because the Windows API function GetProcAddress uses UTF-8, I need UTF-8 here. It seems extremely inefficient to have to call WideCharToMultiByte on a string that I know at compile time (and extremely annoying to have to put the gobbledygook that is UTF-8 directly into the source code). Is there a way to get the compiler to do it?

The only way I can think to do it is to figure out the byte sequence for the character you want, and build the character array manually (i.e not with a string literal).

I would recommend just putting the string data into an actual data file instead (and parsing it into several strings at startup).

Zahlman: GetProcAddress is used to get functions and variables from a DLL. The strings that hold the names are mostly going to be compile-time constants. In fact, it would be a very Bad Thing(tm) if the user could somehow change a string constant (it would either break the program, or make every plug-in in the application use a different function in the DLLs than it was supposed to). Of course, a sufficiently motivated modder could probably do it with a hex editor, but since a lot of this program is modifiable, anything out in a file is fair game for modding, and this is definitely not to be modified.
Zipster: I can input the UTF-8 code as a string literal, but it's gibberish and unreadable. I was hoping the compiler had a way of doing it that maintained readability (well, readability to those of us who understand the language the code is written in).

Well, if you know the byte values, you could do a "\xDE\xAD\xC0\xDE\x42" kind of thing.
I don't know of any way to keep the string readable =-/
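
A minimal sketch of the escaped-byte approach, assuming the text wanted is the hiragana ha (U+306F) from the original question followed by "-ha". The function names are hypothetical. U+306F encodes in UTF-8 as the three bytes E3 81 AF. The literal is split into two adjacent pieces because a hex escape swallows every hex digit that follows it, so an escape placed directly before a letter such as 'a' or 'f' would be misread.

```cpp
#include <cassert>
#include <cstring>

// UTF-8 bytes for U+306F (hiragana "ha"), then the ASCII text "-ha".
// Adjacent string literals are concatenated by the compiler; splitting them
// keeps the last hex escape from absorbing any following hex digits.
inline const char* ha_label() {
    return "\xE3\x81\xAF" "-ha";
}

// Length in bytes (not characters): 3 bytes of kana plus 3 ASCII bytes.
inline std::size_t ha_label_bytes() {
    const std::size_t n = std::strlen(ha_label());
    assert(n == 6);
    return n;
}
```

A comment next to the literal giving the intended text is probably the closest this approach gets to readable.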

Quote:
, but because the Windows API function GetProcAddress uses UTF-8,


This is incorrect. I'm not sure who told you this, but you don't need to worry about converting to UTF-8.


You could also export by ordinal and then call GetProcAddress with the ordinal if you can't use an undecorated name.

Cheers
Chris
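
A rough sketch of what passing an ordinal means at the pointer level. Both helper names are hypothetical, and the code is written portably so it does not need the Windows headers. It mirrors what the Windows MAKEINTRESOURCEA macro does: the ordinal sits in the low 16 bits of the pointer-sized argument and everything above it is zero.

```cpp
#include <cassert>
#include <cstdint>

// True when a name-or-ordinal argument holds an ordinal rather than a pointer
// to a real string: every bit above the low-order 16 bits is zero.
inline bool is_ordinal(const char* proc_name) {
    return (reinterpret_cast<std::uintptr_t>(proc_name) >> 16) == 0;
}

// Hypothetical helper: packs an export ordinal into a pointer value, in the
// same way MAKEINTRESOURCEA does on Windows.
inline const char* ordinal_as_proc_name(std::uint16_t ordinal) {
    const char* p =
        reinterpret_cast<const char*>(static_cast<std::uintptr_t>(ordinal));
    assert(is_ordinal(p));  // only the low 16 bits can be set
    return p;
}
```

On Windows the call would then look something like GetProcAddress(module, MAKEINTRESOURCEA(1)), where 1 stands for whatever ordinal the DLL actually assigns to the export. No name string is involved, so the encoding question does not arise.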

Quote:
Original post by chollida1
Quote:
, but because the Windows API function GetProcAddress uses UTF-8,


This is incorrect. I'm not sure who told you this, but you don't need to worry about converting to UTF-8.


You could also export by ordinal and then call GetProcAddress with the ordinal if you can't use an undecorated name.

Cheers
Chris

Actually, you are incorrect; it does take UTF-8. See for yourself: take a name that you can't write in ANSI, such as "私はファンクションです", and make it the name of an exported function. Viewed as ANSI text, the name taken directly from the DLL's .exp file looks like gibberish, but its bytes are UTF-8: "\xE7\xA7\x81\xE3\x81\xAF\xE3\x83\x95\xE3\x82\xA1\xE3\x83\xB3\xE3\x82\xAF\xE3\x82\xB7\xE3\x83\xA7\xE3\x83\xB3\xE3\x81\xA7\xE3\x81\x99". Try it: passing those bytes to MultiByteToWideChar(CP_UTF8, 0, bytes, -1, WideCharBuff, 13) will give you the correct characters. (Or you could make a .htm file and use view->encoding->more->Unicode (UTF-8).) It just happens that UTF-8 and ANSI are identical for the characters that UTF-8 encodes in a single byte. So if you're using Roman characters, you're fine with just "blah". If you're using characters from any other alphabet (Japanese, Hebrew, whatever), you need to convert to UTF-8. Or how would you suggest I convert my UTF-16 characters into single chars so I can pass them to GetProcAddress? (Again, GetProcAddress(HMODULE, LPCSTR) is a function and LPCSTR is const char*; it is NOT a macro that expands to GetProcAddressA or GetProcAddressW.)

Odd that MSDN doesn't specify this; it says it takes an ANSI string. My mistake :). My advice still stands: export by ordinal and the problem goes away :)

Again, I'm sorry for my mistake if that's the case!!

Cheers
Chris

Quote:
Original post by chollida1
Odd that MSDN doesn't specify this; it says it takes an ANSI string.


Indeed, that is odd. In my copy of the Platform SDK it simply doesn't specify the type of string; it just says
Quote:

lpProcName
[in] Pointer to a null-terminated string that specifies the function or variable name, or the function's ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero.

That is why I looked at the .exp file to see what it really exported. If you make a DLL that exports names written in non-Roman alphabets, compile it, rename the .exp file to .htm, open it in Internet Explorer, and set the encoding to UTF-8, all of the functions are listed correctly. I've also done the conversions with WideCharToMultiByte(CP_UTF8,...), and it worked flawlessly.
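
For names that start out as UTF-16, the kind of conversion WideCharToMultiByte(CP_UTF8, ...) performs can be sketched in portable C++. This is an illustration only: utf16_to_utf8 is a hypothetical helper, not part of any Windows API, and replacing unpaired surrogates with U+FFFD is this sketch's own choice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8 bytes.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate (this sketch's choice)
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {            // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: covers kana and most kanji
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this helper, utf16_to_utf8(u"\u306F-ha") produces the bytes E3 81 AF followed by "-ha". The conversion still happens at run time, which is the cost the original question was hoping to avoid.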
