
Percent encoding non-English characters - any Win32 API?

General and Gameplay Programming

Started by HellzGod January 10, 2009 02:27 PM

12 comments, last by ApochPiQ 15 years, 3 months ago

HellzGod

122

Author

January 10, 2009 02:27 PM

Hi, I need to create an IUri object out of a character string which can contain non-English characters as well. The API CreateUri() fails and returns the error code E_FAIL when I pass a string containing non-English characters. I think they need to be percent encoded. InternetCanonicalizeUrl() and UrlEscape() both seem useful only for converting unsafe characters. Is there any Win32 API for converting non-English characters into the percent-encoded form? Thanks, M

ApochPiQ

23,138

January 10, 2009 04:40 PM

What type/class are you using for your strings? (Also, what language are you writing in?)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Colin Jeanne

1,115

January 10, 2009 04:43 PM

According to this document, it sounds like InternetCanonicalizeUrl() will automatically encode all characters that are not in the US-ASCII character set.

HellzGod

122

Author

January 11, 2009 05:35 AM

Hi,
Looked up InternetCanonicalizeUrl(). Looks like it does encode non-English characters as well. Thanks a lot.

-M

HellzGod

122

Author

January 11, 2009 07:08 AM

Hi,
I tried InternetCanonicalizeUrl() with dwFlags = ICU_BROWSER_MODE. It doesn't convert non-English URLs into percent-encoded form, though it does convert reserved characters into their corresponding encoding. I'm using VC++ and all my strings are WCHAR*.

Thanks,
M

ApochPiQ

23,138

January 11, 2009 09:29 AM

From my research, there isn't actually any standard way to encode Unicode characters in a URL; there are a few random implementations that do different things with Unicode characters, but no established common method. According to the RFCs there is no legal way to pass Unicode in a URL.

So the question becomes, why do you have this requirement? What exactly is supposed to be accomplished here?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

HellzGod

122

Author

January 11, 2009 09:52 AM

I'm trying to create an IUri object by passing a URL string which could be in any language. I'm calling CreateUri(), which fails for non-English characters. This is for a word processing app.

Thanks,
M

ApochPiQ

23,138

January 11, 2009 10:02 AM

Well, the really lazy way would be to reinterpret_cast the string to a char*, and loop through each byte, doing a simple conversion to % codes as you go (in case you aren't aware, the % codes are literally the byte values in hexadecimal). This will thoroughly mangle the URL and it probably won't work, because URIs are not intended to have characters outside a very limited subset of the Latin-1 codepage. In other words, if you create a URL this way, chances are whatever serves that URL will have no idea what you want [wink]
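A rough sketch of the byte-by-byte approach described above, assuming the text is held as UTF-16 in a std::u16string (standing in for a WCHAR* buffer on Windows). PercentEncodeBytes and PercentEncodeRawUtf16 are hypothetical helper names made up for illustration, not Win32 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Percent-encode every byte of a buffer as %XX (uppercase hex).
// Hypothetical helper for illustration; not part of any Win32 API.
std::string PercentEncodeBytes(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(size * 3);
    for (std::size_t i = 0; i < size; ++i) {
        out += '%';
        out += kHex[data[i] >> 4];    // high nibble
        out += kHex[data[i] & 0x0F];  // low nibble
    }
    return out;
}

// The "lazy" route: reinterpret the in-memory UTF-16 code units as raw
// bytes and encode them all. The output depends on the machine's byte order.
std::string PercentEncodeRawUtf16(const std::u16string& text)
{
    return PercentEncodeBytes(
        reinterpret_cast<const unsigned char*>(text.data()),
        text.size() * sizeof(char16_t));
}
```

On a little-endian machine, the two bytes of a code unit such as 0x062A come out in the order 2A 06, and every ASCII character drags a %00 along with it, which is the mangling the post warns about.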

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

HellzGod

122

Author

January 11, 2009 01:24 PM

Hi Apoch,
Thanks for the prompt replies. When I could not find any API, converting the characters to their corresponding hex values was what I tried doing. But I'm running into some weird issues. For example, the Arabic character L'ت' is showing up as 1578 (decimal). This cannot be right. I'm passing the string as a CString and calling PercentEncode(strRaw). See anything wrong in what I'm doing?

Thanks,
M

ApochPiQ

23,138

January 11, 2009 07:14 PM

First, you need to know what encoding your input is using. Is it already UTF-8? UTF-16? Even UTF-32? What is the endianness of the encoding?

All that will affect how you do this.

(By the way, 1578 is the correct code point number for the character you posted. It smells like little-endian UTF-16 to me, which is pretty much what Windows does, but that's just a guess.)
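To make the point about encodings concrete, here is a small sketch that serializes the code point from the post (1578, i.e. U+062A) in three different ways. The helper names Utf16UnitBytes and Utf8BytesSmall are invented for this illustration, and the UTF-8 helper deliberately handles only code points below U+0800.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Serialize one UTF-16 code unit in the requested byte order.
Bytes Utf16UnitBytes(char16_t unit, bool littleEndian)
{
    const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
    const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
    return littleEndian ? Bytes{lo, hi} : Bytes{hi, lo};
}

// Encode one code point below U+0800 as UTF-8 (one or two bytes).
// Larger code points need three or four bytes and are out of scope here.
Bytes Utf8BytesSmall(char32_t cp)
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        return Bytes{static_cast<std::uint8_t>(cp)};
    }
    return Bytes{static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                 static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}
```

For U+062A this gives 2A 06 in little-endian UTF-16, 06 2A in big-endian UTF-16, and D8 AA in UTF-8, so the same character can percent-encode to three different strings depending on which byte form is fed to the encoder.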

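Additionally, I'd recommend using ICU (International Components for Unicode, not to be confused with the ICU_ flags of InternetCanonicalizeUrl) if you want good Unicode support from C/C++. It's a great library and handles (as well as explains) the finer points of how Unicode works.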
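Without pulling in a library, one common convention (the one RFC 3987 uses when mapping an IRI to a URI) is to convert the text to UTF-8 first and then percent-encode the resulting non-ASCII bytes. The following is a minimal sketch of that idea, assuming the input is UTF-16 in a std::u16string; all three function names are hypothetical, and whether CreateUri() accepts the output for a given URL is not something this thread confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as UTF-8.
void AppendUtf8(char32_t cp, std::string& out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8. Surrogate pairs are combined; an unpaired
// surrogate is replaced with U+FFFD (a choice made for this sketch).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate as well
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        AppendUtf8(cp, out);
    }
    return out;
}

// Percent-encode every non-ASCII byte of a UTF-8 string.
// ASCII bytes, including reserved characters, are copied unchanged.
std::string PercentEncodeNonAscii(const std::string& utf8)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);
        } else {
            out += '%';
            out += kHex[byte >> 4];
            out += kHex[byte & 0x0F];
        }
    }
    return out;
}
```

With this, PercentEncodeNonAscii(Utf16ToUtf8(u"\u062A")) yields %D8%AA for the Arabic character from the earlier post. On Windows the hand-written conversion could presumably be swapped for WideCharToMultiByte with CP_UTF8, and ASCII characters that are unsafe in a URL would still need a separate pass such as UrlEscape().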

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


This topic is closed to new replies.
