HellzGod

Percent-encoding non-English characters - any Win32 API?


Hi, I need to create an IUri object from a character string that can contain non-English characters as well. CreateUri() fails and returns E_FAIL when I pass it a string containing non-English characters. I think they need to be percent-encoded. InternetCanonicalizeUrl() and UrlEscape() both seem to be useful only for converting unsafe characters. Is there any Win32 API for converting non-English alphanumeric characters into percent-encoded form? Thanks, M

Hi,
I looked up InternetCanonicalizeUrl(). It looks like it does encode non-English characters as well. Thanks a lot.

-M

Hi,
I tried InternetCanonicalizeUrl() with dwFlags = ICU_BROWSER_MODE. It doesn't convert non-English URLs into percent-encoded form, though it does convert reserved characters into their corresponding encoding. I'm using VC++ and all my strings are WCHAR*.

Thanks,
M

From my research, implementations have historically handled Unicode characters in URLs inconsistently. A URI itself may only contain a limited set of ASCII characters, so Unicode cannot appear in it directly. The convention given in RFC 3986 is to encode the text as UTF-8 and percent-encode the resulting bytes, and RFC 3987 (IRIs) defines the same mapping for internationalized identifiers.

So the question becomes, why do you have this requirement? What exactly is supposed to be accomplished here?

I'm trying to create an IUri object by passing a URL string which could be in any language. I'm calling CreateUri(), which fails for non-English characters. This is for a word processing app.

Thanks,
M

Well, the really lazy way would be to reinterpret_cast the string to a char* and loop through each byte, converting it to a % code as you go (in case you aren't aware, the % codes are literally the byte values in hexadecimal). This will thoroughly mangle the URL and it probably won't work, because URIs are not intended to contain characters outside a limited subset of ASCII, and the bytes you would be escaping are in whatever encoding the string happens to be stored in. In other words, if you create a URL this way, chances are whatever serves that URL will have no idea what you want [wink]
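
For illustration, here is a minimal sketch of that byte-by-byte approach. PercentEncodeRawBytes is a hypothetical helper name, not a Win32 API, and the sketch assumes the caller hands it the raw bytes of the string in whatever encoding it is stored in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Win32 API): writes every byte of a buffer as %XX.
std::string PercentEncodeRawBytes(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    static const char kHex[] = "0123456789ABCDEF";
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    out.reserve(size * 3);  // each input byte becomes three output characters
    for (std::size_t i = 0; i < size; ++i) {
        out += '%';
        out += kHex[bytes[i] >> 4];    // high nibble
        out += kHex[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

For the two UTF-16LE bytes of ت (0x2A, 0x06) this yields %2A%06. It also escapes plain ASCII, so "a/" becomes %61%2F, which is part of why the result is unlikely to be a usable URL.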

Hi Apoch,
Thanks for the prompt replies. When I could not find any API, converting the characters to their corresponding hex values was what I tried doing. But I'm running into some weird issues. For example, the Arabic character L'ت' is showing up as 1578 (decimal). This cannot be right. I'm passing the string as a CString and calling PercentEncode(strRaw). See anything wrong in what I'm doing?

Thanks,
M
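
A quick way to sanity-check a value like 1578 is to print the code unit in hexadecimal: 1578 decimal is 0x062A, which is how the character is usually written (U+062A). The sketch below assumes a 16-bit code unit such as a Windows WCHAR; FormatCodeUnit is a hypothetical helper, not part of CString or Win32.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one UTF-16 code unit in U+XXXX notation.
std::string FormatCodeUnit(char16_t unit) {
    char buffer[8];  // "U+" + 4 hex digits + terminating NUL fits in 7 bytes
    int written = std::snprintf(buffer, sizeof buffer, "U+%04X",
                                static_cast<unsigned>(unit));
    assert(written == 6);  // a 16-bit value always formats to exactly 6 chars
    (void)written;         // silence unused-variable warnings when NDEBUG is set
    return buffer;
}
```

Calling FormatCodeUnit(char16_t(1578)) gives "U+062A", so the decimal value and the hex code point are the same number in two notations.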

First, you need to know what encoding your input is using. Is it already UTF-8? UTF-16? Even UTF-32? What is the endianness of the encoding?

All that will affect how you do this.

(By the way, 1578 is the correct code point number for the character you posted. It smells like little-endian UTF-16 to me, which is pretty much what Windows does, but that's just a guess.)


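Additionally - I'd recommend using ICU (International Components for Unicode, not to be confused with the ICU_ flags of InternetCanonicalizeUrl) if you want good Unicode support from C/C++. It's a great library and handles (as well as explains) the finer points of how Unicode works.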
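
Assuming the input really is UTF-16, one common approach is to convert it to UTF-8 first and then percent-encode only the non-ASCII bytes. The following is a minimal, portable sketch of that idea; the function names are hypothetical, and on Windows the conversion step would normally be done with WideCharToMultiByte and CP_UTF8 instead of the hand-rolled Utf16ToUtf8 here. Whether CreateUri() accepts the result is not something this sketch verifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends one Unicode code point to `out` as UTF-8.
static void AppendUtf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: UTF-16 to UTF-8; unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        AppendUtf8(out, cp);
    }
    return out;
}

// Hypothetical helper: escapes every non-ASCII byte as %XX, leaves ASCII as-is.
std::string PercentEncodeNonAscii(const std::string& utf8) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);
        } else {
            out += '%';
            out += kHex[byte >> 4];
            out += kHex[byte & 0x0F];
        }
    }
    return out;
}
```

With this sketch, ت (U+062A) becomes the UTF-8 bytes D8 AA and then %D8%AA. ASCII bytes are deliberately left untouched so that the URL's structure (scheme, slashes, query separators) survives; escaping unsafe ASCII characters is a separate step.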
