roos

Sorting unicode strings

This topic is 4848 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi,

I'm using unicode strings (basic_string<uint16>) for my game so it can be localized to any language. I was wondering, is there any good way to compare two unicode strings to see which one is alphabetically larger? With regular strings, I could just use strcmp() and check if it's greater than zero or whatever. With unicode strings I'm not sure... If I knew how to compare two characters to see which one is bigger, then writing the compare function is easy enough...

Is it okay to just compare the values of the characters to see which one is bigger? In English I think that would work, because afaik the unicode value for "d" is less than the value for "z" for example. In Japanese I don't know how to go about it, since you might have two different Kanji with different unicode values, but the same phonetic sound.

Hmm, I'm going to spend a little more time looking into this, but if anyone might know offhand if there's a proper way to compare unicode strings, I would really appreciate it if you could give me a hint!

Thanks,
roos

Quote:
Original post by roos
I'm using unicode strings (basic_string<uint16>) for my game so it can be localized to any language.


Just using a wide string type doesn't magically make your strings Unicode; you need to use a Unicode locale too. That is what you need to start researching.

Quote:
I was wondering, is there any good way to compare two unicode strings to see which one is alphabetically larger?


Use the std::collate facet of the appropriate locale.
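A minimal sketch of what that could look like. The helper names make_locale and collate_compare are made up for illustration, and the locale name passed in is an assumption, since locale names are platform-specific. Note that the standard library only provides std::collate for char and wchar_t, so a basic_string<uint16> would have to be converted to std::wstring first (not shown here).

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: build the named locale, falling back to the classic
// "C" locale when the platform does not provide it.
std::locale make_locale(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: returns a negative, zero, or positive value, like
// strcmp(), but ordered by the collation rules of the given locale.
int collate_compare(const std::wstring& a, const std::wstring& b,
                    const std::locale& loc) {
    const std::collate<wchar_t>& coll =
        std::use_facet<std::collate<wchar_t> >(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

Usage would be something like collate_compare(L"apple", L"banana", make_locale("en_US.UTF-8")). A std::locale object can also be passed directly as the comparator to std::sort, because its operator() compares two strings with the collate facet.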

Quote:
Is it okay to just compare the values of the characters to see which one is bigger?


It depends on whether you want the comparison to be meaningful (dictionary order) or just need a consistent but arbitrary order, e.g. to do a binary search.

Quote:
In English I think that would work, because afaik the unicode value for "d" is less than the value for "z" for example.


And yet you still have to deal with the fact that the value for 'Z' is less than the value for 'a'.
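To illustrate, here is a small sketch that sorts strings purely by code unit value. It uses std::u16string as a stand-in for basic_string<uint16>, and sort_by_code_unit is a made-up name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort by raw code unit value, which is what operator< on std::u16string does.
std::vector<std::u16string> sort_by_code_unit(std::vector<std::u16string> words) {
    std::sort(words.begin(), words.end());
    return words;
}
```

Given "apple", "Zebra" and "banana", this puts "Zebra" first, because 'Z' is 0x5A and 'a' is 0x61. That is fine as a key order for a binary search, but it is not dictionary order.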

Quote:
In Japanese I don't know how to go about it, since you might have two different Kanji with different unicode values, but the same phonetic sound.


unicode.org
Collation

Ah, the joy of collation. Simple alphabetical sorts don't work well with Unicode strings. Even limited to the realm of just a-z, you have languages like Lithuanian which put letters in a different order than ASCII does. On the plus side, the Unicode Consortium has a document which describes how to do collation on Unicode strings: UTS #10: Unicode Collation Algorithm. On the minus side, it's somewhat annoying to implement.

If you want something faster that still handles many of the annoying corner cases, you may consider first normalizing the strings and then doing a lexicographic ordering based on the numeric value of code points. Depending on your application, different choices may be made on the normalization. NFKC or NFKD would give you less trouble with things like ligatures, but you may consider NFC or NFD if rendering information is important.
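A rough sketch of the second half of that approach, the code point comparison, for UTF-16 strings. It assumes both strings have already been normalized to the same form by a library such as ICU; the normalization step itself is not shown. The function names are made up, and passing unpaired surrogates through unchanged is a simplification. Decoding matters because comparing raw UTF-16 code units is not quite code point order: surrogate code units (0xD800-0xDFFF) sort below the BMP characters U+E000-U+FFFF, even though the characters they encode come after them.

```cpp
#include <cstddef>
#include <string>

// Decode one code point from UTF-16 text starting at index i, advancing i.
// Unpaired surrogates are returned as-is (a simplification for this sketch).
char32_t next_code_point(const std::u16string& s, std::size_t& i) {
    char16_t lead = s[i++];
    if (lead >= 0xD800 && lead <= 0xDBFF && i < s.size()) {
        char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;
            return 0x10000 + ((char32_t(lead) - 0xD800) << 10)
                           + (char32_t(trail) - 0xDC00);
        }
    }
    return lead;
}

// Returns -1, 0, or 1 by comparing code point values from left to right.
// Both inputs are assumed to be normalized to the same form already.
int compare_code_points(const std::u16string& a, const std::u16string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        char32_t ca = next_code_point(a, i);
        char32_t cb = next_code_point(b, j);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (i < a.size()) return 1;   // a is longer, so it sorts after b
    if (j < b.size()) return -1;  // b is longer, so a sorts first
    return 0;
}
```

This gives a stable, cheap ordering, but it is still not dictionary order for any particular language; that is what the full collation algorithm is for.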

Wow, I see it's a lot more complicated than I thought! Well, that definitely helps a lot because now I have some links and some terms I didn't know about like "collation". Thanks a ton Fruny and SiCrane for your help, I appreciate it very much.
