Devnull

UTF8 - SJIS conversion


I'm annoyed with this particular set of circumstances, but it is what it is: I need to convert some strings that come in as UTF8 to Shift-JIS (SJIS) encoded strings. I have a conversion routine from ASCII to SJIS and back, but not from/to UTF8. I've done a number of Google searches for appropriate code but haven't been able to find anything useful. Does anyone know of code that will convert UTF8 to SJIS in C++ (or C)? Really, I just need a relatively simple code snippet that doesn't need to pull in 50 other files from some random library. Any help appreciated! Thanks!

There is no such thing as conversion from SJIS to ASCII, because ASCII doesn't represent Japanese characters. What does your conversion *really* do?

Anyway, it's pretty straightforward to do if you are comfortable shifting bits around and if you are careful to distinguish bytes from characters.

The process, basically, is:

- Read bytes one at a time from the input stream, examining the high few bits of each, until your examination determines that you have enough bytes to make up a UTF8 character.
- Translate the bytes into the Unicode code point, by shifting and masking.
- Look up the code point in a translation table, which gives you the one or two bytes representing that character in Shift-JIS (creating the table is left as an exercise*). You can represent the single-byte characters by storing an "illegal in Shift-JIS" value for the second byte.
- Append the retrieved byte(s) (omitting the illegal second byte if necessary) to the output stream.

* You might find this and this useful.
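
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names (SjisEntry, kTable, decodeUtf8, utf8ToSjis) are made up for the example, the table is a tiny hand-picked excerpt rather than a full mapping, ASCII is passed through unchanged (strictly speaking, Shift-JIS assigns 0x5C and 0x7E to the yen sign and overline), and the decoder does not reject overlong encodings or surrogates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Marker for "no second byte"; 0x00 is never a valid Shift-JIS trail byte.
const unsigned char kNoTrail = 0x00;

struct SjisEntry {
    std::uint32_t codePoint;  // Unicode code point
    unsigned char lead;       // first (or only) Shift-JIS byte
    unsigned char trail;      // second byte, or kNoTrail
};

// Hypothetical excerpt, sorted by code point. A real table has thousands of rows.
const SjisEntry kTable[] = {
    {0x3000, 0x81, 0x40},      // ideographic space
    {0x3042, 0x82, 0xA0},      // hiragana A
    {0x30A2, 0x83, 0x41},      // katakana A
    {0x65E5, 0x93, 0xFA},      // CJK ideograph "sun/day"
    {0x672C, 0x96, 0x7B},      // CJK ideograph "book/origin"
    {0xFF71, 0xB1, kNoTrail},  // halfwidth katakana A (single byte)
};

// Decodes the UTF-8 sequence starting at s[i] into cp and advances i past it.
// Returns false on an invalid lead byte, bad continuation byte, or truncation.
bool decodeUtf8(const std::string& s, std::size_t& i, std::uint32_t& cp) {
    assert(i < s.size());
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    if (b0 < 0x80)                { cp = b0;        extra = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else return false;
    if (i + extra >= s.size()) return false;  // sequence runs off the end
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return true;
}

// Binary search of the sorted table; returns nullptr if cp is not mapped.
const SjisEntry* lookup(std::uint32_t cp) {
    const SjisEntry* last = std::end(kTable);
    const SjisEntry* it = std::lower_bound(
        std::begin(kTable), last, cp,
        [](const SjisEntry& e, std::uint32_t v) { return e.codePoint < v; });
    return (it != last && it->codePoint == cp) ? it : nullptr;
}

// Converts UTF-8 to Shift-JIS; returns false on malformed or unmapped input.
bool utf8ToSjis(const std::string& in, std::string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp = 0;
        if (!decodeUtf8(in, i, cp)) return false;
        if (cp < 0x80) {  // assumption: ASCII passes through unchanged
            out += static_cast<char>(cp);
            continue;
        }
        const SjisEntry* e = lookup(cp);
        if (!e) return false;
        out += static_cast<char>(e->lead);
        if (e->trail != kNoTrail) out += static_cast<char>(e->trail);
    }
    return true;
}
```

Calling utf8ToSjis on the UTF-8 bytes 41 E6 97 A5 E6 9C AC should give the Shift-JIS bytes 41 93 FA 96 7B. Keeping the table sorted by code point is one possible design choice; it allows a binary search without needing a hash map.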

[Edited by - Zahlman on October 9, 2007 2:35:09 AM]

The ASCII <-> SJIS conversion that I have is admittedly pretty lame - it just ignores any non-ASCII character. I doubt it would work for what I need anyway. The algorithm to do a converter is pretty simple, I realize. I even have code that will find the codepoints automatically in a UTF8 string. I just need to then do a lookup in a table to get the appropriate SJIS character.

What I'm honestly trying to do is save myself a bit of typing and use some code that someone else has already typed in, i.e. I'm being lazy and trying to do the least amount of work to get this done. The reason I'm leaning towards laziness on this one is that Sony decided that the PS2 memory card struct HAD to use SJIS instead of UTF8, while the rest of our game (and most of the industry, so far as I can tell) has standardized on UTF8. Thus I need to convert 2 freaking lines of text from UTF8 to SJIS (for each locale we ship in) in order to save the game properly. So I'm trying to see if I can find some ready-made code and avoid typing in some large translation table and spending several hours getting things to work correctly. After searching Google and finding nothing useful, I decided to ask here. Since I'm not getting anything here either, I guess I'll just have to bite the bullet and type in the code. C'est la vie, I guess :)

If you're willing to be Windows-specific, you should be able to do this by calling MultiByteToWideChar with CP_UTF8 to convert the UTF8 string to UTF16, and then calling WideCharToMultiByte with code page 932 to convert that to Shift-JIS.
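
A minimal sketch of that two-step call sequence might look like the following. The wrapper name utf8ToSjisWin32 is made up for the example, and the body is guarded with _WIN32 so the file still compiles elsewhere (where it simply reports failure). It treats any character that code page 932 cannot represent as an error, which is an assumption about what the caller wants.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts UTF-8 to Shift-JIS (Windows code page 932) by going through UTF-16.
// Returns false on malformed input, unmappable characters, or non-Windows builds.
bool utf8ToSjisWin32(const std::string& utf8, std::string& sjis) {
    sjis.clear();
#ifdef _WIN32
    if (utf8.empty()) return true;
    if (utf8.size() > static_cast<std::size_t>(INT_MAX)) return false;
    const int inLen = static_cast<int>(utf8.size());

    // Step 1: UTF-8 -> UTF-16. The first call only asks for the required length.
    const int wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                         utf8.data(), inLen, nullptr, 0);
    if (wlen <= 0) return false;
    std::wstring wide(static_cast<std::size_t>(wlen), L'\0');
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), inLen, &wide[0], wlen);
    assert(written == wlen);
    if (written != wlen) return false;

    // Step 2: UTF-16 -> code page 932, again sizing the buffer first.
    const int len = WideCharToMultiByte(932, 0, wide.data(), wlen,
                                        nullptr, 0, nullptr, nullptr);
    if (len <= 0) return false;
    sjis.resize(static_cast<std::size_t>(len));
    BOOL usedDefault = FALSE;  // set when a character had no mapping in CP 932
    const int out = WideCharToMultiByte(932, 0, wide.data(), wlen,
                                        &sjis[0], len, nullptr, &usedDefault);
    if (out != len || usedDefault) {
        sjis.clear();
        return false;
    }
    return true;
#else
    (void)utf8;
    return false;  // no Win32 API available on this platform
#endif
}
```

Note that code page 932 is Microsoft's variant of Shift-JIS, so a handful of characters may map differently from other Shift-JIS tables.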

Quote:
Original post by Devnull
Considering that I'm writing this code for the PS2, it would be difficult to be Windows-specific :P


At the risk of asking the obvious, have you looked in the SCE sample sources? I don't have access to them any more, but the PS2 projects I worked on obviously had functions to do the jobs you're talking about, and I'm pretty sure that they were copy-and-pasted from somewhere we found easily. The memory card sample code springs to mind!

Yeah, I did look there, but nothing obvious jumped at me. It's quite possible that I missed the "obvious", however. I'll look again. Thanks for the tip :)

Quote:
Original post by Devnull
Yeah, I did look there, but nothing obvious jumped at me. It's quite possible that I missed the "obvious", however. I'll look again. Thanks for the tip :)


No prob -- don't thank me till you find it! The function name "ShiftJIISTo[something]" rings a bell, so maybe a grep for that will help. I'm pretty sure it was Sony-written (complete with Japanese comments), so failing that you could try the ps2dev website or the SCE PS2 newsgroups (if they are still active).

I need to go the other way, though - i.e. UTF8 to SJIS.

No worries - I ended up just coding it from scratch. I found a web page that gave the mapping, used a Perl script to massage that into a format I could use in my program and wrote a simple loop to find the codepoint of each UTF8 character and then find the corresponding SJIS character in the mapping. Seems to work just fine except for some odd quirks of the PS2 itself, which I also ended up having to work around. *shrug* So much for being lazy :)
