# Unicode File IO woes...


This Unicode is really getting on my tits. I'm a real C++ zealot, so I don't know a lot of the old C I/O functions. Just check this code out Chris:

    tempString = new WCHAR[128];
    currentElement.AttributeWChars("path", tempString);

    FILE *m_file = _wfopen(tempString, L"r");

    if (m_file)
    {
        fgetws(tempString, 128, m_file); // Temporary - skips first line of file.

        while (!feof(m_file))
        {
            tempString = new WCHAR[128];
            fgetws(tempString, 128, m_file);
            m_textStrings.push_back(tempString);
        }

        fclose(m_file);
    }

And the Unicode file looks like this:
3
0 This is a test! #
1 Welcome To Go-Go-Golf! #
2 レポート、ブログにて掲載中 (2005/5/30). ●, ダンジョン シ #

Not sure if you'll be able to see it, but the '2' is followed by Japanese characters. Using C++ file handling I can read all of these lines in nicely, except that the Japanese characters come in as '?'. TinyXML doesn't support Unicode, so we have to do it ourselves, and the characters come in as garbage. I found these C-style I/O functions, which apparently open the file but return 3 characters of junk the first time we hit fgetws(tempString, 128, m_file); after that, every fgetws is just blank (""). Any ideas? :/

##### Share on other sites
What character encoding is the "Unicode file" using?
MSDN says:
"fgetwc is the wide-character version of fgetc; it reads c as a multibyte character or a wide character according to whether stream is opened in text mode or binary mode."

So if you open the file in text mode (as in that example), it tries to read multibyte characters (UTF-8? It seems to depend on the locale); if you open it in binary mode, it reads wchar_t (UTF-16) values directly. So it matters how your file represents the Unicode characters as bytes.
(_wfopen just accepts Unicode filenames - the opened file is treated exactly the same as with fopen.)
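
Since the answer depends on how the file stores its characters as bytes, one way to find out is to look for a byte-order mark (BOM) at the start of the file. What follows is a minimal sketch and not something from the thread: `Encoding` and `detect_bom` are hypothetical names, and it assumes the first few bytes have already been read in binary mode. It ignores UTF-32 and cannot identify a file that was saved without a BOM.

```cpp
#include <cstddef>
#include <string>

// Encodings this sketch can tell apart by their byte-order mark (BOM).
// Hypothetical type, introduced only for illustration.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Inspect the first raw bytes of a file and report which BOM, if any, is present.
Encoding detect_bom(const std::string& bytes)
{
    // Read byte i as an unsigned value so comparisons against 0xEF etc. are safe.
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };

    if (bytes.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return Encoding::Utf8;      // EF BB BF
    if (bytes.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE)
        return Encoding::Utf16LE;   // FF FE
    if (bytes.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return Encoding::Utf16BE;   // FE FF
    return Encoding::Unknown;       // no BOM: the encoding must be known some other way
}
```

If this reports `Encoding::Utf16LE`, the file holds two-byte code units with the low byte first, which is the case where a byte-oriented text-mode read would be expected to go wrong.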

##### Share on other sites
Ahhhh

Well, WordPad's "Unicode Text Document". I assume that's UTF-16.

How can I open it in binary mode?

EDIT: fgetws (after formatting) is returning:

0x00ee84f8 "2 ì0Ý0ü0È00Ö0í0°0k0f0²c-N (2005/5/30). Ï%, À0ó0¸0ç0ó0 ·0 " wchar_t *

...for the above Japanese string. Argh.

EDIT 2:

Fixed :) Opening the file in binary ( "rb" ) fixed it!
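
The garbled debugger string above is consistent with each byte of the UTF-16 file being widened into its own character: レ is U+30EC, stored little-endian as the bytes EC 30, and those two bytes shown separately are "ì" and "0". The sketch below reproduces that effect and contrasts it with pairing the bytes up, which is roughly what a binary-mode wide read does on a platform with a 16-bit wchar_t. The function names are hypothetical, and `char16_t` is used instead of `WCHAR` so the example does not depend on the size of wchar_t.

```cpp
#include <cstddef>
#include <string>

// Widen every byte to its own code unit. This mimics a byte-oriented read
// that treats the file as single-byte text.
std::u16string widen_bytes(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Combine each little-endian byte pair into one UTF-16 code unit.
// A trailing odd byte, if any, is ignored.
std::u16string decode_utf16le(const std::string& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

For the bytes EC 30 DD 30, `widen_bytes` yields the four units U+00EC, U+0030, U+00DD, U+0030 ("ì0Ý0"), while `decode_utf16le` yields the two units U+30EC and U+30DD ("レポ").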

##### Share on other sites
WordPad does seem to save as UTF-16, so you have to open the file in binary mode with
FILE *m_file = _wfopen(tempString, L"rb");

When I do that, I get the correct 0x0012fde4 "2 レポート、ブログにて掲載中 (2005/5/30). ●, ダンジョン シ #" in the debugger.

(Incidentally, the calls to "new WCHAR[128]" will result in lots of memory leaks, since you're never delete[]ing the arrays. Something like "WCHAR tempString[128];" should work the same but without the leaking.)
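
One way to avoid the leaks altogether is to let each line own its storage. The following is a rough sketch under stated assumptions, not the code from the thread: `read_utf16le_lines` is a hypothetical helper, the file is assumed to be UTF-16LE and already opened in binary mode, and `char16_t` stands in for `WCHAR` so the code behaves the same where wchar_t is 32 bits wide. It also loops on the result of the read itself, so a failed read at end of file never produces an extra line.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Read a UTF-16LE stream (opened in binary mode) and return its lines.
// Every line is owned by a std::u16string, so nothing needs delete[].
std::vector<std::u16string> read_utf16le_lines(std::FILE* file)
{
    std::vector<std::u16string> lines;
    std::u16string current;
    unsigned char pair[2];
    bool at_start = true;

    // Stop as soon as a full two-byte code unit can no longer be read.
    while (std::fread(pair, 1, 2, file) == 2) {
        const char16_t unit = static_cast<char16_t>(pair[0] | (pair[1] << 8));
        const bool first_unit = at_start;
        at_start = false;

        if (first_unit && unit == 0xFEFF)
            continue;                       // skip a leading byte-order mark

        if (unit == u'\n') {
            if (!current.empty() && current.back() == u'\r')
                current.pop_back();         // drop the CR of a CRLF pair
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(unit);
        }
    }

    if (!current.empty())
        lines.push_back(current);           // last line without a trailing newline
    return lines;
}
```

On Windows the stream would still come from `_wfopen(tempString, L"rb")`; the returned vector could then take the place of a container of raw `WCHAR*` pointers, assuming the rest of the code can work with `std::u16string` or a `std::wstring` copy of it.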

Edit: Ah, just noticed your edit [smile]. By the way, the "Show Preview" in this window doesn't really like Unicode...
