Unicode text files

7 comments, last by shalrath 18 years, 10 months ago
I know this is a fairly open-ended question, but I'm looking for a little more direction than the MSVC help files can give. I'm using MSVC 7.0 (2002), and I simply want to open a Unicode text document and read its contents into a wchar_t string. Does anyone know a quick, simple way to do this? I can expand on it later; for now I'm just trying to read it in. Thanks.
What kind of Unicode encoding: UTF-8 or UTF-16 (or the oddball UTF-32)? UTF-16 shouldn't be an issue; you can pretty much do a binary copy into memory. UTF-8 is more complex, especially if you follow the security recommendations.
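To illustrate the difference: a UTF-16 code unit is always two bytes, so the file's bytes map directly onto 16-bit units, whereas a UTF-8 character takes one to four bytes and its lead byte announces how many. A minimal sketch of reading that length (the function name is hypothetical, not from any library):

```cpp
#include <cstddef>

// Hypothetical helper: returns how many bytes a UTF-8 sequence occupies,
// judging only by its first (lead) byte, or 0 if that byte cannot start one.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx continuation byte, or 0xF8..0xFF
}
```

This only checks the bit pattern. Bytes such as 0xC0, 0xC1 and 0xF5 to 0xF7 match a pattern but never appear in valid UTF-8, which is exactly the kind of case a secure decoder has to reject.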
UTF-8, sorry :(
http://www.invenietis.net/Native/UTF8Encoder.htm

There is also a link to a decoder on that page, both a fast version and a slower, secure version.
There must be a simpler way... This is great, but I'm not that great a coder. Surely it can be done with standard C++ functions? Can anyone help? I'm currently doing this:

	FILE *textf = NULL;
	textf = _wfopen(L"mytext.txt", L"r");
	wchar_t *mystring = new wchar_t[6];
	fgetws(mystring, 5, textf);


However, what I get back does not display correctly. Can anyone see anything wrong here, or is the problem more likely in my text display code?
Sorry, but there's really not a simpler way. If you want to go from UTF-8 (on disk) to UTF-16 (Windows wchar_t) or UTF-32 (Linux wchar_t), you've got to do the decoding yourself.
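Here is a minimal sketch of doing that decoding by hand, assuming the UTF-8 bytes are already in memory. It is not the decoder from the linked page, and the function name is hypothetical. It writes UTF-16 code units into a std::u16string (a C++11 type, so newer than MSVC 7.0); on Windows, where wchar_t is 16 bits, the same units could go into a std::wstring instead. Malformed input is replaced with U+FFFD rather than decoded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Bad lead bytes, truncated sequences, overlong forms, encoded surrogates
// and values above U+10FFFF each become U+FFFD (the replacement character).
std::u16string utf8ToUtf16(const std::string& in)
{
    const char16_t replacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        char32_t minValue; // smallest code point allowed for this length
        if (b < 0x80)                { cp = b;        len = 1; minValue = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; minValue = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; minValue = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; minValue = 0x10000; }
        else { out.push_back(replacement); ++i; continue; } // stray or invalid byte

        // Gather continuation bytes; each must look like 10xxxxxx.
        std::size_t k = 1;
        for (; k < len && i + k < n; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) break;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (k < len || cp < minValue || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(replacement);
            i += k; // skip only the bytes actually consumed
            continue;
        }
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else { // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the bytes E2 82 AC decode to the single unit 0x20AC (the euro sign), while the overlong sequence C0 AF comes out as U+FFFD instead of '/'.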
What about reading UTF-16 straight from the disk? Is this possible using my simple method above?
Yes, if you write in UTF-16, you should be able to read UTF-16 using your method, with the possible caveat of having to do endianness conversions if your files come from a machine with a different byte order.
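An endianness conversion for UTF-16 just swaps the two bytes of every 16-bit unit. A minimal sketch, assuming the text has already been read into 16-bit units (the function name is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reverses the byte order of every UTF-16 code unit
// in place, turning big-endian data into little-endian or vice versa.
void swapUtf16ByteOrder(std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        char16_t u = text[i];
        // The cast keeps only the low 16 bits, i.e. the two swapped bytes.
        text[i] = static_cast<char16_t>((u >> 8) | (u << 8));
    }
}
```

A telltale sign that a swap is needed: the BOM reads as 0xFFFE instead of 0xFEFF.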

Also, Windows uses a special marker at the beginning of the file, the byte order mark (BOM), to indicate that it is a Unicode text file: see here
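The standard BOM byte sequences are FF FE for little-endian UTF-16, FE FF for big-endian UTF-16, and EF BB BF for UTF-8. A minimal sketch of checking for them (the enum and function names are hypothetical):

```cpp
#include <cstddef>

enum class TextEncoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspects the first bytes of a file for a byte order
// mark and reports the encoding it implies plus how many bytes to skip.
// UTF-32 BOMs are not distinguished here (FF FE 00 00 reads as UTF-16LE).
TextEncoding detectBom(const unsigned char* data, std::size_t size,
                       std::size_t& bomLength)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        bomLength = 3;
        return TextEncoding::Utf8;
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        bomLength = 2;
        return TextEncoding::Utf16LE;
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        bomLength = 2;
        return TextEncoding::Utf16BE;
    }
    bomLength = 0;
    return TextEncoding::Unknown; // no BOM: the encoding must be known some other way
}
```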
Even after saving my text file as UTF-16 (big- or little-endian), I only get junk back from the fgetws call... Microsoft uses a resource to load a Unicode text file in their Direct3D text example, but this seems like overkill. I should be able to do it with simple file I/O, right?
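Microsoft's documentation for fgetws says it reads multibyte characters from a stream opened in text mode and wide characters from one opened in binary mode, so the "r" mode in the earlier snippet is a likely cause of the junk. A minimal sketch of plain binary file I/O instead, assuming the file is little-endian UTF-16 (what Notepad's "Unicode" option writes). The function name is hypothetical, and std::u16string needs a C++11 compiler; on MSVC 7.0 a std::wstring or wchar_t buffer would play the same role.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reads an entire little-endian UTF-16 file from a
// stream opened in binary mode ("rb"), skipping a leading FF FE byte order
// mark if present. Assumes the file really is UTF-16LE; a trailing odd byte
// is ignored. Returns false if a read error occurred.
bool readUtf16LE(std::FILE* f, std::u16string& out)
{
    out.clear();
    unsigned char pair[2];
    bool first = true;
    // Two bytes per fread keeps the sketch simple; it is slow for big files.
    while (std::fread(pair, 1, 2, f) == 2) {
        char16_t unit = static_cast<char16_t>(pair[0] | (pair[1] << 8));
        if (first && unit == 0xFEFF) { first = false; continue; } // skip BOM
        first = false;
        out.push_back(unit);
    }
    return std::ferror(f) == 0;
}
```

Usage would be to open the file with _wfopen(L"mytext.txt", L"rb") (or standard fopen with "rb"), pass the FILE* to readUtf16LE, and then fclose it. Binary mode does no line-ending translation, so lines still end in \r\n, and big-endian files would additionally need their units byte-swapped.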
