# Unicode text files



I know this is a fairly open-ended question, but I'm looking for a little more direction than the MSVC help files are able to lend. I'm using MSVC 7.0 (2002) and I simply wish to open a Unicode text document and read its contents into a wchar_t string. Does anyone know a quick, simple way to do this? I can expand on it later; I'm just trying to read it in at the moment. Thanks.

##### Share on other sites
What kind of Unicode encoding? UTF-8 or UTF-16 (or oddball UTF-32)? UTF-16 shouldn't be an issue; you can pretty much do a binary copy into memory. UTF-8 is more complex, especially if you follow the security recommendations.

UTF-8, sorry :(

##### Share on other sites
http://www.invenietis.net/Native/UTF8Encoder.htm

There is also a link to a decoder on that page, both a fast version and a slower, secure version.

##### Share on other sites
There must be a simpler way... This is great, but I'm not that great a coder, it must be able to be done with standard C++ functions... Can anyone help... I'm currently doing this:

	FILE *textf = NULL;
	textf = _wfopen(L"mytext.txt", L"r");
	wchar_t *mystring = new wchar_t[6];
	fgetws(mystring, 5, textf);

However, what I am given back will not display correctly... Can anyone see anything wrong here, or is it likely to be in my text display code?

##### Share on other sites
Sorry, but there's really not a simpler way. If you want to go from UTF-8 (on disk) to UTF-16 (Windows wchar_t) or UTF-32 (Linux wchar_t), you've got to do the decoding yourself.
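For anyone who wants to see what "doing the decoding yourself" looks like, here is a minimal sketch using only standard C++. The function name `utf8_to_wide` is made up for illustration. It assumes the whole file has already been read into a `std::string` of raw bytes, that `wchar_t` is either 16 bits (Windows, so code points above U+FFFF become surrogate pairs) or 32 bits, and that throwing on malformed input is acceptable. It rejects overlong forms and encoded surrogates, which is roughly what the "secure" decoders do, but it has not been hardened beyond that.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into a wide string.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;      // code point being assembled
        unsigned long min_cp = 0;  // smallest value allowed for this length
        std::size_t extra = 0;     // number of continuation bytes expected

        if (b0 < 0x80)                { cp = b0;        extra = 0; min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong encodings, surrogates, and out-of-range values.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");

        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

Calling `utf8_to_wide(bytes)` on the contents of a UTF-8 file gives a `std::wstring`. Note that this sketch does not skip a leading UTF-8 byte order mark (EF BB BF); if the file starts with one, it comes through as U+FEFF at the front of the result.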

##### Share on other sites
What about reading UTF-16 straight from the disk? Is this possible using my simple method above?

##### Share on other sites
Yes, if you write in UTF-16, you should be able to read UTF-16 using your method, with the possible caveat of having to do endianness conversions if your files are coming from a machine with a different byte order.

Also, Windows uses a special indicator at the beginning of the file (the byte order mark) to indicate that it is a Unicode text file: see here
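As a rough illustration of the byte order and indicator points, here is a sketch in standard C++ that reads the file in binary mode, checks the first two bytes for a byte order mark (FF FE for little endian, FE FF for big endian), and assembles 16-bit units accordingly. The names `read_bytes` and `utf16_bytes_to_wide` are invented for this example. Falling back to little endian when there is no mark is an assumption, not a rule. On platforms with a 32-bit `wchar_t`, surrogate pairs are left as two separate units rather than combined. Binary mode is deliberate: as far as I know, the Microsoft CRT's wide-character read functions treat a text-mode stream as multibyte text and convert it, which is one likely reason for getting junk back from a UTF-16 file opened with plain "r".

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file as raw bytes, with no
// newline translation or character conversion.
std::string read_bytes(const char* path) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) throw std::runtime_error("cannot open file");
    return std::string(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: turn UTF-16 bytes into a wide string,
// using the byte order mark (if present) to pick the byte order.
std::wstring utf16_bytes_to_wide(const std::string& bytes) {
    std::size_t i = 0;
    bool big_endian = false;  // assumption: little endian when no mark

    if (bytes.size() >= 2) {
        unsigned char a = static_cast<unsigned char>(bytes[0]);
        unsigned char b = static_cast<unsigned char>(bytes[1]);
        if (a == 0xFF && b == 0xFE) { i = 2; }                          // UTF-16LE mark
        else if (a == 0xFE && b == 0xFF) { big_endian = true; i = 2; }  // UTF-16BE mark
    }

    if ((bytes.size() - i) % 2 != 0)
        throw std::runtime_error("odd number of bytes in UTF-16 data");

    std::wstring out;
    for (; i + 1 < bytes.size(); i += 2) {
        unsigned int first  = static_cast<unsigned char>(bytes[i]);
        unsigned int second = static_cast<unsigned char>(bytes[i + 1]);
        unsigned int unit = big_endian ? ((first << 8) | second)
                                       : ((second << 8) | first);
        out += static_cast<wchar_t>(unit);
    }
    return out;
}
```

Usage would be along the lines of `std::wstring text = utf16_bytes_to_wide(read_bytes("mytext.txt"));`, where the file name is the one from the earlier snippet.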

##### Share on other sites
Even after saving my text file as UTF-16 (big or little endian), I only get junk back from the fgetws call. Microsoft uses a resource to load a Unicode text file in their Direct3D text example, but this seems like overkill. I should be able to do it with simple file I/O, right?
