
Loading Unicode text files

This topic is 3857 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi, I am having a problem reading a .txt file that has Unicode characters in it. Somehow they are read incorrectly. The file looks like this:

-1.0, -0.7, -0.5, -0.3, -0.1
0.2
Czech
Nová hra
Nastavení
Nápověda
..

and even more text. It is a language file, and the top of the file holds some information about where to place the text on the screen. If I read it as a text file, the Unicode characters are read incorrectly. When I read the file in binary, the

fscanf (pFile, "%f, %f, %f, %f, %f\n", &MenuPos[0], &MenuPos[1], &MenuPos[2], &MenuPos[3], &MenuPos[4]);

for the first line fails. I don't know a way to read it in binary, because then I can't check for a separator or a line end.

What can I do?

You can read in binary data using fread().
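A minimal sketch in C of the fread() approach: read the whole file into memory first and parse the buffer afterwards. read_all_bytes is a hypothetical helper name. The terminating '\0' it appends is only useful for byte-oriented encodings such as UTF-8; a UTF-16 file contains zero bytes inside ordinary text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole stream into a NUL-terminated heap buffer with fread().
   Returns NULL on failure; *out_len receives the number of bytes read.
   The caller frees the result. */
char *read_all_bytes(FILE *f, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap + 1);          /* +1 keeps room for the '\0' */
    if (buf == NULL)
        return NULL;

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {                 /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2 + 1);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';                      /* lets string functions scan it */
    *out_len = len;
    return buf;
}
```

The file should be opened in binary mode ("rb") so the C library does not translate line endings behind your back.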

If this is C++, there are wide-character equivalents to a lot of the standard functionality, e.g. wcout, wchar_t, wstring.

But I can't detect where a number ends, because fread() doesn't stop reading at a blank character. Example:

20 30

can't be read with fread(), because it doesn't stop at the space. Furthermore, I don't know the string lengths, so I can't tell it how many bytes to read.
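Once the bytes are in memory, the end of a number does not have to be known in advance: strtod() reports through its second argument where the number it parsed stopped. A minimal sketch, assuming the numeric part of the file is ASCII (which holds for UTF-8 files); parse_numbers is a hypothetical helper. strtod() uses the current locale's decimal point, which stays '.' unless the program changes LC_NUMERIC (or LC_ALL) with setlocale().

```c
#include <stdlib.h>

/* Parses up to max_count numbers from one line of text such as "20 30"
   or "-1.0, -0.7, -0.5". strtod() reports through `end` where each number
   stopped, so the separators in between can be skipped by hand.
   Returns how many numbers were stored in out[]. */
int parse_numbers(const char *text, double *out, int max_count)
{
    const char *p = text;
    int count = 0;

    while (count < max_count) {
        while (*p == ' ' || *p == '\t' || *p == ',')   /* skip separators */
            p++;
        if (*p == '\0' || *p == '\n' || *p == '\r')     /* end of the line */
            break;

        char *end;
        double value = strtod(p, &end);
        if (end == p)              /* not a number (e.g. a word): stop */
            break;
        out[count++] = value;
        p = end;                   /* resume right after this number */
    }
    return count;
}
```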

Quote:
Original post by Revelation60
But I can't detect where a number ends, because fread() doesn't stop reading at a blank character. Example:

20 30

can't be read with fread(), because it doesn't stop at the space. Furthermore, I don't know the string lengths, so I can't tell it how many bytes to read.


First, are you using C or C++? Second, if you're using C, you just need to fread() one byte at a time, or read a larger block and buffer it yourself. Or switch to C++ and use the wide-character iostream classes.
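A minimal sketch of the byte-at-a-time option: read one line with fgetc() (which behaves like a one-byte fread()) and then hand the complete line to sscanf() with the original format. It assumes the file is UTF-8 or plain ASCII; read_line_bytes and read_menu_positions are hypothetical names. Skipping a UTF-8 byte-order mark is an added precaution in case the editor wrote one: a BOM in front of "-1.0" is enough to make the first %f fail.

```c
#include <stdio.h>
#include <string.h>

/* Reads bytes up to the next '\n' into buf, one fgetc() call at a time.
   Multi-byte UTF-8 sequences are copied through untouched; a trailing
   '\r' (Windows line ending) is dropped. Bytes that do not fit in buf
   are discarded. Returns the line length, or -1 at end of file. */
long read_line_bytes(FILE *f, char *buf, size_t size)
{
    size_t len = 0;
    int c = EOF;

    if (size == 0)
        return -1;
    while ((c = fgetc(f)) != EOF && c != '\n') {
        if (len + 1 < size)                /* keep room for the '\0' */
            buf[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return -1;
    if (len > 0 && buf[len - 1] == '\r')
        len--;
    buf[len] = '\0';
    return (long)len;
}

/* Parses the header line ("-1.0, -0.7, ...") with the original format.
   Returns 1 if all five values were read, 0 otherwise. */
int read_menu_positions(FILE *f, float pos[5])
{
    char line[256];
    const char *p = line;

    if (read_line_bytes(f, line, sizeof line) < 0)
        return 0;
    if (strncmp(p, "\xEF\xBB\xBF", 3) == 0)   /* skip a UTF-8 byte-order mark */
        p += 3;
    return sscanf(p, "%f, %f, %f, %f, %f",
                  &pos[0], &pos[1], &pos[2], &pos[3], &pos[4]) == 5;
}
```

The remaining lines (the Czech strings) can be read with read_line_bytes() as well; they come back as raw UTF-8 bytes, ready to be decoded or passed to a renderer that understands UTF-8.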

Quote:
Original post by Revelation60
If I read it byte by byte, I think I get wrong results, because Unicode can use more than one byte for a character.
Yes, you read byte by byte and then parse the results yourself.
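A minimal sketch of "parse the results yourself", assuming the file is saved as UTF-8, where one character takes one to four bytes. utf8_decode and utf8_length are hypothetical names; the decoder returns code points as uint32_t rather than wchar_t because wchar_t is only 16 bits wide on Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence starting at s (n bytes available).
   Stores the code point in *cp and returns the number of bytes used,
   or 0 if the bytes are not valid UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                                  /* 0xxxxxxx: ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; } /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; } /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; } /* 11110xxx */
    else
        return 0;                          /* stray continuation byte */
    if (n < len)
        return 0;                          /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)         /* must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, UTF-16 surrogates and values past U+10FFFF */
    if (c < min_for_len[len] || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        return 0;
    *cp = c;
    return len;
}

/* Counts characters (code points) rather than bytes; -1 on invalid input. */
long utf8_length(const char *text, size_t n)
{
    const unsigned char *s = (const unsigned char *)text;
    long count = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        if (used == 0)
            return -1;
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

For example, in "Nová hra" the "á" (U+00E1) is stored as the two bytes 0xC3 0xA1, so the string is 9 bytes long but only 8 characters.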

It would probably help if you knew which encoding your file uses. Read these:

http://en.wikipedia.org/wiki/Unicode
http://en.wikipedia.org/wiki/UTF-8

Once you have a general understanding of what you are trying to do, I'm sure you can find a library that does it for you (though writing your own Unicode-reading library can be entertaining too).
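Before choosing a library, a quick look at the first bytes often tells which encoding the editor used: a byte-order mark (BOM) of EF BB BF marks UTF-8, FF FE marks UTF-16 little-endian and FE FF marks UTF-16 big-endian. This matters for the failing fscanf(): in UTF-16 every ASCII character is accompanied by a zero byte, so "%f" cannot match "-1.0". A minimal sketch; detect_bom and the text_encoding enum are hypothetical, and a file without a BOM may still be UTF-8, so ENC_UNKNOWN only means "no BOM found".

```c
#include <stddef.h>

/* Encodings that can be recognised from a byte-order mark. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM: ASCII, UTF-8 without BOM, or a legacy code page */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE
} text_encoding;

/* Inspects the first n bytes of a file. *bom_len receives the number of
   BOM bytes to skip before the real content starts. UTF-32 BOMs are not
   checked: FF FE 00 00 would be reported as UTF-16LE. */
text_encoding detect_bom(const unsigned char *b, size_t n, size_t *bom_len)
{
    *bom_len = 0;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}
```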

Quote:
Original post by Revelation60
When I read the file in binary, the

fscanf (pFile, "%f, %f, %f, %f, %f\n", &MenuPos[0], &MenuPos[1], &MenuPos[2], &MenuPos[3], &MenuPos[4]);

for the first line fails. I don't know a way to read it in binary, because then I can't check for a separator or a line end.

What can I do?

Use fwscanf.
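A minimal sketch of the fwscanf() route in C; open_language_file and read_wide_line are hypothetical names. fwscanf() only sees wide characters after the C library has converted the file's bytes using the current LC_CTYPE locale, so this assumes a UTF-8 file and a UTF-8 locale (typical on Linux); a UTF-16 file will not decode this way there. With Microsoft's C runtime, the fopen() mode string can instead carry ccs=UTF-8 or ccs=UTF-16LE to select the file encoding (an MSVC extension). The setlocale() call would normally be made once at program start.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Opens a language file and reads the five menu positions from its first
   line with fwscanf(). On success the (now wide-oriented) stream is
   returned, positioned at the start of the second line; NULL on failure. */
FILE *open_language_file(const char *path, float pos[5])
{
    /* Take the byte-to-wchar_t conversion rules from the environment
       (assumed to be a UTF-8 locale). Only LC_CTYPE is changed, so
       LC_NUMERIC stays "C" and %f keeps expecting '.' as decimal point. */
    setlocale(LC_CTYPE, "");

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    if (fwscanf(f, L"%f, %f, %f, %f, %f",
                &pos[0], &pos[1], &pos[2], &pos[3], &pos[4]) != 5) {
        fclose(f);
        return NULL;
    }

    /* discard the rest of the header line, including its '\n' */
    wint_t c;
    while ((c = fgetwc(f)) != WEOF && c != L'\n')
        ;
    return f;
}

/* Reads the next line as wide characters and strips "\n" or "\r\n".
   Returns 0 at end of file or if the bytes cannot be converted. */
int read_wide_line(FILE *f, wchar_t *buf, int size)
{
    if (fgetws(buf, size, f) == NULL)
        return 0;
    size_t len = wcslen(buf);
    while (len > 0 && (buf[len - 1] == L'\n' || buf[len - 1] == L'\r'))
        buf[--len] = L'\0';
    return 1;
}
```

The Czech strings (Nová hra, Nastavení, ...) would then come back from read_wide_line() as wchar_t arrays. Keep in mind that wchar_t is 16 bits on Windows and 32 bits on most Unix-like systems.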

