I/O File new line problem

Started by
6 comments, last by Khatharr 11 years, 3 months ago

I'm working on a program to encipher files. While experimenting with I/O on Windows and Linux (and later on Mac OS), I noticed a difference in file length. I heard that Windows uses 2 bytes, "\r\n", for a new line, while UNIX systems use just 1 byte, "\n". I fear that my file enciphering program won't be platform independent. Is there a way to avoid this?

Main problem: Person A uses Windows and enciphers a file, then sends it to Person B, who uses UNIX.

Regards

It depends on the text file encoding. Different OSes use different encodings. Many tools (e.g. UltraEdit) can detect the encoding automatically, e.g. by checking for special characters or combinations of them.

The easiest way is to request/process only files in a certain format (UTF-8 etc.).

Does UTF-8 really determine this? Doesn't an encoding such as UTF-8 just specify how a character is laid out in memory? I guess a UTF-8 file can still contain one- or two-character newlines.

I might be wrong though...

It depends on the encoding. OK, UTF-8 was a bad example :)

You could try to use an encoding, like UTF-16, which (hopefully) supports all necessary newline variations as single (16-bit) chars.

Did you try opening your files in binary mode instead of text mode?

Well, the first thing to note is that your program isn't introducing a new problem. If the original file were sent from a Unix machine to a Windows machine, the user would still have to deal with it. Many text editors will handle this gracefully. In addition, savvy Unix users will be aware of tools such as "dos2unix" and "unix2dos" if the program they want to read the file in is clueless.

Now you know that solving this problem is optional. The simplest approach is to read and write the file in binary mode, which should result in an identical file.
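
A minimal sketch of that binary-mode approach, assuming the whole file fits in memory. The helper names read_bytes and write_bytes are made up for illustration and are not from the original poster's program:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read an entire file as raw bytes.
// std::ios::binary disables any newline translation on Windows.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write raw bytes back out unchanged.
void write_bytes(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Whatever goes in through write_bytes should come back byte for byte from read_bytes, including any "\r\n" pairs and embedded zero bytes, so the cipher sees the same input on every platform.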

Of course, you might want to save the user the hassle I mentioned earlier. However, by choosing to do this you also have to deal with a much harder problem: determining whether the file is a text file to be translated, a binary file that must be left "as is", or a text file that requires newline encoding. One way is to ask the user (or force them to provide the information, e.g. via a command-line switch), which is great if your target audience is technical.
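
If you go the "ask the user" route, the translation itself could look roughly like the sketch below. Both function names are hypothetical, and text_mode is assumed to come from something like a command-line switch. Only "\r\n" pairs are converted; a lone '\r' is left untouched:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert every "\r\n" pair to "\n".
std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue; // drop the '\r'; the '\n' is copied on the next iteration
        }
        out.push_back(text[i]);
    }
    return out;
}

// Hypothetical entry point: translate only when the user said the file is text.
std::string prepare_input(const std::string& raw, bool text_mode) {
    return text_mode ? crlf_to_lf(raw) : raw;
}
```

Leaving binary files alone by default is the safer choice here, since translating a byte pair that merely looks like "\r\n" would corrupt them.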


It works. I opened a file in binary mode:


#include <fstream>
std::ofstream os("test.txt", std::ios::binary); // binary mode: no newline translation
os << "\n"; // writes the single byte 0x0A

and instead of "\r\n" it wrote "\n" on Windows. This means I have to read and write in binary mode, and then the ciphered file can be read on any OS. Now I don't have to worry about this problem anymore. I forgot that file streams open in text mode unless std::ios::binary is specified.
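
For anyone who wants to repeat the check, here is a rough sketch that writes the newline in binary mode and then reports the file size. The function names are invented for this example:

```cpp
#include <fstream>
#include <string>

// Write a single "\n" in binary mode, as in the experiment above.
void write_newline_binary(const std::string& path) {
    std::ofstream os(path, std::ios::binary);
    os << "\n";
}

// Return the file's size in bytes, or -1 if it cannot be opened.
long long file_size(const std::string& path) {
    // std::ios::ate positions the stream at the end, so tellg() is the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}
```

After write_newline_binary the size should be 1 byte on any OS. Dropping std::ios::binary from the ofstream is what would let Windows expand it to 2 bytes.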

PS: My program is in its final stage. You enter an integer key and the algorithm ciphers the file with it. Only the correct key can decipher it. The only open question is what should happen if someone needs too many tries to enter the correct key.

Fill the screen with penguins.

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.
