Writing String to a Binary-file

13 comments, last by Melley 7 years ago

Hello forum!

I wanted to convert a string of a Lua-table to a binary file.


std::string content = "some lua table";
std::ofstream file("file.bin", std::ios::out | std::ios::binary);
file.write(content.c_str(), content.size());

This is what I do (file-validity checks removed).

But when I serialise/use the following raw-string:


test = {
  value = 24
}

the file says:


test = {
  value = 24
}

after writing with the code on top.

Having read http://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes, I'm not sure whether the file was really written in binary mode.

On paper, I understand how to work out the binary representation of an encoded character. But simply converting my string into 0s and 1s won't solve my problem, as those digits are still just characters from some character set, right?

What I expected: Some unreadable binary-file when opening it with some text-editor.

I'm not trying to "hide" information, but I have often heard that I should save my files as binary, and this somehow conflicts with what I learned about binary encoding.

I would be really happy if someone could help me out, thanks a lot for your time :)! Have a sweet weekend!


What I expected: Some unreadable binary-file when opening it with some text-editor.

What you're getting is the expected behaviour. You copied ASCII characters in binary format to disk, and then asked a text editor to visualize them.

If you use a memory debugger to inspect the raw binary data in RAM at the address returned by content.c_str(), you will see:


74 65 73 74 20 3D 20 7B 0D 0A 20 20 76 61 6C 75 65 20 3D 20 32 34 0D 0A 7D

^^if your debugger visualizes RAM as a series of bytes in hex visualisation...
...or...


test = {
  value = 24
}

^^if your debugger visualizes RAM as a series of bytes in ASCII encoding.
Either way it's the same binary data, just being visualized differently.
When you dump this section of memory to disk and then use a text-editor to visualize it, the text-editor is going to interpret it as a series of bytes in ASCII encoding and display those characters.
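
To see the two views side by side without a debugger, something like the following sketch might help. The helper name to_hex_view is made up for this illustration; it only uses standard stream formatting.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render every byte of `data` as two uppercase hex
// digits, separated by spaces (the "hex view" of the same bytes).
std::string to_hex_view(const std::string& data) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char so bytes above 127 do not sign-extend.
        out << std::setw(2)
            << static_cast<int>(static_cast<unsigned char>(data[i]));
    }
    return out.str();
}
```

Calling to_hex_view("test") gives "74 65 73 74", while printing the same string directly gives "test". The bytes are identical, only the visualization differs.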

What I expected: Some unreadable binary-file when opening it with some text-editor.
Your file does contain bytes, but text files are stored as bytes too. So if you write bytes that match the characters of some text, you of course get those characters back when you read the file again, e.g. in a text editor. (Your text editor reads the bytes, sees that they match known characters, and displays them as text, so you can read it :) )

An "unreadable binary file" isn't really unreadable; it just contains bytes that your text editor doesn't understand or expect. The binary and text modes of C/C++ have nothing to do with that. The flag only toggles whether end-of-line characters get rewritten. On Windows, an end-of-line between two lines of text is stored as \r\n in a file, while C/C++ says it must appear as \n to the program. When you read a file in text mode, a \r\n sequence gets rewritten to \n, and when you write a \n to a file in text mode, it becomes \r\n in the file.

The binary mode of C/C++ file io disables the above conversion, \n stays \n, and \r\n stays \r\n while reading or writing.
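
A rough way to check this yourself is to write the same string in both modes and compare the file sizes. This is only a sketch: the function name bytes_on_disk and the file names are made up for the example.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: write `text` to `path` using the given open mode and
// return the number of bytes that actually ended up in the file.
long long bytes_on_disk(const std::string& path, const std::string& text,
                        std::ios::openmode mode) {
    {
        std::ofstream out(path, mode | std::ios::trunc);
        out.write(text.data(), static_cast<std::streamsize>(text.size()));
    }   // leaving the scope closes (and flushes) the stream

    // Reopen in binary mode, positioned at the end, to measure the raw size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return static_cast<long long>(in.tellg());
}
```

With the text "a\nb\n", passing std::ios::out | std::ios::binary always produces 4 bytes. Passing plain std::ios::out should produce 6 bytes on Windows (each \n becomes \r\n) and 4 bytes on Linux or macOS, where text mode does no translation.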

Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?

But in the end, I could just leave this as it is and get all the benefits, right?

Edit: Thanks, you two, I understand everything now : )!!! Have a great weekend!

Oh, and how come most savefiles are not readable?

Because they write integers/etc to disk instead of strings/text.
e.g. try running this code and opening the result in a text editor:


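int value = 24;
// Writes the int's raw in-memory bytes (sizeof(int) of them), not the characters '2' and '4'.
file.write(reinterpret_cast<const char*>(&value), sizeof(value));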

and compare it with this:


char byte1 = '2';
char byte2 = '4';
file.write(&byte1, 1);
file.write(&byte2, 1);

or this:


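int value = 24;
char buffer[42];
// itoa is non-standard; std::snprintf (from <cstdio>) is the portable equivalent.
int length = std::snprintf(buffer, sizeof(buffer), "%d", value);
file.write(buffer, length);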
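
If you'd rather compare the results without opening files, here is a small sketch that collects the bytes in memory instead. The function names raw_bytes_of and text_bytes_of are made up; the write call is the same one used on the file stream above.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: the bytes `write` emits for an int, i.e. a copy of
// its in-memory representation.
std::string raw_bytes_of(int value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return out.str();
}

// Hypothetical helper: the bytes emitted when the int is first converted
// to decimal text.
std::string text_bytes_of(int value) {
    const std::string digits = std::to_string(value);
    std::ostringstream out;
    out.write(digits.data(), static_cast<std::streamsize>(digits.size()));
    return out.str();
}
```

For the value 24, text_bytes_of returns the two characters "24" (bytes 0x32 0x34). raw_bytes_of returns sizeof(int) bytes: one of them is 0x18 (24 in hex, a control character) and the rest are zero, in an order that depends on the machine's endianness. That is why a text editor shows nothing readable for it.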

Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?
A byte can have values 0 to 255 (inclusive). Plain ASCII runs from 0 to 127 (inclusive), where the printable characters start at 32. See any ASCII table on the internet for all the numbers.

Bytes 128 to 255 are "high ASCII", which is where code pages come in on Windows (I think; I don't know Windows precisely). These typically contain all the 'weird' characters of other languages.

Depending on your editor, high ASCII may or may not be shown as characters. The values 0 to 31 are control characters; editors normally assume they don't occur in a text file (except for \n (value 10), \r (value 13), and \t (value 9)), and act weird when other control characters do occur.
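
As a rough illustration of these ranges, the sketch below counts the bytes a plain-ASCII editor would not expect. The function name count_non_text_bytes is made up, and treating 127 (DEL) and everything above it as "not text" is an assumption of this example, since editors differ on high ASCII.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count bytes that are neither printable ASCII (32..126)
// nor one of the usual text control characters (tab, line feed, carriage
// return). Value 127 (DEL) and all bytes from 128 up are counted as non-text.
std::size_t count_non_text_bytes(const std::string& data) {
    std::size_t count = 0;
    for (unsigned char byte : data) {
        const bool allowed_control =
            byte == '\t' || byte == '\n' || byte == '\r';
        const bool printable_ascii = byte >= 32 && byte <= 126;
        if (!allowed_control && !printable_ascii) ++count;
    }
    return count;
}
```

The Lua table from the first post yields 0 here, which is why it opens cleanly in an editor. A raw integer such as the four bytes 18 00 00 00 yields 4.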

Most savegames try to reduce disk space (or network bandwidth) by compressing the data. Compression has the effect that all byte values from 0 to 255 get used. (You want as much information as possible in as small a space as possible, so it's silly to use bytes with a range of 0..255 and store only the values 32..127 in them.)

But in the end, I could just leave this as it is and get all the benefits, right?
Saving your data as a readable text file has the big advantage that you can read or edit it while debugging problems. One disadvantage is that the file is bigger. Another is that others can easily edit the file instead of using your program (hmm, "money = 34556" I see here, let's change that to "money = 345560000" :p ). Depending on what the file contains or who has access to it, that may or may not be a problem.

But in the end, I could just leave this as it is and get all the benefits, right?

No.

Binary files are faster to read and write. Reading or writing a binary file is essentially a memcpy from disk to RAM, or from RAM to disk.

Reading and writing text files performs translation of things like carriage returns and line feeds. Parsing the I/O stream to perform these translations means text files take longer to load and save.

Also, text files write out everything as a string, so all non-string variables must be converted, which slows down loads and saves. It's also more work to code, as every non-string value requires both a conversion and a read/write.

Put the two together, and text files can be very slow compared to binary.
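
To make the difference concrete, here is a minimal sketch of both approaches. The SaveData struct and all four function names are made up for illustration, and in-memory string streams stand in for files. It shows only the extra conversion step on the text side and makes no timing claims.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical save-game record, used only for this illustration.
struct SaveData {
    std::int32_t level;
    std::int32_t money;
};

// Binary: copy the struct's bytes to the stream and back, no conversion.
std::string save_binary(const SaveData& data) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&data), sizeof(data));
    return out.str();
}

SaveData load_binary(const std::string& bytes) {
    SaveData data{};
    std::istringstream in(bytes);
    in.read(reinterpret_cast<char*>(&data), sizeof(data));
    return data;
}

// Text: every number is converted to decimal digits and parsed back.
std::string save_text(const SaveData& data) {
    std::ostringstream out;
    out << data.level << ' ' << data.money << '\n';
    return out.str();
}

SaveData load_text(const std::string& text) {
    SaveData data{};
    std::istringstream in(text);
    in >> data.level >> data.money;
    return data;
}
```

Note that dumping a struct like this ties the file to the compiler's struct layout and the machine's byte order, so it is only safe when the same build reads the file back.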

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net


Thanks again for the in-depth analysis : )

Binary files are faster to read and write. Reading or writing a binary file is essentially a memcpy from disk to RAM, or from RAM to disk.

But doesn't the example create a binary file? That is what I meant by leaving it as it is : )

But doesn't the example create a binary file? That is what I meant by leaving it as it is : )


This question has been answered already. ALL files are "binary", however, some bytes have special meaning to certain applications. A text editor, by design, converts all bytes to their ASCII equivalent. If you write an ASCII string to a file, no matter if that file is text or binary, and open it in a text editor, your string will be displayed as text. There is no way around this. If you write a string, the string will be displayable in a text editor, so long as you are not obfuscating it in some fashion.

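From the perspective of the file I/O API (the standard library, in your case), the practical difference between writing text and writing binary is how non-string data such as integers and floats is written and read: formatted I/O (operator<< and operator>>) converts values to and from digit characters, while unformatted I/O (write and read) copies their in-memory bytes.

But doesn't the example create a binary file? That is what I meant by leaving it as it is : )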

A "binary file" colloquially means a file that isn't designed to be read in a text editor. "Storing textual data in a binary format" usually means compressing it, or otherwise encoding it in some more compact way that your application can understand. Reading compressed data can be a huge performance win depending on what kind of data you're dealing with. In both cases, your binary mapping is smaller on disk than the raw text would be.

This is also why you see so many games with specialized file formats for their assets. They're compressed and laid out for faster loading.

EDIT: I went ahead and rewrote this to avoid further bikeshedding nonsense.

This topic is closed to new replies.
