
Writing String to a Binary-file


Hello forum!

I wanted to convert a string of a Lua-table to a binary file.

std::string content = "some lua table";
std::ofstream file("file.bin", std::ios::out | std::ios::binary);
file.write(content.c_str(), content.size());

That is what I do (file-validity checks removed).

But when I serialise/use the following raw-string:

test = {
  value = 24
}

the file says:

test = {
  value = 24
}

after writing with the code on top.

Reading this http://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes, I'm not sure if that really happened.

On paper, I understand how I can calculate a binary of some encoded character. But simply converting my string into 0s and 1s won't solve my problem, as they are still just part of some character set, right?

What I expected: Some unreadable binary-file when opening it with some text-editor.

I'm not trying to "hide" information, but I have often heard that I should save my files as binary, and this somehow confuses me given what I learned about binary encoding.

 

I would be really happy if someone could help me out, thanks a lot for your time :)! Have a sweet weekend!

Edited by Angelic Ice


What I expected: Some unreadable binary-file when opening it with some text-editor.

What you're getting is the expected behaviour. You copied ASCII characters in binary format to disk, and then asked a text editor to visualize them.

If you use a memory debugger to inspect the raw binary data in RAM at the address returned by content.c_str(), you will see:

74 65 73 74 20 3D 20 7B 0D 0A 20 20 76 61 6C 75 65 20 3D 20 32 34 0D 0A 7D

^^if your debugger visualizes RAM as a series of bytes in hex visualisation...
...or...

test = {
  value = 24
}

^^if your debugger visualizes RAM as a series of bytes in ASCII encoding.
Either way it's the same binary data, just being visualized differently.
When you dump this section of memory to disk and then use a text-editor to visualize it, the text-editor is going to interpret it as a series of bytes in ASCII encoding and display those characters.
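To see the hex view without a memory debugger, something like the following could be used. It is only a minimal sketch; `to_hex` is a made-up helper name, not part of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of `data` as two uppercase
// hex digits, separated by single spaces.
std::string to_hex(const std::string& data) {
    std::string out;
    char buf[4];
    for (unsigned char byte : data) {
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `to_hex(content)` on the string from the first post should give the same kind of byte listing as above. The `0D 0A` pairs only show up if the string really contains `\r\n` line endings.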


Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?

But in the end, I could just leave this as it is and get all the benefits, right?

Edit: Thanks you two, I understand everything now : )!!! Have a great weekend!

Edited by Angelic Ice


Oh, and how come most savefiles are not readable?

Because they write integers/etc to disk instead of strings/text. 
e.g. try running this code and opening the result in a text editor:

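int value = 24;
// writes the raw in-memory bytes of the int (sizeof(int) bytes, usually 4)
file.write(reinterpret_cast<const char*>(&value), sizeof(value));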

and compare it with this:

char byte1 = '2';
char byte2 = '4';
file.write(&byte1, 1);
file.write(&byte2, 1);

or this:

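int value = 24;
char buffer[42];
// itoa is non-standard, so use snprintf (from <cstdio>) to produce the digits "24"
std::snprintf(buffer, sizeof(buffer), "%d", value);
file.write(buffer, std::strlen(buffer)); // std::strlen is in <cstring>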
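To compare the variants above side by side without opening files, a rough sketch could capture the bytes in a string stream instead. The function names are made up for illustration, and `std::int32_t` is used here (an assumption, the snippets above use plain `int`) so the raw size is fixed at 4 bytes.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Raw form: copy the integer's in-memory bytes straight to the stream.
void write_raw(std::ostream& out, std::int32_t value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Text form: convert to decimal digits first, then write those characters.
void write_text(std::ostream& out, std::int32_t value) {
    const std::string digits = std::to_string(value);
    out.write(digits.data(), static_cast<std::streamsize>(digits.size()));
}

// Helpers for experimenting: return exactly the bytes that would hit the disk.
std::string raw_bytes(std::int32_t value) {
    std::ostringstream s;
    write_raw(s, value);
    return s.str();
}

std::string text_bytes(std::int32_t value) {
    std::ostringstream s;
    write_text(s, value);
    return s.str();
}
```

For 24, `text_bytes` yields the two characters '2' and '4', while `raw_bytes` yields four bytes: one with the value 24 (a control character in ASCII) and three zero bytes, in an order that depends on the machine's endianness. That is why a text editor shows the first as "24" and the second as garbage.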


Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?
A byte can have values 0 to 255 (inclusive). Plain ASCII runs from 0 to 127 (inclusive), where the printable characters start at 32. See any ASCII table on the internet for all the numbers.

Bytes 128 to 255 are high ASCII, which is where the code pages live on Windows (I think; I don't know Windows precisely). These typically contain all the 'weird' characters of other languages.

Depending on your editor, the high ascii may or may not be shown as characters. The characters 0 to 31 are control characters, editors normally assume they don't exist in a text-file (except for \n (value 10), \r (value 13), and \t (value 9)), and act weird when other control characters occur.

Most savegames try to reduce disk space (or network bandwidth), and compress the data. Compressing has the effect that all byte values from 0 to 255 get used. (You want as much information as possible in as small a space as possible, so it's silly to use bytes with a range 0..255 and store only values 32..127 in them.)
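As a rough illustration of those ranges, here is a sketch that counts which kind of bytes a buffer contains. `ByteStats`, `classify`, and `looks_like_text` are hypothetical names, and the "looks like text" rule is a simplification that only considers plain ASCII.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally of the byte categories described above.
struct ByteStats {
    std::size_t printable = 0;   // 32..126
    std::size_t whitespace = 0;  // \t (9), \n (10), \r (13)
    std::size_t other = 0;       // other control bytes, 127 (DEL), and 128..255
};

ByteStats classify(const std::string& data) {
    ByteStats stats;
    for (unsigned char c : data) {
        if (c == '\t' || c == '\n' || c == '\r') {
            ++stats.whitespace;
        } else if (c >= 32 && c <= 126) {
            ++stats.printable;
        } else {
            ++stats.other;
        }
    }
    return stats;
}

// Simplified check: a plain-ASCII text file should contain no "other" bytes.
bool looks_like_text(const std::string& data) {
    return classify(data).other == 0;
}
```

The Lua table from the first post passes this check, while the four raw bytes of an `int` holding 24 do not, because 24 and 0 are control characters.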

 

But in the end, I could just leave this as it is and get all the benefits, right?
Saving your data as a readable text-file has the big advantage that you can read or edit it, while debugging problems. The only disadvantage is that the file is bigger. Also, others can easily edit the file instead of using your program (hmm "money = 34556" I see here, let's change that to "money = 345560000"  :p )  Depending on what the file contains or who has access to the file that is or is not a problem.


But in the end, I could just leave this as it is and get all the benefits, right?

No.

binary files are faster to read and write. reading or writing a binary file is essentially a memcpy from disk to ram, or from ram to disk.

reading and writing text files performs translation of line endings (carriage returns and line feeds) on some platforms. parsing the i/o stream to perform these translations means text files take longer to load and save.

Also, text files write out everything as a string, so all non-string variables must be converted, which slows down loads and saves. it's also more work to code, as all non-strings require both a conversion, and a read / write.

 

put the two together, and text files can be very slow compared to binary.
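The difference in work can be sketched like this. `PlayerState` and the four functions are made-up names for illustration. Note the caveat: dumping a struct's raw bytes only round-trips reliably on the same compiler and platform, since padding and endianness are not portable.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical save data; must be trivially copyable for the raw dump to be valid.
struct PlayerState {
    std::int32_t level;
    std::int32_t money;
    float health;
};
static_assert(std::is_trivially_copyable<PlayerState>::value,
              "raw byte dumps require a trivially copyable type");

// Binary: one block copy out, one block copy in, no per-field conversion.
void save_binary(std::ostream& out, const PlayerState& s) {
    out.write(reinterpret_cast<const char*>(&s), sizeof(s));
}

bool load_binary(std::istream& in, PlayerState& s) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&s), sizeof(s)));
}

// Text: every field is converted to characters on save and parsed on load.
void save_text(std::ostream& out, const PlayerState& s) {
    out << s.level << ' ' << s.money << ' ' << s.health << '\n';
}

bool load_text(std::istream& in, PlayerState& s) {
    return static_cast<bool>(in >> s.level >> s.money >> s.health);
}
```

With a real file you would open the stream with `std::ios::binary` for the first pair. Whether the speed difference matters depends on how much data you have, so measure before deciding.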


Thanks again for the in-depth analysis : )
 

 

 

binary files are faster to read and write. reading or writing a binary file is essentially a memcpy from disk to ram, or from ram to disk.

But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )

Edited by Angelic Ice


But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )


This question has been answered already. ALL files are "binary", however, some bytes have special meaning to certain applications. A text editor, by design, converts all bytes to their ASCII equivalent. If you write an ASCII string to a file, no matter if that file is text or binary, and open it in a text editor, your string will be displayed as text. There is no way around this. If you write a string, the string will be displayable in a text editor, so long as you are not obfuscating it in some fashion.

From the perspective of the file I/O API (standard library, in your case), the only difference between a text file and binary file is how it writes and reads non-string data, i.e., integers, floats, etc.

Edited by MarkS

But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )

 

A "binary file" colloquially means it isn't designed to be read in a text editor.  "Storing textual data in a binary format" usually means compressing it, or otherwise encoding it in some other sparse way that your application can understand.  Reading compressed data can be a huge performance win depending on what kind of data you're dealing with.  In both cases, your binary mapping is smaller on disk than maintaining the raw text.

 

This is also why you see so many games with specialized file formats for their assets.  They're compressed and laid out for faster loading.
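As a toy illustration of "encoding it in a way your application can understand", here is a sketch of a single key/value entry such as `value = 24` stored in a made-up record layout: one byte for the key length, the key bytes, then the value as four little-endian bytes. The layout and the names `encode_entry` and `decode_entry` are assumptions for this example, not any real game's format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical layout: [1 byte key length][key bytes][4-byte little-endian value].
// Assumes the key is shorter than 256 bytes.
std::string encode_entry(const std::string& key, std::int32_t value) {
    std::string out;
    out += static_cast<char>(key.size());
    out += key;
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    for (int shift = 0; shift < 32; shift += 8) {
        out += static_cast<char>((bits >> shift) & 0xFF);
    }
    return out;
}

// Returns false if `data` is not exactly one well-formed entry.
bool decode_entry(const std::string& data, std::string& key, std::int32_t& value) {
    if (data.empty()) return false;
    const std::size_t key_len = static_cast<unsigned char>(data[0]);
    if (data.size() != 1 + key_len + 4) return false;
    key = data.substr(1, key_len);
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>(data[1 + key_len + i]);
        bits |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    value = static_cast<std::int32_t>(bits);
    return true;
}
```

The key name is still readable in a text editor, because those bytes are still ASCII. Only the length prefix and the integer show up as unprintable characters. Writing the value byte by byte also keeps the file identical across machines with different endianness.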

 

EDIT:  I went ahead and rewrote this to avoid further bikeshedding nonsense.

Edited by SeraphLance
