Angelic Ice

Writing String to a Binary-file


Hello forum!

I wanted to convert a string of a Lua-table to a binary file.

std::string content = "some lua table";
std::ofstream file("file.bin", std::ios::out | std::ios::binary);
file.write(content.c_str(), content.size());

This is what I do (file-validity checks removed).

But when I serialise/use the following raw-string:

test = {
  value = 24
}

the file says:

test = {
  value = 24
}

after writing with the code on top.

Reading this http://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes, I'm not sure if that really happened.

On paper, I understand how I can calculate a binary of some encoded character. But simply converting my string into 0s and 1s won't solve my problem, as they are still just part of some character set, right?

What I expected: Some unreadable binary-file when opening it with some text-editor.

I'm not trying to "hide" information, but I have often heard that I should save my files as binary, and this somehow conflicts with what I learned about binary encoding.

 

I would be really happy if someone could help me out, thanks a lot for your time :)! Have a sweet weekend!

Edited by Angelic Ice


What I expected: Some unreadable binary-file when opening it with some text-editor.

What you're getting is the expected behaviour. You copied ASCII characters in binary format to disk, and then asked a text editor to visualize them.

If you use a memory debugger to inspect the raw binary data in RAM at the address returned by content.c_str(), you will see:

74 65 73 74 20 3D 20 7B 0D 0A 20 20 76 61 6C 75 65 20 3D 20 32 34 0D 0A 7D

^^if your debugger visualizes RAM as a series of bytes in hex visualisation...
...or...

test = {
  value = 24
}

^^if your debugger visualizes RAM as a series of bytes in ASCII encoding.
Either way it's the same binary data, just being visualized differently.
When you dump this section of memory to disk and then use a text-editor to visualize it, the text-editor is going to interpret it as a series of bytes in ASCII encoding and display those characters.
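To see the "same bytes, different visualisation" point without a memory debugger, here is a minimal sketch that prints a string's bytes as hex. The helper name to_hex is made up for this illustration; it is not part of the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of `data` as two uppercase hex
// digits, separated by single spaces.
std::string to_hex(const std::string& data) {
    std::string out;
    char pair[3];  // two hex digits plus the terminating '\0'
    for (unsigned char byte : data) {
        const int n = std::snprintf(pair, sizeof(pair), "%02X",
                                    static_cast<unsigned>(byte));
        assert(n == 2);  // a single byte always formats to two digits
        if (!out.empty()) out += ' ';
        out += pair;
    }
    return out;
}
```

Calling to_hex on the Lua table text with Windows-style line endings ("test = {\r\n  value = 24\r\n}") should reproduce the hex dump above, while printing the same string directly shows the readable text.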


Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?

But in the end, I could just leave this as it is and get all the benefits, right?

Edit: Thanks, you two, I understand everything now : )!!! Have a great weekend!

Edited by Angelic Ice


Oh, and how come most savefiles are not readable?

Because they write integers/etc to disk instead of strings/text. 
e.g. try running this code and opening the result in a text editor:

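int value = 24;
// writes the raw in-memory bytes of the int (sizeof(int) bytes, usually 4)
file.write(reinterpret_cast<const char*>(&value), sizeof(value));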

and compare it with this:

char byte1 = '2';
char byte2 = '4';
file.write(&byte1, 1);
file.write(&byte2, 1);

or this:

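int value = 24;
char buffer[42];
// itoa is non-standard; std::snprintf (from <cstdio>) does the same conversion portably
int length = std::snprintf(buffer, sizeof(buffer), "%d", value);
file.write(buffer, length);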
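If you would rather compare the byte sequences in memory before writing anything to disk, the following sketch builds both forms as strings. The function names raw_bytes and text_bytes are invented for this example.

```cpp
#include <cstring>
#include <string>

// The raw in-memory bytes of an int: exactly what writing
// (char*)&value with sizeof(value) bytes would put in the file.
std::string raw_bytes(int value) {
    std::string out(sizeof(value), '\0');
    std::memcpy(&out[0], &value, sizeof(value));
    return out;
}

// The decimal text form: one ASCII digit character per digit.
std::string text_bytes(int value) {
    return std::to_string(value);
}
```

For the value 24, text_bytes gives the two printable characters '2' and '4' (bytes 0x32 0x34), whereas raw_bytes gives sizeof(int) bytes, one of which is 0x18 and the rest 0x00. Those are control characters, which is why a text editor shows garbage for the first variant.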


Oh, and how come most savefiles are not readable? Are they shifting or setting anything offset?
A byte can have values 0 to 255 (inclusive). Plain ASCII runs from 0 to 127 (inclusive), where the printable characters start at 32. See any ASCII table on the internet for all the numbers.

Bytes 128 to 255 are "high ASCII", which is where code pages come in on Windows (I think; I don't know Windows precisely). These typically contain all the 'weird' characters of other languages.

Depending on your editor, the high ASCII bytes may or may not be shown as characters. The characters 0 to 31 are control characters; editors normally assume they don't exist in a text file (except for \n (value 10), \r (value 13), and \t (value 9)) and act weird when other control characters occur.

Most savegames try to reduce disk space (or network bandwidth) and compress the data. Compression has the effect that all byte values from 0 to 255 get used. (You want as much information as possible in as small a space as possible, so it's silly to use bytes with a range 0..255 and store only values 32..127 in them.)
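As a rough illustration of those ranges, here is a small sketch that counts how many bytes in a buffer fall outside what a text editor normally expects. Both function names are hypothetical, and the sketch treats 127 (DEL) as a control character, so the printable range it uses is 32..126.

```cpp
#include <cstddef>
#include <string>

// True for printable ASCII (32..126) and the three control characters
// that editors commonly accept in text files: \n, \r and \t.
bool is_plain_text_byte(unsigned char b) {
    return (b >= 32 && b <= 126) || b == '\n' || b == '\r' || b == '\t';
}

// Counts the bytes a text editor would likely display oddly.
std::size_t count_non_text_bytes(const std::string& data) {
    std::size_t count = 0;
    for (unsigned char b : data) {
        if (!is_plain_text_byte(b)) ++count;
    }
    return count;
}
```

The Lua table text from the first post contains zero such bytes, while the four raw bytes of a small integer, or compressed data, will contain plenty.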

 

But in the end, I could just leave this as it is and get all the benefits, right?
Saving your data as a readable text file has the big advantage that you can read or edit it while debugging problems. One disadvantage is that the file is bigger. Another is that others can easily edit the file instead of using your program (hmm, "money = 34556" I see here, let's change that to "money = 345560000"  :p ). Depending on what the file contains or who has access to it, that may or may not be a problem.


But in the end, I could just leave this as it is and get all the benefits, right?

No.

Binary files are faster to read and write. Reading or writing a binary file is essentially a memcpy from disk to RAM, or from RAM to disk.

Reading and writing text files involves translating line endings (carriage returns and line feeds) on some platforms. Parsing the I/O stream to perform these translations means text files take longer to load and save.

Also, text files write out everything as a string, so all non-string variables must be converted, which slows down loads and saves. It's also more work to code, as every non-string value requires both a conversion and a read/write.

Put the two together, and text files can be very slow compared to binary.
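To make the "memcpy versus conversion" contrast concrete, here is a hedged sketch of both approaches for a tiny record. The struct SaveData and the four function names are invented for illustration, and string streams stand in for files so the example is self-contained. Note that dumping a struct's raw bytes like this is only safe for simple types and is not portable across compilers or CPU architectures.

```cpp
#include <sstream>
#include <string>

// Hypothetical save record used only for this illustration.
struct SaveData {
    int level;
    float health;
};

// Binary: copy the object's bytes as they are, no conversion.
std::string save_binary(const SaveData& d) {
    std::ostringstream out(std::ios::binary);
    out.write(reinterpret_cast<const char*>(&d), sizeof(d));
    return out.str();
}

SaveData load_binary(const std::string& bytes) {
    SaveData d{};
    std::istringstream in(bytes, std::ios::binary);
    in.read(reinterpret_cast<char*>(&d), sizeof(d));
    return d;
}

// Text: every field is formatted to characters and parsed back.
std::string save_text(const SaveData& d) {
    std::ostringstream out;
    out << d.level << ' ' << d.health;
    return out.str();
}

SaveData load_text(const std::string& text) {
    SaveData d{};
    std::istringstream in(text);
    in >> d.level >> d.health;
    return d;
}
```

The text variant has to format and parse each field, which is where the extra cost comes from; whether that cost matters in practice depends on how much data you save.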


Thanks again for the in-depth analysis : )
 

 

 

Binary files are faster to read and write. Reading or writing a binary file is essentially a memcpy from disk to RAM, or from RAM to disk.

But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )

Edited by Angelic Ice


But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )


This question has been answered already. ALL files are "binary", however, some bytes have special meaning to certain applications. A text editor, by design, converts all bytes to their ASCII equivalent. If you write an ASCII string to a file, no matter if that file is text or binary, and open it in a text editor, your string will be displayed as text. There is no way around this. If you write a string, the string will be displayable in a text editor, so long as you are not obfuscating it in some fashion.

From the perspective of the file I/O API (standard library, in your case), the only difference between a text file and binary file is how it writes and reads non-string data, i.e., integers, floats, etc.

Edited by MarkS

But isn't what the example creates a binary file? That is what I meant by leaving it as it is : )

 

A "binary file" colloquially means it isn't designed to be read in a text editor.  "Storing textual data in a binary format" usually means compressing it, or otherwise encoding it in some other compact way that your application can understand.  Reading compressed data can be a huge performance win depending on what kind of data you're dealing with.  In both cases, your binary mapping is smaller on disk than the raw text would be.

 

This is also why you see so many games with specialized file formats for their assets.  They're compressed and laid out for faster loading.

 

EDIT:  I went ahead and rewrote this to avoid further bikeshedding nonsense.

Edited by SeraphLance


Storing textual data in a binary format
You cannot store it in any other way. ALL data is stored as binary bytes in a file. Text is stored as bytes, output of a compression program or an encryption program is stored as bytes. It is purely a matter of the receiving application understanding the received data.

 

You can compare it with letters on paper. Everyone that writes a message on paper does that by writing letters (or in eastern languages, words using elaborate symbols) on the paper. There is no textual or binary format for letters on paper, it's all letters on paper, no matter what language or coding system or whatever you use.

For simplicity, let's assume you may only write letters from the English alphabet. I can read sequences of letters in the English language. If you give me a paper that has letters using the rules of the English language, I can read and understand what it says. If you write random letters, or use the rules of another language (and still use the same alphabet), it's still letters on paper; the only difference is that I cannot understand what it says.

This understanding, or lack thereof, is what many people see as "textual" versus "binary". As if there were special paper with English language on it, and other paper for letters with other rules. In reality, the paper is not different, it's exactly the same paper. Only the sequence of letters fails to match my rule-book for reading English text.

It's not "binary", it's just not understandable if you try to interpret it as text.
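The same idea in code, as a small sketch: the four bytes of the word "test" can be handed to a reader that expects text or to one that expects a 32-bit integer. The function names are made up for this example, and the integer you get depends on the byte order of your machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Interpret the first four bytes of `bytes` as an unsigned 32-bit integer.
std::uint32_t bytes_as_uint32(const std::string& bytes) {
    assert(bytes.size() >= sizeof(std::uint32_t));
    std::uint32_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value;
}

// Interpret the four bytes of an unsigned 32-bit integer as characters.
std::string uint32_as_text(std::uint32_t value) {
    std::string out(sizeof(value), '\0');
    std::memcpy(&out[0], &value, sizeof(value));
    return out;
}
```

Nothing about the bytes changes between the two calls; only the rule-book used to read them does. Converting "test" to an integer and back yields "test" again.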


Oh, but what should I do if writing- and loading-speed is all I care for?

 

My advice is just to not worry about it.  During development you'll want to be able to load those scripts as text files.  Only on release, and only if you see that the text files are an actual performance bottleneck, should you consider packing them in some binary format.

 

At my last job we had a specialized archive format (which was basically just a fancy zip) that we used for release builds, but most of our development was either done on source assets or on cached archives depending on what sort of work you did at the company.  Keep in mind this is a studio with a ~13MLOC codebase and a couple terabytes of versioned assets.  You likely don't need to worry about that scale.

Oh, but what should I do if writing- and loading-speed is all I care for?

buffered _fwrite_nolock().

Copy all data to a buffer. Open the file. Call _fwrite_nolock(buffer_ptr, 1, size_of_buffer, file_ptr). Close the file. That's the basic idea; check the documentation for the exact syntax.

But don't worry about it until you can notice a slowdown from using text files. Unless the amount of data saved is trivial, odds are you'll want binary by the time it's all said and done. In a big game like Skyrim, it's the difference between 3-second and 30-second load times.
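A minimal sketch of that "one buffer, one write" idea follows. It uses the portable std::fwrite as a stand-in, since _fwrite_nolock is specific to Microsoft's C runtime; the function names write_buffer and read_buffer are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Write the whole buffer with a single fwrite call.
// Returns true only if every byte was written and the file closed cleanly.
bool write_buffer(const char* path, const std::vector<char>& buffer) {
    std::FILE* file = std::fopen(path, "wb");
    if (!file) return false;
    const std::size_t written =
        std::fwrite(buffer.data(), 1, buffer.size(), file);
    const bool closed = std::fclose(file) == 0;
    return written == buffer.size() && closed;
}

// Read the whole file back into `buffer`, in fixed-size chunks.
bool read_buffer(const char* path, std::vector<char>& buffer) {
    std::FILE* file = std::fopen(path, "rb");
    if (!file) return false;
    buffer.clear();
    char chunk[4096];
    std::size_t n = 0;
    while ((n = std::fread(chunk, 1, sizeof(chunk), file)) > 0) {
        buffer.insert(buffer.end(), chunk, chunk + n);
    }
    const bool ok = std::ferror(file) == 0;
    std::fclose(file);
    return ok;
}
```

On MSVC you could swap std::fwrite for _fwrite_nolock with the same argument order, provided only one thread touches the file. Measure first, though, as advised above.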

Edited by Norman Barrows

