Hirocon

C++ binary file IO


I'm trying to write some code using IO with binary files in C++, but I'm having trouble doing very basic operations. I wrote the code below just as a test, to copy one file ("from") to another file ("to"). This seems like it should be a simple thing to do.

ifstream from;
ofstream to;
from.open("from");
to.open("to");
char c;
while(!from.eof())
{
    from.read(&c, 1);
    to.write(&c, 1);
}
from.close();
to.close();

Copying 1 byte at a time is probably not the most efficient way to do this, but this is just a test. The code above compiles and doesn't give any runtime errors, but it doesn't copy files correctly. The from.eof() call always seems to return true much too soon, so not all of the file is copied. I can't discern any pattern in when eof() returns true, but for any given file it always seems to happen in the same place.

I don't want to attempt anything more complex until I get this working. Someone please help. Why does this code not correctly copy files?

---
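First of all, the default mode for files is text, which can be problematic if the file contains certain byte values. On Windows, for example, text mode translates CR-LF pairs into a single newline on input, and the C runtime treats a Ctrl-Z byte (0x1A) as end of file, which would explain eof() becoming true early at the same spot in a given file. To open a file in binary mode, specify this in the open call: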

ifstream stream;
stream.open("somefile",std::ios::binary);

Better still, just do this in the constructor:

ifstream stream("somefile",std::ios::binary);

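Even with this, however, you will still write an extra character to the file, because the end-of-file flag isn't set until a read has actually failed. On the last pass through the loop, read() fails, eof() becomes true, and the stale value in c is written anyway. So you cannot write without first testing whether the read succeeded.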
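To make the off-by-one concrete, here is a small sketch (the function names are hypothetical) that puts the question's loop next to a version that tests the read itself. With a 5-byte input, the first version produces a 6-byte file because the last byte is written twice.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// The pattern from the question: eof() is checked *before* the read,
// so the final, failed read still falls through to write().
// Note: if src fails to open, eof() never becomes true and this loops forever.
void copy_bytewise_buggy(const std::string& src, const std::string& dst)
{
    std::ifstream from(src, std::ios::binary);
    std::ofstream to(dst, std::ios::binary);
    char c = 0;
    while (!from.eof()) {
        from.read(&c, 1);  // the last call fails and sets eofbit...
        to.write(&c, 1);   // ...but the stale byte is written anyway
    }
}

// Fixed version: the loop condition is the result of the read, so the
// body only runs when a byte was actually extracted.
void copy_bytewise(const std::string& src, const std::string& dst)
{
    std::ifstream from(src, std::ios::binary);
    std::ofstream to(dst, std::ios::binary);
    char c;
    while (from.get(c)) {
        to.put(c);
    }
}
```

The same fix works with read(&c, 1) as the condition, since read() returns the stream and the stream converts to false once a read fails.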

You can see an example of how to read a file in one go here, which is better than writing something like this:

while(true) {
    from.read(&c,1);
    if( from.eof() ) break;
    to.write(...);
}
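A minimal sketch of reading a file in one go, assuming the whole file fits in memory. read_whole_file and write_whole_file are hypothetical helper names; the size comes from opening at the end (std::ios::ate) and calling tellg().

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads an entire file into memory with a single read() call.
// Returns false if the file cannot be opened or fully read.
bool read_whole_file(const std::string& path, std::vector<char>& out)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // start at the end
    if (!in) return false;
    const std::streamoff size = in.tellg();  // position at the end == file size
    if (size < 0) return false;
    in.seekg(0, std::ios::beg);
    out.resize(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(out.data(), size)) return false;
    return true;
}

// Writes the buffer back out with a single write() call.
bool write_whole_file(const std::string& path, const std::vector<char>& data)
{
    std::ofstream to(path, std::ios::binary);
    if (!to) return false;
    to.write(data.data(), static_cast<std::streamsize>(data.size()));
    to.close();  // flush now so write errors are reported here
    return !to.fail();
}
```

Copying a file then becomes one read_whole_file call followed by one write_whole_file call, with no per-byte eof() checks at all.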

---
I recently had to do something like this for my pack file system (stuffs a bunch of files into one).

I admit that my code isn't the cleanest but this part is only used in the tool to construct the pack file. After that I have no need for it. Here it is:

// the InputStream and OutputStream are templated to allow use of different stream types; these types must support binary operations
// copyBuffer should be a contiguous block of bufferSize bytes; by not defining it inside the function body I am able to reuse the same buffer for multiple calls to CopyFile. A typical bufferSize was 4096 bytes
template <typename InputStream,typename OutputStream>
// iol::Read() and iol::Write() are templated functions that read/write from/to a stream into the given object
// size is the size of the file calculated by doing:
// stream.seekg(0, std::ios::end); // two-argument form: offset 0 from the end
// size = stream.tellg();
// stream.seekg(0);
// uint32 is a 32bit unsigned integer type
void CopyFile(uint32 size,InputStream &is,OutputStream &os,
              char *copyBuffer,uint32 bufferSize)
{
    while (size)
    {
        // if the remaining size is at least one buffer, copy a whole chunk
        if (size>=bufferSize)
        {
            is.read(copyBuffer,bufferSize); // read into buffer
            os.write(copyBuffer,bufferSize); // write from buffer
            size-=bufferSize; // subtract read bytes
        }
        else
        { // copy the remaining bytes 1 byte at a time
            // read bytes
            for (uint32 i=0;i<size;i++)
                iol::Read(is,copyBuffer[i]);
            // write bytes
            for (uint32 i=0;i<size;i++)
                iol::Write(os,copyBuffer[i]);
            size=0; // end loop, all data has been copied
        }
    }
}
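The size calculation described in the comments can be pulled out into a small helper. This is a sketch with hypothetical names (StreamSize, FileSize). Note that seeking to the end needs the two-argument form seekg(0, std::ios::end); the one-argument overload of seekg takes an absolute position, not a direction.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <string>

// Returns the number of bytes from position 0 to the end of a seekable
// stream and leaves the read position back at the start.
// Returns -1 if seeking fails (e.g. the stream is not seekable).
std::int64_t StreamSize(std::istream& stream)
{
    stream.seekg(0, std::ios::end);          // offset 0 relative to the end
    const std::streampos end = stream.tellg();
    stream.seekg(0, std::ios::beg);          // rewind for the copy
    if (end == std::streampos(-1) || !stream) return -1;
    return static_cast<std::int64_t>(end);
}

// Convenience overload for a file on disk, opened in binary mode.
std::int64_t FileSize(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    return StreamSize(in);
}
```

The result is 64-bit here; the uint32 size parameter in CopyFile limits it to files smaller than 4 GiB.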


---
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
    const uint32 transferSize = std::min(size, bufferSize); // std::min is in <algorithm>
    is.read(copyBuffer,transferSize);
    os.write(copyBuffer,transferSize);
    size-=transferSize;
}
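Filled out into a complete function, the suggested loop might look like the sketch below. It is written against std::istream/std::ostream rather than the template parameters, adds a check on gcount() so a short read is reported instead of writing stale buffer contents, and CopyBytes/CopyFileChunked are hypothetical names. The 4096-byte buffer matches the typical size mentioned earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <string>
#include <vector>

// Copies exactly `size` bytes from `is` to `os` through a caller-supplied
// buffer, moving std::min(size, bufferSize) bytes per iteration.
// Returns false if the input runs out early or a write fails.
bool CopyBytes(std::uint32_t size, std::istream& is, std::ostream& os,
               char* copyBuffer, std::uint32_t bufferSize)
{
    assert(copyBuffer != nullptr && bufferSize > 0);
    while (size)
    {
        const std::uint32_t transferSize = std::min(size, bufferSize);
        is.read(copyBuffer, transferSize);
        const std::streamsize got = is.gcount();  // bytes actually read
        os.write(copyBuffer, got);
        if (got != static_cast<std::streamsize>(transferSize) || !os)
            return false;                         // short read or write error
        size -= transferSize;
    }
    return true;
}

// Hypothetical wrapper: copy a whole file with a 4096-byte buffer.
bool CopyFileChunked(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary | std::ios::ate);  // open at the end
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;
    const std::streamoff size = in.tellg();
    if (size < 0 || size > static_cast<std::streamoff>(UINT32_MAX))
        return false;                             // unknown size or >= 4 GiB
    in.seekg(0, std::ios::beg);
    std::vector<char> buffer(4096);
    return CopyBytes(static_cast<std::uint32_t>(size), in, out,
                     buffer.data(), static_cast<std::uint32_t>(buffer.size()));
}
```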

---
File operations are one of the few areas where I would give the advantage to the C library. It feels like jumping through hoops to read files.

---
Quote:
Original post by Extrarius
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
const uint32 transferSize = min(size, bufferSize);
is.read(copyBuffer,transferSize);
os.write(copyBuffer,transferSize);
size-=transferSize;
}


If performance matters, it becomes necessary to rely on OS specific mechanisms.

In general, maximizing the buffer size while copying improves raw throughput, but the bottlenecks with file IO often lie elsewhere: fragmentation, the location of data on the media, or simply the OS mechanisms.

While raw throughput cannot be improved by much, an OS-specific approach can improve overall performance, usually by providing asynchronous access. Windows has quite decent support for this via overlapped IO.

Just something to keep in mind.

---
Quote:
Original post by Extrarius
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
const uint32 transferSize = min(size, bufferSize);
is.read(copyBuffer,transferSize);
os.write(copyBuffer,transferSize);
size-=transferSize;
}

Dang you and your brilliance. I'm not used to using std::min but now I will do so more often.

Quote:
Original post by Antheus
If performance matters, it becomes necessary to rely on OS specific mechanisms.

True, although I'm certain there is no OS mechanism to copy a file into an open stream.

---
Quote:
Original post by Antheus
[...]If performance matters, it becomes necessary to rely on OS specific mechanisms.[...]
You can improve performance much more by using OS-specific mechanisms, but simply switching from reading 1 byte at a time (as the code would previously do when size < bufferSize) to reading N bytes at a time can offer a significant performance increase.

In some code I was working on, I had a loop that ran 1280 times, reading a byte and then processing it. I changed it to use a 1280-byte buffer with a single read before the loop, and performance increased at least 2-3 times (I don't remember the actual value, but it was very significant). That tiny change shifted the bottleneck from the read code to the actual calculations.
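The change described above can be sketched roughly as follows. ProcessByte is a hypothetical stand-in for the real per-byte work, and ProcessRecordBytewise/ProcessRecord are hypothetical names for the before and after versions; only the 1280-byte record size comes from the post. Both compute the same result, but the second makes one stream call instead of 1280.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical per-byte work (a simple rolling hash).
std::uint32_t ProcessByte(std::uint32_t acc, unsigned char b)
{
    return acc * 31u + b;
}

// Before: one stream call per byte, 1280 times.
bool ProcessRecordBytewise(std::istream& in, std::uint32_t& result)
{
    std::uint32_t acc = 0;
    for (int i = 0; i < 1280; ++i) {
        char c;
        if (!in.get(c)) return false;
        acc = ProcessByte(acc, static_cast<unsigned char>(c));
    }
    result = acc;
    return true;
}

// After: a single read() fills the buffer, then the loop runs over memory.
bool ProcessRecord(std::istream& in, std::uint32_t& result)
{
    std::array<char, 1280> buffer;
    if (!in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())))
        return false;                       // fewer than 1280 bytes available
    std::uint32_t acc = 0;
    for (char c : buffer)
        acc = ProcessByte(acc, static_cast<unsigned char>(c));
    result = acc;
    return true;
}
```

Both functions accept any std::istream, such as a std::ifstream opened in binary mode or a std::istringstream.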
Quote:
[...]In general, maximizing the buffer size while copying is a way to improve raw throughput performance, the bottlenecks with file IO often lie elsewhere - be it fragmentation, location of data on the media, or simply the OS mechanisms.[...]
Of course, which is exactly what my code does. The previous code buffers properly for "size >= bufferSize" but then falls to reading a single byte at a time otherwise.
Quote:
[...]While raw throughput cannot be improved by much, an OS-specific approach can improve the overall performance, usually by providing asynchronous access. Windows has quite decent support for it via overlapped IO.[...]
Most definitely true, but then again changing a program to properly use asynchronous I/O can involve quite significant changes to the code in many places.

---
Quote:
Original post by Extrarius
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
const uint32 transferSize = min(size, bufferSize);
is.read(copyBuffer,transferSize);
os.write(copyBuffer,transferSize);
size-=transferSize;
}


You are unlikely to beat the following C++ idiom in terms of speed or elegance (assuming you are not using the Microsoft standard library in debug mode, which adds considerable overhead). I would argue it definitely beats any C code that could accomplish the same thing.

ifstream from("from", std::ios_base::binary);
ofstream to("to", std::ios_base::binary);

to << from.rdbuf();
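Wrapped into a function with basic error checks, the idiom might look like this sketch (CopyFileRdbuf is a hypothetical name). One detail worth handling: the streambuf inserter sets failbit on the output stream if it inserts no characters, so an empty source file is checked separately rather than reported as a failure.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Copies src to dst with the streambuf idiom, both files in binary mode.
// Returns false if either file fails to open or the copy fails.
bool CopyFileRdbuf(const std::string& src, const std::string& dst)
{
    std::ifstream from(src, std::ios_base::binary);
    std::ofstream to(dst, std::ios_base::binary);
    if (!from || !to) return false;
    // operator<<(streambuf*) sets failbit when nothing is inserted,
    // so treat an empty source as a successful (empty) copy.
    if (from.peek() == std::ifstream::traits_type::eof()) return true;
    to << from.rdbuf();
    to.close();                 // flush so write errors show up in fail()
    return !to.fail();
}
```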

---
Quote:
Original post by Bregma

ifstream from("from", std::ios_base::binary);
ofstream to("to", std::ios_base::binary);

to << from.rdbuf();

Uhh... Are you sure that you want to use a stream operator on binary data?

---
Quote:
Original post by T1Oracle
Uhh... Are you sure that you want to use a stream operator on binary data?


The data being binary does not prevent it from being represented by characters. operator<< applied to a streambuf is defined as copying individual characters until the end of the stream, which preserves the data unchanged, assuming both streams are in binary mode.

---
I've read somewhere that stream operators are only for text. When I was coding my pack file system the first solution I saw was "to << from.rdbuf();" but that was on text data. Nothing I found confirmed that stream operators can process binary without the risk of automatic character conversions.

---
Quote:
Original post by T1Oracle
I've read somewhere that stream operators are only for text. When I was coding my pack file system the first solution I saw was "to << from.rdbuf();" but that was on text data. Nothing I found confirmed that stream operators can process binary without the risk of automatic character conversions.

The automatic character translation depends on how you open the file (with ios::binary or not). It has nothing to do with whether you're using formatted or unformatted I/O. You can open a file in text mode and get translation with the read() and write() functions, just as you can use the stream insertion and extraction operators on binary streams without character translation.
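A small sketch of the point above (WriteWithInsertion and ReadWithExtraction are hypothetical names): on a binary-mode stream, operator<< and operator>> for char move bytes without translation. One catch worth knowing: formatted extraction skips whitespace bytes (space, tab, CR, LF, ...) by default, so reading raw bytes with >> also needs std::noskipws.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Writes raw bytes with the insertion operator on a binary-mode stream.
void WriteWithInsertion(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path, std::ios::binary);
    for (char c : bytes)
        out << c;  // operator<<(ostream&, char) writes the byte as-is
}

// Reads the file back with the extraction operator. By default operator>>
// skips whitespace bytes; std::noskipws turns that off.
std::string ReadWithExtraction(const std::string& path, bool skipWhitespace)
{
    std::ifstream in(path, std::ios::binary);
    if (!skipWhitespace)
        in >> std::noskipws;
    std::string result;
    char c;
    while (in >> c)
        result += c;
    return result;
}
```

With std::noskipws every byte comes back unchanged; without it, the space, CR and LF bytes silently disappear even though the file is in binary mode.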

---
Quote:
Original post by T1Oracle
I've read somewhere that stream operators are only for text. When I was coding my pack file system the first solution I saw was "to << from.rdbuf();" but that was on text data. Nothing I found confirmed that stream operators can process binary without the risk of automatic character conversions.
Normally the insertion operators on an ostream convert data from internal to external representation, and the extraction operators on an istream convert from external to internal representation. The exact meaning of external and internal representation depends on the locale imbued in the iostream and on the type of the object being inserted or extracted.

The definition of 'text' is vague at best, but I'm assuming you mean some sequence of bytes that has an external representation that could be interpreted as glyphs in a known human-readable language.

None of that has bearing on the insertion and extraction operators for a streambuf object. The line

to << from.rdbuf();
is simply transferring uninterpreted bytes from the streambuf object in from directly to the streambuf object in to. A streambuf performs I/O operations and buffering, but all data remains in external format. No interpretation is performed; there is no intermediate representation in any internal form. It's just a stream of bytes transferred from one I/O buffer to another, with underflow and overflow operations performed when necessary. There is no 'text' involved at any point. As long as both files are opened in binary mode, there is no interpretation of bytes as file control sequences either.
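Because a streambuf-to-streambuf transfer doesn't interpret anything, the same idiom also works for loading a file into memory. This sketch (SlurpFile is a hypothetical name) streams the file's buffer into a std::ostringstream; the check below uses every byte value from 0 to 255.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Loads a whole file into a std::string by moving its streambuf contents
// into an ostringstream. Returns an empty string if the file can't be read.
std::string SlurpFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream contents;
    // Skip the transfer for a missing or empty file: inserting zero
    // characters would just set failbit on `contents`.
    if (in.peek() != std::ifstream::traits_type::eof())
        contents << in.rdbuf();
    return contents.str();
}
```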

---
I really like memory-mapped files.

If you're using Windows, then I'd recommend using them. They are fast and easy to use. You do have to know the size of memory you need in advance, though.

http://msdn2.microsoft.com/en-us/library/ms810613.aspx
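For comparison, here is a rough POSIX counterpart. This swaps in mmap for the Windows CreateFileMapping/MapViewOfFile calls, and CopyViaMmap is a hypothetical name. It illustrates the point about knowing the size in advance: the mapping length comes from fstat before anything is mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps src read-only and writes the mapped bytes to dst in one call.
// Returns false on any failure.
bool CopyViaMmap(const std::string& src, const std::string& dst)
{
    const int fd = ::open(src.c_str(), O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);  // known up front

    std::ofstream out(dst, std::ios::binary);
    if (!out) { ::close(fd); return false; }
    if (size == 0) { ::close(fd); return true; }  // mmap rejects a zero length

    void* view = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);                                   // the mapping stays valid
    if (view == MAP_FAILED) return false;

    out.write(static_cast<const char*>(view), static_cast<std::streamsize>(size));
    ::munmap(view, size);
    out.close();
    return !out.fail();
}
```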

---
Quote:
Original post by Bregma
[...]The line

to << from.rdbuf();
is simply transferring uninterpreted bytes from the streambuf object in from directly to the streambuf object in to.[...]

Thank you, that clarified a lot for me. Ratings+
