
C++ binary file IO


I'm trying to write some code using IO with binary files in C++, but I'm having trouble doing very basic operations. I wrote the code below just as a test, to copy one file ("from") to another file ("to"). This seems like it should be a simple thing to do.

ifstream from;
ofstream to;

from.open("from");
to.open("to");

char c;
while (!from.eof()) {
    from.read(&c, 1);
    to.write(&c, 1);
}

from.close();
to.close();

Copying 1 byte at a time is probably not the most efficient way to do this, but this is just a test. The code above compiles and doesn't give any runtime errors, but it doesn't copy files correctly. The from.eof() call always seems to return true much too soon, so not all of the file is copied. I can't discern any pattern in when eof() returns true, but for any given file it always seems to happen in the same place. I don't want to attempt anything more complex until I get this working. Can someone please help? Why does this code not copy files correctly?

Don't forget the ios::binary flag when opening a file if you intend to use non-formatted input or output.

First of all, the default mode for files is text, which can be problematic if the file contains certain byte values: in text mode on Windows, line endings get translated and a 0x1A (Ctrl-Z) byte is treated as end of file, which is exactly why eof() fires early and always at the same spot in a given file. To open a file in binary mode, specify this in the open call:

ifstream stream;
stream.open("somefile",std::ios::binary);

Better still, just do this in the constructor:

ifstream stream("somefile",std::ios::binary);

Even with this, however, you will still write an extra character to the file, because the stream's end-of-file flag isn't set until a read has actually failed. So you cannot write the byte you just read without first testing whether the end of file has been reached.

You can see an example of how to read a file in one go here (a sketch follows after the loop below); that approach is better than writing something like this:

while (true) {
    from.read(&c, 1);
    if (from.eof()) break;
    to.write(&c, 1);
}
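
A minimal sketch of reading a whole file in one go, assuming the contents go into a std::vector<char> (ReadWholeFile is just an illustrative name, and error handling is omitted):

#include <cstddef>
#include <fstream>
#include <vector>

// slurp an entire file into memory with a single read call
std::vector<char> ReadWholeFile(const char *path)
{
    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end);        // seek to the end...
    std::streamsize size = in.tellg(); // ...to find the file size
    in.seekg(0, std::ios::beg);        // and back to the beginning

    std::vector<char> data(static_cast<std::size_t>(size));
    if (size > 0)
        in.read(&data[0], size);       // one read for the whole file
    return data;
}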

I recently had to do something like this for my pack file system (stuffs a bunch of files into one).

I admit that my code isn't the cleanest, but this part is only used in the tool that constructs the pack file. After that I have no need for it. Here it is:

// InputStream and OutputStream are templated to allow use of different
// stream types; these types must support binary operations.
// copyBuffer should be a contiguous block of bufferSize bytes. By not
// defining the buffer inside the function body I am able to reuse the same
// buffer for multiple calls to CopyFile; a typical bufferSize was 4096 bytes.
// iol::Read() and iol::Write() are templated functions that read/write
// from/to a stream into the given object.
// size is the size of the file, calculated by doing:
//   stream.seekg(0, std::ios::end);
//   size = stream.tellg();
//   stream.seekg(0);
// uint32 is a 32-bit unsigned integer type.
template <typename InputStream, typename OutputStream>
void CopyFile(uint32 size, InputStream &is, OutputStream &os,
              char *copyBuffer, uint32 bufferSize)
{
    while (size)
    {
        if (size >= bufferSize)
        {
            // at least one full buffer's worth remains, so copy a whole chunk
            is.read(copyBuffer, bufferSize);  // read into buffer
            os.write(copyBuffer, bufferSize); // write from buffer
            size -= bufferSize;               // subtract copied bytes
        }
        else
        {
            // copy the remainder 1 byte at a time
            for (uint32 i = 0; i < size; i++)
                iol::Read(is, copyBuffer[i]);  // read bytes
            for (uint32 i = 0; i < size; i++)
                iol::Write(os, copyBuffer[i]); // write bytes
            size = 0; // end loop, all data has been copied
        }
    }
}
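
For context, a hypothetical call site might look like this (packStream stands for an already-open binary ofstream, and the 4096-byte buffer matches the typical size mentioned above):

#include <fstream>

std::ifstream in("from", std::ios::binary);
in.seekg(0, std::ios::end);
uint32 size = static_cast<uint32>(in.tellg()); // file size, as described above
in.seekg(0);

char buffer[4096];
CopyFile(size, in, packStream, buffer, sizeof(buffer));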


You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
    const uint32 transferSize = min(size, bufferSize);
    is.read(copyBuffer, transferSize);
    os.write(copyBuffer, transferSize);
    size -= transferSize;
}

File operations are one of the few areas where I would give the advantage to the C library. With iostreams it feels like jumping through hoops just to read a file.
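
For comparison, here is a minimal sketch of the original copy written against the C library (file names taken from the original post, error handling kept to a bare minimum):

#include <cstdio>

int main()
{
    std::FILE *in  = std::fopen("from", "rb"); // "b" selects binary mode
    std::FILE *out = std::fopen("to", "wb");
    if (!in || !out)
        return 1;

    char buffer[4096];
    std::size_t n;
    // fread returns the number of bytes actually read, so the loop
    // terminates naturally at end of file - no eof() pitfalls
    while ((n = std::fread(buffer, 1, sizeof(buffer), in)) > 0)
        std::fwrite(buffer, 1, n, out);

    std::fclose(in);
    std::fclose(out);
    return 0;
}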

Quote:
Original post by Extrarius
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
    const uint32 transferSize = min(size, bufferSize);
    is.read(copyBuffer, transferSize);
    os.write(copyBuffer, transferSize);
    size -= transferSize;
}


If performance matters, it becomes necessary to rely on OS-specific mechanisms.

In general, maximizing the buffer size while copying is a way to improve raw throughput, but the bottlenecks with file IO often lie elsewhere - be it fragmentation, the location of the data on the media, or simply the OS mechanisms themselves.

While raw throughput cannot be improved by much, an OS-specific approach can improve overall performance, usually by providing asynchronous access. Windows has quite decent support for this via overlapped IO.

Just something to keep in mind.
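
As a rough illustration of overlapped IO, here is a minimal sketch of a single asynchronous read on Windows (the file name and buffer size are placeholders; real code would do useful work while the read is in flight):

#include <windows.h>
#include <cstdio>

int main()
{
    // FILE_FLAG_OVERLAPPED opens the file for asynchronous access
    HANDLE file = CreateFileA("from", GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    char buffer[4096];
    OVERLAPPED ov = {}; // zero offset; no event, so we wait on the file handle

    // ReadFile returns immediately; ERROR_IO_PENDING means the read is in flight
    if (!ReadFile(file, buffer, sizeof(buffer), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
    {
        CloseHandle(file);
        return 1;
    }

    // ... other work could happen here while the disk is busy ...

    DWORD bytesRead = 0;
    if (GetOverlappedResult(file, &ov, &bytesRead, TRUE)) // TRUE = block until done
        std::printf("read %lu bytes\n", bytesRead);

    CloseHandle(file);
    return 0;
}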

Quote:
Original post by Extrarius
You can significantly reduce the code size and slightly increase efficiency by using the following instead for the body of the function:
while (size)
{
    const uint32 transferSize = min(size, bufferSize);
    is.read(copyBuffer, transferSize);
    os.write(copyBuffer, transferSize);
    size -= transferSize;
}

Dang you and your brilliance. I'm not used to using std::min but now I will do so more often.

Quote:
Original post by Antheus
If performance matters, it becomes necessary to rely on OS-specific mechanisms.

True, although I'm certain there is no OS mechanism to copy a file into an open stream.

Quote:
Original post by Antheus
[...]If performance matters, it becomes necessary to rely on OS-specific mechanisms.[...]
You can improve performance much more by using OS-specific mechanisms, but simply switching from reading 1 byte at a time (as the code would previously do when size < bufferSize) to reading N bytes at a time can offer a significant performance increase.

In some code I was working on, I had a loop that looped 1280 times and read a byte then processed it. I changed it to use a 1280-byte buffer and do a single read before the loop, and performance increased at least 2-3 times (I don't remember the actual value, but it was very significant). That tiny change shifted the read code from being the bottleneck to the actual calculations taking more time, as the sketch below shows.
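
In outline, the change was simply this (ProcessByte stands in for the real per-byte calculation, and the stream name is illustrative):

#include <fstream>

void ProcessByte(char c); // placeholder for the actual per-byte work

void ProcessChunk(std::ifstream &in)
{
    // before: 1280 separate read calls, one per byte
    //   for (int i = 0; i < 1280; ++i) { char c; in.read(&c, 1); ProcessByte(c); }

    // after: a single read call, then process entirely from memory
    char buffer[1280];
    in.read(buffer, sizeof(buffer));
    for (int i = 0; i < 1280; ++i)
        ProcessByte(buffer[i]);
}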
Quote:
[...]In general, maximizing the buffer size while copying is a way to improve raw throughput, but the bottlenecks with file IO often lie elsewhere - be it fragmentation, the location of the data on the media, or simply the OS mechanisms themselves.[...]
Of course, which is exactly what my code does. The previous code buffers properly for "size >= bufferSize" but then falls back to reading a single byte at a time otherwise.
Quote:
[...]While raw throughput cannot be improved by much, an OS-specific approach can improve overall performance, usually by providing asynchronous access. Windows has quite decent support for this via overlapped IO.[...]
Most definitely true, but then again, changing a program to properly use asynchronous I/O can involve quite significant changes to the code in many places.
