# ifstream & ofstream binary troubles

## Recommended Posts

Posted (edited)
Hi Guys,
I have a strange problem when using ifstream and ofstream.
I made my own asset packing format today, which seemed to be working well until I used complex data in my files.
The theory is as follows:
• Drop a bunch of files on to the .exe.
• Program asks for an unsigned int ID type (i.e. 100 might be a walk sequence).
• Program packs the individual files into one along with some additional data.
• The program then does a read back of the resultant file to make sure that everything packed ok.
This works fine for any number of text documents etc., but when I use more complex files, the program crashes. It doesn't matter if the files are exactly the same size as my test text files.
Do ifstream and/or ofstream have issues with complex input? Do they somehow misread certain bytes (e.g. \r, \n, \t)?
I am reading and writing the files as binary, allocating memory blocks, and doing all of those sorts of things.
```cpp
fsIn.open("output.txt", std::fstream::in | std::fstream::binary);
fsIn.seekg(0, std::ios::end);
int nSize = (int)fsIn.tellg();
char *memblock = new char[nSize];
fsIn.seekg(0, std::ios::beg);
fsIn.read(memblock, nSize);
fsIn.close();
```
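For reference, a self-contained version of that whole-file read with basic error checking might look like the sketch below. The helper name `ReadAll` is my own, not part of the project:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read an entire file into memory in binary mode.
// Returns an empty vector if the file cannot be opened.
std::vector<char> ReadAll(const std::string& path)
{
    std::ifstream fs(path, std::ios::in | std::ios::binary);
    if (!fs)
        return std::vector<char>();

    fs.seekg(0, std::ios::end);
    std::streamsize size = fs.tellg();
    fs.seekg(0, std::ios::beg);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    fs.read(buffer.data(), size); // the actual read of the payload
    return buffer;
}
```

Using std::vector also sidesteps the manual new[]/delete[] bookkeeping entirely.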

This has got me stumped though.
Example of dropping two text files on to the exe.

Animation sequence packing module
Files: 2

Enter sequence number: 77
0: 12
1: 11304
File Size: 22596

Objects in file: 2
12
Offset: 12
11304
Offset: 11304
(Offset: 12)    ID: 77  Size: 11280
(Offset: 11304) ID: 77  Size: 11280

This is an expected result and working fine.

If I drop two binary files with (seemingly) random content, the program doesn't function correctly, as per below:

Animation sequence packing module
Files: 2

Enter sequence number: 77
0: 12
1: 11304
File Size: 22644

Objects in file: 2
12
Offset: 12
11304
Offset: 11304
(Offset: 12) ID: 77 Size: 11280
(Offset: 11304) ID: 1092444195 Size: 1097630673

In this particular case the file sizes are identical to the txt files used in the previous example, to rule out silly things like buffer overruns.

But, the data is getting mangled at the second entry. ID: should be 77 as well and Size: should be 11280 as per the entry above.

So, this has me confused as to why this would happen, the files are being written and read in binary mode.

Probably somewhat confusing as to what I have posted here, so I am happy to share whatever you need, just point me in the right direction as to what info you are after.

I can drop any number of text files I can find (even hundreds) and the program behaves as expected. But as soon as the data in the files becomes more complex, the program falls over instantly (it even shows 'not responding' before crashing).

Any help would be greatly appreciated :) Edited by DarkRonin

##### Share on other sites
Posted (edited)
This works fine for any number of text documents etc., but when I use more complex files, the program crashes

What's the crash? At what line in your code does it crash? This information is vital for anyone trying to understand your problem, and it may even help you resolve the issue yourself.

EDIT: A wild guess: you are not interpreting the file content properly. For example, if you are searching for a terminator character at the end of the file's content, a text file will not contain that character, but a binary file can obviously contain any possible byte, so your code might only take a fraction of the file's content and try to read additional metadata out of the rest of the file. But again, without knowing what exactly is wrong, that's just total guesswork.

Edited by Juliean

##### Share on other sites

The code you posted is almost certainly not where the problem is. The problem is probably in whatever you're doing to pack assets.

If you have a file that you know crashes the program, then run the program in the debugger on that file. Find out exactly where the crash is.

##### Share on other sites
Thanks guys,

The format is pretty simple.

Example of how three files are packed

Total Items
OffsetX
OffsetY
OffsetZ
[OffsetX]
Type
ID
Size
[OffsetY]
Type
ID
Size
[OffsetZ]
Type
ID
Size

(Everything here is an unsigned int (four bytes apiece), with the exception of the 'payloads', which can be any size.)
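To make the bookkeeping concrete, here is a sketch of that layout as code. `PackEntry` and `HeaderOffset` are illustrative names of my own, not from the actual program, but the arithmetic matches the transcripts above (12-byte preamble for two files, second entry at offset 11304):

```cpp
#include <cstdint>
#include <vector>

// One directory record per packed file: three 4-byte unsigned ints,
// followed immediately in the file by a payload of `size` bytes.
struct PackEntry
{
    std::uint32_t type; // user-supplied sequence number
    std::uint32_t id;   // index of the file within the pack
    std::uint32_t size; // payload size in bytes
};

// File offset of entry i's header: 4 bytes for the item count,
// then n 4-byte offsets, then every earlier header + payload.
std::uint32_t HeaderOffset(std::uint32_t n, std::uint32_t i,
                           const std::vector<std::uint32_t>& payloadSizes)
{
    std::uint32_t offset = 4 + (n * 4); // the "preamble"
    for (std::uint32_t k = 0; k < i; ++k)
        offset += sizeof(PackEntry) + payloadSizes[k];
    return offset;
}
```

For two 11280-byte payloads this yields offsets 12 and 11304, matching the program's output above.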

When reading back a file that has been packed with any amount of text files, the data gets interpreted correctly every single time, regardless of the size of the 'payload'.

This is what I don't understand. I can't get my head around why these interpret with 100% success, but when the text files are replaced by seemingly 'gibberish' (but purposeful) data files, the problem occurs.

I even tried to make the 'payload' unsigned chars just in case, but fstream seems to require chars.

I'll try and analyse the raw data files some more and see if I can detect where it throws things off. Not quite sure how I'll achieve that though when the data files average 12KB each.
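One low-tech way to do that comparison is to hex-dump a few bytes around each expected offset in a good file and a bad file, and diff them by eye. A minimal sketch (`HexDump` is my own helper name):

```cpp
#include <cstdio>
#include <string>

// Format `count` bytes starting at `offset` as two-digit hex,
// space separated, e.g. "4D 5A 0A ".
std::string HexDump(const unsigned char* data, unsigned int offset, unsigned int count)
{
    std::string out;
    char buf[4];
    for (unsigned int i = 0; i < count; ++i)
    {
        std::snprintf(buf, sizeof(buf), "%02X ", (unsigned)data[offset + i]);
        out += buf;
    }
    return out;
}
```

Dumping the 12 header bytes at each offset would show immediately whether they were written wrong or merely read from the wrong place.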

##### Share on other sites
Posted (edited)
The format is pretty simple.

We weren't talking about the format of the file, but about your code. You do read that file back in via code, right?

I'll try and analyse the raw data files some more and see if I can detect where it throws things off. Not quite sure how I'll achieve that though when the data files average 12KB each.

Step through the code you use for reading in the file, and see at which exact point it breaks down. If we knew what type of crash you had, we could give more precise directions. For example, if you have heap corruption at seemingly random points, use a program like Application Verifier, which can pinpoint the source of the issue. But since all we know is "it crashes", without even seeing the specific code, that's kind of hard.

Edited by Juliean

##### Share on other sites
Yes, I create the file in code and then I re-open it to re-interpret it.

The fault is that the ID and data size are being misinterpreted (at the second offset), causing the data 'blob' to be over-read due to the incorrectly interpreted size, which causes an exception.

I just can't figure out at this stage why this data is misinterpreted in the first place (due to the input file contents).

Hard to explain. I know why it is failing, but I don't know why it is only caused by input files that contain complex data (as opposed to basic text).

Really don't know what to do, or how to go about isolating beyond this point.

##### Share on other sites

It crashes because your code for handling the data is wrong. If you won't post your code, then the help we can give you is pretty much done. Find the smallest possible reproduction case and step through it until you see where the data is wrong. Then find out whether it became wrong during reading or writing. Then fix whatever did that.

##### Share on other sites
Posted (edited)

More than happy to post the entire code, but I thought that might be frowned upon (due to size).

I have also attached two files. One works perfectly when dropped on the exe and the other doesn't. Both are identical size.

```cpp
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include <cstring>  // memcpy
#include <cstdlib>  // system

class Data
{
public:
    Data()
    {
    }

    ~Data()
    {
        // Not here - will crash
        //if (memblock)
        //    delete[] memblock;
        // Yes - this leaks presently. Will sort out when working.
    }

    void DataCapture(char* data)
    {
        if (size)
        {
            memblock = new char[size];
            memcpy(memblock, data, size);
        }
    }

    void DataPrint()
    {
        if (size)
        {
            for (unsigned int i = 0; i < size; i++)
            {
                std::cout << memblock[i];
            }
        }
    }

    unsigned int id = 0;
    unsigned int size = 0;
    unsigned int offset = 0;
    unsigned int type = 0;
    char *memblock = 0;
};

int main(int argc, char *argv[])
{
    unsigned int nFileCount = argc - 1;
    unsigned int nType = 0;

    std::ifstream fsIn;
    std::ofstream fsOut;

    // Welcome screen
    std::cout << "Animation sequence packing module\r\n";
    std::cout << "Files: " << nFileCount << "\r\n\r\n";

    if (nFileCount == 0)
    {
        std::cout << "No input files specified\r\n";
        system("PAUSE");
        return 1;
    }

    std::cout << "Enter sequence number: ";
    std::cin >> nType;

    // Store individual file detail into a vector
    std::vector<Data> dataVector;
    for (unsigned int i = 0; i < nFileCount; i++)
    {
        fsIn.open(argv[i + 1], std::fstream::in | std::fstream::binary);

        fsIn.seekg(0, std::ios::end);
        unsigned int nSize = (unsigned int)fsIn.tellg();
        char *memblock = new char[nSize];
        fsIn.seekg(0, std::ios::beg);
        fsIn.read(memblock, nSize);
        fsIn.close();

        Data dataTemp;
        dataTemp.type = nType;
        dataTemp.id = i;
        dataTemp.size = nSize;
        dataTemp.DataCapture(memblock);
        dataVector.push_back(dataTemp);

        delete[] memblock; // DataCapture keeps its own copy
    }

    // Calculate offsets
    int nObjects = nFileCount;
    int nPreamble = sizeof(nObjects) + (nObjects * sizeof(nObjects));

    int count = 0;
    int prevOffset = 0;
    int prevSize = 0;
    for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
    {
        int offset = 0;
        if (count == 0)
            offset = nPreamble;
        else
            offset = prevOffset + (3 * sizeof(unsigned int)) + prevSize; // type4, id4, size4

        it->offset = offset;
        prevOffset = it->offset;
        prevSize = it->size;

        std::cout << count << ": " << offset << "\r\n";
        count++;
    }

    // Write file
    fsOut.open("output.txt");

    // Write number of objects (4 bytes)
    fsOut.write(reinterpret_cast<const char *>(&nObjects), sizeof(nObjects));

    // Write offsets (4 bytes each)
    for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
    {
        unsigned int offset = it->offset;
        fsOut.write(reinterpret_cast<char *>(&offset), sizeof(unsigned int));
    }

    for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
    {
        // Write type (4 bytes)
        unsigned int nType = it->type;
        fsOut.write(reinterpret_cast<char *>(&nType), sizeof(unsigned int));

        // Write ID (4 bytes)
        unsigned int nId = it->id;
        fsOut.write(reinterpret_cast<char *>(&nId), sizeof(unsigned int));

        // Write size (4 bytes)
        unsigned int nSize = it->size;
        fsOut.write(reinterpret_cast<char *>(&nSize), sizeof(unsigned int));

        // Data blob (data size varies - nSize)
        fsOut.write(it->memblock, nSize);

        std::cout << it->type << "\r\n";
        std::cout << it->id << "\r\n";
        std::cout << it->size << "\r\n";
    }

    fsOut.close();

    // Read the packed file back to verify it
    fsIn.open("output.txt", std::fstream::in | std::fstream::binary);
    fsIn.seekg(0, std::ios::end);
    int nSize = (int)fsIn.tellg();
    char *memblock = new char[nSize];
    fsIn.seekg(0, std::ios::beg);
    fsIn.read(memblock, nSize);
    fsIn.close();

    std::cout << "File Size: " << nSize << "\r\n";

    int newObjects = 0;
    memcpy(&newObjects, memblock, sizeof(int));
    std::cout << "Objects in file: " << newObjects << "\r\n";

    std::vector<Data> dataVectorIn;
    unsigned int items = newObjects;
    unsigned int offsets = 0;

    nPreamble = sizeof(int) + (nObjects * sizeof(int));

    // Pre-fill offsets (to be used in next phase)
    for (int i = 0; i < newObjects; i++)
    {
        memcpy(&offsets, memblock + sizeof(unsigned int) + (i * sizeof(unsigned int)), sizeof(unsigned int));

        Data dataTemp;
        dataTemp.type = 0;
        dataTemp.id = 0;
        dataTemp.size = 0;
        dataTemp.offset = (unsigned int)offsets;
        //dataTemp.DataCapture(memblock);
        dataVectorIn.push_back(dataTemp);

        std::cout << "Offset: " << dataTemp.offset << "\r\n";
    }

    // OK TO HERE

    // Read Type (int), ID (int), Size (int), Data (char)
    for (std::vector<Data>::iterator it = dataVectorIn.begin(); it != dataVectorIn.end(); ++it)
    {
        int offset = it->offset;

        unsigned int tempType = 0;
        memcpy(&tempType, memblock + offset + (sizeof(unsigned int) * 0), sizeof(unsigned int));
        it->type = tempType;

        unsigned int tempId = 0;
        memcpy(&tempId, memblock + offset + (sizeof(unsigned int) * 1), sizeof(unsigned int));
        it->id = tempId;

        unsigned int tempSize = 0;
        memcpy(&tempSize, memblock + offset + (sizeof(unsigned int) * 2), sizeof(unsigned int));
        it->size = tempSize;

        //char *blob = new char[it->size];
        //memcpy(blob, memblock + offset + (sizeof(unsigned int) * 3), it->size);
        //it->memblock = blob;

        std::cout << "Offset: " << it->offset << "\t";
        std::cout << "Type: " << it->type << "\t";
        std::cout << "ID: " << it->id << "\t";
        std::cout << "Size: " << it->size << "\t";

        /*
        for (unsigned int j = 0; j < it->size; j++)
        {
            std::cout << it->memblock[j];
        }
        */

        std::cout << "\r\n";

        //delete[] blob;
    }

    std::cout << sizeof(int);

    delete[] memblock;

    system("PAUSE");
    return 0;
}
```


Obviously I'll be cleaning up any leaks and wrapping it into a class when the basic functionality is stable :)

Edited by DarkRonin

##### Share on other sites

There's a lot of stuff you could improve in your code, but the issue you're running into now is that you aren't opening the output as a binary file:

```cpp
fsOut.open("output.txt");
```

Even though you're using the write() member function, the stream behaves differently since it's not in binary mode. Adding std::ios::binary to that open call fixes it; I tested it out and it completed successfully.
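To see why the mode matters: on Windows, a text-mode stream translates every 0x0A byte written into 0x0D 0x0A, so a binary payload that happens to contain 0x0A bytes grows on disk and pushes every later offset out of place, which is exactly what the garbage ID/Size values look like. A quick round-trip check with std::ios::binary on both ends (`RoundTrip` is just an illustrative helper):

```cpp
#include <cstdint>
#include <fstream>

// Write a 32-bit value to `path` and read it straight back.
// With std::ios::binary on both streams the four bytes come back untouched;
// in text mode on Windows, any 0x0A byte would be expanded on write.
std::uint32_t RoundTrip(std::uint32_t value, const char* path)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    out.close();

    std::uint32_t back = 0;
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.read(reinterpret_cast<char*>(&back), sizeof(back));
    return back;
}
```

This also explains why plain text files appeared to pack fine: their bytes happened not to trigger the translation in a way that changed the file size.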

##### Share on other sites

Oh man something so simple (I thought it was going to be). It is indeed working now.

And yes, I do indeed have much to clean up and refactor. It was the core functionality that was driving me crazy.

Just tested with various amounts of files and it is working as intended now.

Thank you so much for your help. It is truly appreciated :)
