lonewolff

ifstream & ofstream binary troubles


Hi Guys,
I have a strange problem when using ifstream and ofstream.
I made my own asset packing format today, which seemed to be working well until I use complex data in my files.
The theory is as follows:
  • Drop a bunch of files on to the .exe.
  • Program asks for an unsigned int ID type (e.g. 100 might be a walk sequence).
  • Program packs the individual files into one along with some additional data.
  • The program then does a read back of the resultant file to make sure that everything packed ok.
This is working fine for any number of text documents, but when I use more complex files, the program crashes. It doesn't matter if the files are exactly the same size as my test text files.
Does ifstream and/or ofstream have issues with complex input? Does it somehow misread some things (e.g. \r \n \t)?
I am reading and writing the files as binary, allocating memory blocks, and doing all of those sorts of things.
fsIn.open("output.txt", std::fstream::in | std::fstream::binary);
fsIn.seekg(0, std::ios::end);
int nSize = (int)fsIn.tellg();
char *memblock = new char[nSize];
fsIn.seekg(0, std::ios::beg);
fsIn.read(memblock, nSize);
fsIn.close();
This has got me stumped though.
Example of dropping two text files on to the exe.
 

Animation sequence packing module
Files: 2
 
Enter sequence number: 77
0: 12
1: 11304
File Size: 22596
 
Read back
Objects in file: 2
12
Offset: 12
11304
Offset: 11304
(Offset: 12)    ID: 77  Size: 11280
(Offset: 11304) ID: 77  Size: 11280


This is an expected result and working fine.

If I drop two binary files with (seemingly) random content, the program doesn't function correctly, as per below:
 

Animation sequence packing module
Files: 2

Enter sequence number: 77
0: 12
1: 11304
File Size: 22644

Read back
Objects in file: 2
12
Offset: 12
11304
Offset: 11304
(Offset: 12) ID: 77 Size: 11280
(Offset: 11304) ID: 1092444195 Size: 1097630673


In this particular case the file sizes are identical to the .txt files used in the previous example, to rule out silly things like buffer overruns.

But the data is getting mangled at the second entry: ID should be 77 as well, and Size should be 11280, as per the entry above.

So this has me confused as to why this would happen, as the files are being written and read in binary mode.

What I have posted here is probably somewhat confusing, so I am happy to share whatever you need; just point me in the right direction as to what info you are after.

I can drop any number of text files I can find (even hundreds) and the program behaves as expected. But as soon as the data in the files becomes complex, the program falls over instantly (it even throws 'not responding' prior to crashing).

Any help would be greatly appreciated :)

Edited by DarkRonin

This is working fine for any amount of text documents etc... But when I use more complex files. The program crashes

What's the crash? At what line in your code does it crash? This information is vital for anyone trying to understand your problem, and it may even help you resolve the issue yourself.

EDIT: A wild guess: you are not interpreting the file content properly. For example, if you are searching for a terminator character at the end of the file's content, for a text file this character will not be in there, but a binary file can obviously contain any possible character, so your code might take only a fraction of the file's content and try to read additional metadata out of the rest of the file. But again, without knowing what exactly is wrong, that's just guesswork.

Edited by Juliean


The code you posted is almost certainly not where the problem is. The problem is probably in whatever you're doing to pack assets.

If you have a file that you know crashes the program, then run the program in the debugger on that file. Find out exactly where the crash is.

Thanks guys,

The format is pretty simple.

Example of how three files are packed

Total Items
OffsetX
OffsetY
OffsetZ
[OffsetX]
Type
ID
Size
Payload
[OffsetY]
Type
ID
Size
Payload
[OffsetZ]
Type
ID
Size
Payload

(Everything here is an unsigned int, four bytes apiece, with the exception of the 'payloads', which can be any size.)

When reading back a file that has been packed with any number of text files, the data gets interpreted correctly every single time, regardless of the size of the 'payload'.

This is what I don't understand. I can't get my head around why these are interpreted with 100% success, but when the text files are replaced by seemingly 'gibberish' (but purposeful) data files, the problem occurs.

I even tried making the 'payload' unsigned char just in case, but fstream seems to require char.

I'll try and analyse the raw data files some more and see if I can detect where it throws things off. Not quite sure how I'll achieve that though when the data files average 12KB each.

The format is pretty simple.

We weren't talking about the format of the file, but about your code. You do interpret that file via code, right?

 

 

I'll try and analyse the raw data files some more and see if I can detect where it throws things off. Not quite sure how I'll achieve that though when the data files average 12KB each.

Step through the code you use for reading in the file, and see at which exact point it breaks down. If we knew what type of crash you had, we could give more precise directions. For example, if you have heap corruption at seemingly random points, use a program like Application Verifier, which can pinpoint the source of the issue. But since we only know "it crashes", without even seeing the specific code, that's hard.


Yes, I create the file in code and then I re-open it to re-interpret it.

The fault occurs because the ID and data size are being misinterpreted (at the second offset), causing the data 'blob' to be over-read due to the incorrectly interpreted size, which causes an exception.

I just can't figure out at this stage why this data is misinterpreted in the first place (due to the input file contents).

Hard to explain. I know why it is failing, but I don't know why it is only triggered by input files that contain complex data (as opposed to basic text).

Really don't know what to do, or how to go about isolating beyond this point.


It crashes because your code for handling the data is wrong. If you won't post your code, then the help we can give you is pretty much done. Find the smallest possible reproduction case and step through it until you see where the data is wrong. Then find out whether it became wrong during reading or during writing. Then fix whatever did that.


More than happy to post the entire code, but I thought that might be frowned upon (due to size).

I have also attached two files. One works perfectly when dropped on the exe and the other doesn't. Both are identical size.
 

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include <cstring> // memcpy
#include <cstdlib> // system

class Data
{
public:
	Data()
	{
	}
	~Data()
	{
		// Not here - will crash
		//		if (memblock)
		//			delete[] memblock;
		// Yes - this leaks presently. will sort out when working
	}

	void DataCapture(char* data)
	{
		if (size)
		{
			memblock = new char[size];
			memcpy(memblock, data, size);
		}
	}

	void DataReadBack()
	{
		if (size)
		{
			for (unsigned int i = 0; i < size; i++)
			{
				std::cout << memblock[i];
			}
		}
	}

	unsigned int id = 0;
	unsigned int size = 0;
	unsigned int offset = 0;
	unsigned int type = 0;
	char *memblock = 0;
};

int main(int argc, char *argv[])
{
	unsigned int nFileCount = argc - 1;
	unsigned int nType = 0;

	std::ifstream fsIn;
	std::ofstream fsOut;


	// Welcome screen
	std::cout << "Animation sequence packing module\r\n";
	std::cout << "Files: " << nFileCount << "\r\n\r\n";

	if (nFileCount == 0)
	{
		std::cout << "No input files specified\r\n";
		system("PAUSE");
		return 1;
	}

	std::cout << "Enter sequence number: ";
	std::cin >> nType;


	// Store individual file detail into a vector
	std::vector<Data> dataVector;
	for (unsigned int i = 0; i < nFileCount; i++)
	{
		fsIn.open(argv[i+1], std::fstream::in | std::fstream::binary);

		fsIn.seekg(0, std::ios::end);
		unsigned int nSize = (unsigned int)fsIn.tellg();
		char *memblock = new char[nSize];
		fsIn.seekg(0, std::ios::beg);
		fsIn.read(memblock, nSize);
		fsIn.close();

		Data dataTemp;
		dataTemp.type = nType;
		dataTemp.id = i;
		dataTemp.size = nSize;
		dataTemp.DataCapture( memblock );
		// dataTemp.DataReadBack();
		dataVector.push_back(dataTemp);
	}


	// Calculate offsets
	int nObjects = nFileCount;
	int nPreamble = sizeof(nObjects) + (nObjects * sizeof(nObjects));

	int count = 0;
	int prevOffset = 0;
	int prevSize = 0;
	for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
	{
		int offset = 0;
		if (count == 0)
			offset = nPreamble;		
		else
			offset = prevOffset + sizeof(unsigned int) + sizeof(unsigned int) + sizeof(unsigned int) + prevSize;

		it->offset = offset;
		prevOffset = it->offset;
		prevSize = it->size;

		std::cout << count << ": " << offset << "\r\n";//" (Raw data starts at " << offset + 16 << ")\r\n";// name16, id4, size4, 
		count++;
	}


	
	// Write file
	fsOut.open("output.txt");
	
	// Write number of objects (4 bytes)
	fsOut.write(reinterpret_cast<const char *>(&nObjects), sizeof(nObjects));

	// Write offsets (4 bytes each)
	for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
	{
		unsigned int offset = it->offset;
		fsOut.write(reinterpret_cast<char *>(&offset), sizeof(unsigned int));
	}

	for (std::vector<Data>::iterator it = dataVector.begin(); it != dataVector.end(); ++it)
	{
		// Write type (4 bytes)
		unsigned int nType = it->type;
		fsOut.write(reinterpret_cast<char *>(&nType), sizeof(unsigned int));

		// Write ID (4 bytes)
		unsigned int nId = it->id;
		fsOut.write(reinterpret_cast<char *>(&nId), sizeof(unsigned int));

		// Write size (4 bytes)
		unsigned int nSize = it->size;
		fsOut.write(reinterpret_cast<char *>(&nSize), sizeof(unsigned int));

		// Data blob (data size varies - nSize)
		fsOut.write(it->memblock, nSize);

		std::cout << it->type << "\r\n";
		std::cout << it->id << "\r\n";
		std::cout << it->size << "\r\n";

	}

	fsOut.close();


	// read back
	fsIn.open("output.txt", std::fstream::in | std::fstream::binary);
	fsIn.seekg(0, std::ios::end);
	int nSize = (int)fsIn.tellg();
	char *memblock = new char[nSize];
	fsIn.seekg(0, std::ios::beg);
	fsIn.read(memblock, nSize);
	fsIn.close();

	std::cout << "File Size: " << nSize << "\r\n";

	int newObjects = 0;
	memcpy(&newObjects, memblock, sizeof(int));
	std::cout << "\r\nRead back\r\n";
	std::cout << "Objects in file: " << newObjects << "\r\n";
	
	std::vector<Data> dataVectorIn;
	unsigned int items = newObjects;
	unsigned int offsets = 0;

	//int
	nPreamble = sizeof(int) + (nObjects * sizeof(int));

	// Pre-fill offsets (to be used in next phase)
	for (int i = 0; i < newObjects;i++)
	{
		memcpy(&offsets, memblock + sizeof(unsigned int) + (i * sizeof(unsigned int)), sizeof(unsigned int));

		Data dataTemp;
		dataTemp.type = 0;
		dataTemp.id = 0;
		dataTemp.size = 0;
		dataTemp.offset = (unsigned int)offsets;
		//dataTemp.DataCapture(memblock);
		// dataTemp.DataReadBack();
		dataVectorIn.push_back(dataTemp);

		std::cout << "Offset: " << dataTemp.offset << "\r\n";
	}


//OK TO HERE

	// Read Type (int), ID (int), Size (int), Data (char)
	for (std::vector<Data>::iterator it = dataVectorIn.begin(); it != dataVectorIn.end(); ++it)
	{
		int offset = it->offset;

		unsigned int tempType = 0;
		memcpy(&tempType, memblock + offset + (sizeof(unsigned int) * 0), sizeof(unsigned int));
		it->type = tempType;

		unsigned int tempId = 0;
		memcpy(&tempId, memblock + offset + (sizeof(unsigned int) * 1), sizeof(unsigned int));
		it->id = tempId;

		unsigned int tempSize = 0;
		memcpy(&tempSize, memblock + offset + (sizeof(unsigned int) * 2), sizeof(unsigned int));
		it->size = tempSize;

		// char *blob = new char[it->size];
		// memcpy(blob, memblock + offset + (sizeof(unsigned int) * 3), it->size);
		// it->memblock = blob;

		std::cout << "Offset: " << it->offset << "\t";
		std::cout << "Type: " << it->type << "\t";
		std::cout << "ID: " << it->id << "\t";
		std::cout << "Size: " << it->size << "\t";
	
		/*
		for (int j = 0; j < it->size; j++)
		{
			std::cout << it->memblock[j];
		}
		*/

		std::cout << "\r\n";

		// delete[] blob;
	}



	std::cout << sizeof(int);


	delete[] memblock;


	system("PAUSE");
	return 0;
}

Obviously I'll be cleaning up any leaks and wrapping it into a class when the basic functionality is stable :)



There's a lot of stuff you could improve in your code, but the issue you're running into right now is that you aren't writing the output as a binary file:

fsOut.open("output.txt");

Even though you're using the write() member function, the stream behaves differently since it's not in binary mode: on Windows, text mode translates every 0x0A ('\n') byte you write into the two bytes "\r\n", which corrupts your packed offsets and sizes. I tested it with the binary flag added and it completed successfully.


Oh man, something so simple (I had a feeling it would be). It is indeed working now.

And yes, I do indeed have much to clean up and refactor. It was the core functionality that was driving me crazy.

Just tested with various amounts of files and it is working as intended now.

 

Thank you so much for your help. It is truly appreciated :)

