sjaakiejj

reading binary file header


Hi,

I'm trying to read a binary file header that consists of a string followed by an integer. Reading text is not a problem, but binary file streams are proving much more challenging, and I'm stuck: my current code reads the first value, the string, perfectly. The second value is a hexadecimal, and when I try to read it, my code displays a '.' followed by a long empty string.

Here is my code

//First open the file
std::ifstream file(filename, std::ios::in | std::ios::binary);

//Then check if it opened
if(!file)
{
    //Error opening the file - no point running this. Check your file!
    fprintf(stderr, "An error occurred whilst opening the file '%s'.\n"
                    "Please ensure that the file exists, and that you have\n"
                    "read access to it.\n", filename);
    exit(-1);
}

char magic[4];

//Read and store the magic number
file.read(magic, 4);
printf("Header: %s\n", magic); // This works

char version[4];

file.read(version, 4);
printf("Version: %i\n", atoi(version)); //This returns 'Version: 0' (i.e. atoi fails)
printf("Version: %s\n", version); //This returns 'Version: .'




Whilst the first read works, the second doesn't. I've been staring at this code for a while now, and I looked at examples on the internet, but I can't figure out how my code is different from the code that does work. Perhaps it's just me being tired.

Thanks in advance.

What number do you expect to get for the version? "." is 46 in the ASCII table, which is probably why the char-to-int conversion fails (a '.' is not a digit, so it doesn't represent an int). Can you tell what the 4 chars of version are? Just look at each of them in memory and see what its value is; don't rely on the string conversion.
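To follow that advice without relying on any string conversion, a small helper can print each byte as two hex digits. `hexDump` is a hypothetical name used only for illustration; the sketch assumes the four bytes are already in a `char` buffer such as `version`.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte of a raw buffer as two hex digits,
// so you see the stored values rather than what %s or atoi() make of them.
std::string hexDump(const char* data, std::size_t n)
{
    std::string out;
    char cell[4]; // two hex digits, a space and the terminating '\0'
    for (std::size_t i = 0; i < n; ++i) {
        // Convert through unsigned char so bytes >= 0x80 print as 80..FF
        // instead of being sign-extended.
        std::snprintf(cell, sizeof cell, "%02X ",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        out += cell;
    }
    return out;
}
```

Usage with the code from the question: `printf("Version bytes: %s\n", hexDump(version, sizeof version).c_str());`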

Also, it isn't clear what you mean by "the second value is a hexadecimal". Is it supposed to be a hexadecimal string like "8F", or do you mean that the byte is meant to be interpreted as hexadecimal, like 46 = 0x2E?
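The two readings lead to different code. A sketch of each, with hypothetical function names: if the file holds hex *text*, `std::strtol` with base 16 parses it; if it holds a raw byte, the byte already is the number, and "hexadecimal" is only how a hex editor displays it.

```cpp
#include <cstdlib>

// Reading 1: the file stores hex *text* such as "8F".
// strtol with base 16 turns the characters into a number (0x8F == 143).
long parseHexText(const char* text)
{
    return std::strtol(text, nullptr, 16);
}

// Reading 2: the file stores a raw byte. Its value is the number; no parsing
// is involved. Going through unsigned char keeps 0x80..0xFF non-negative.
int rawByteValue(char byte)
{
    return static_cast<unsigned char>(byte);
}
```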

If the version is an int...

int version;
file.read( reinterpret_cast<char*>(&version), sizeof( int ) );

Depending on the file format (and if you have a non-x86 processor), you may need to endian-swap the int after reading it.
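One common alternative to reading and then swapping is to read the four bytes and assemble the value with shifts, which gives the same result on any host. A minimal sketch, assuming the version is a 32-bit unsigned value; which byte order the file uses is something the file format's specification has to settle, so both variants are shown. The function names are hypothetical.

```cpp
#include <cstdint>

// Decode a 32-bit unsigned value stored least significant byte first
// (little-endian). The value is assembled with shifts, so the result is the
// same whether the host CPU is little- or big-endian: no swap step needed.
std::uint32_t decodeLE32(const unsigned char b[4])
{
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// The same for a format that stores the most significant byte first.
std::uint32_t decodeBE32(const unsigned char b[4])
{
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

To use it, read into `unsigned char buf[4]` with `file.read(reinterpret_cast<char*>(buf), 4)` and pass `buf` to the decoder that matches the format.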

read() takes a char* because a char is very likely 1 byte. That doesn't mean you have to use a char buffer as the destination. In fact, you could just read the whole header into a struct:

#pragma pack(push,1)
struct Header
{
    char magic[4];
    int version;
    //...
};
#pragma pack(pop)

//...
Header header;
file.read( reinterpret_cast<char*>(&header), sizeof(Header) );



The pragmas ensure everything is packed exactly as you have declared it. Without them, the compiler may insert extra padding between members to align them for faster access.
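To see what the pragmas change, compare a struct that mixes member sizes with and without packing. `#pragma pack` is a compiler extension rather than standard C++ (MSVC, GCC and Clang all accept the push/pop form), and the padding without it is implementation-defined, so this sketch only asserts on the packed size. With `char magic[4]` followed by a 4-byte `int`, typical compilers add no padding anyway; packing starts to matter once member sizes are mixed as below.

```cpp
#include <cstdint>

// Without packing, compilers typically insert padding after `tag` so that
// `value` starts on a 4-byte boundary (sizeof is commonly 8).
struct Unpacked
{
    char         tag;
    std::int32_t value;
};

#pragma pack(push, 1)
// With 1-byte packing, members are laid out back to back: 1 + 4 bytes.
struct Packed
{
    char         tag;
    std::int32_t value;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 5, "packed layout has no padding");
```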


edit:
Quote:

printf("Header: %s\n", magic); // This works

I'm impressed that works... magic is not a null-terminated string: read() copies exactly four bytes and adds no '\0', so %s keeps printing whatever follows in memory. As written, your printf is very likely to print something like "TGAB!@#$!@#$%%$%#%$#$%$".
Also... why aren't you using std::cout?
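Since the magic number is exactly four bytes with no terminating `'\0'`, output has to be bounded by length rather than by a terminator. A sketch of two bounded options (the function names are illustrative): printf's `%.4s` precision, or a `std::string` built from the pointer plus a length for use with `std::cout`.

```cpp
#include <cstdio>
#include <iostream>
#include <string>

// Build a std::string from exactly the 4 bytes; no '\0' is needed or read.
std::string magicToString(const char (&magic)[4])
{
    return std::string(magic, sizeof magic);
}

void printMagic(const char (&magic)[4])
{
    // printf: the precision caps %s at 4 characters, so it stops at the
    // end of the array even without a terminator.
    std::printf("Header: %.4s\n", magic);

    // iostreams: print the length-bounded string.
    std::cout << "Header: " << magicToString(magic) << '\n';
}
```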

Don't use a character array (string) for the signature ("magic"). Use an "unsigned int": store it in your code as an unsigned int, read an unsigned int from the file, and then compare the two unsigned ints.

Of course the same applies to the version as well.
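A minimal sketch of that suggestion, assuming a hypothetical signature of the four bytes 'T','G','A','B'. The expected value is built from those bytes with `memcpy` rather than written as a literal: a literal such as `0x42414754` would only match on a little-endian host, whereas copying the bytes reproduces exactly what reading them from the file yields on any byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream> // std::istringstream, for trying it without a real file
#include <string>

// Hypothetical signature: the bytes 'T','G','A','B' as they appear in the file.
std::uint32_t expectedMagic()
{
    static const char bytes[4] = {'T', 'G', 'A', 'B'};
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Read the first 4 bytes of the stream as one unsigned int and compare.
bool hasExpectedMagic(std::istream& in)
{
    std::uint32_t actual = 0;
    if (!in.read(reinterpret_cast<char*>(&actual), sizeof actual))
        return false; // fewer than 4 bytes available
    return actual == expectedMagic();
}
```

With a real file, pass the `std::ifstream` opened in binary mode; the version field can be read the same way right after it.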

Quote:
Original post by KulSeran
Quote:

printf("Header: %s\n", magic); // This works

I'm impressed that works... magic is not a null-terminated string: read() copies exactly four bytes and adds no '\0', so %s keeps printing whatever follows in memory. As written, your printf is very likely to print something like "TGAB!@#$!@#$%%$%#%$#$%$".
Also... why aren't you using std::cout?


The Linux command line catches those characters, so I see a square after the text instead of the strange characters. Using printf is more of a habit I carried over from programming in Java, though I don't claim it's better or worse.

Thanks everyone for the help, I'll use your suggestions to fix my stupidity :)

The file contains unformatted data. The reason you don't see a number when you read the version number as bytes is because it isn't written with the bytes that are used to represent a number in text; it's written with the bytes that are actually used, in memory, to represent the number. Similarly, atoi() fails because its purpose is to convert from the text representation to the in-memory representation. Garbage in, garbage out.
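A small illustration of the difference, under the hypothetical assumption that the version bytes are 2E 00 00 00 (consistent with the '.' and the 0 reported in the original post): atoi() sees a non-digit and gives up, while copying the raw bytes into an integer recovers 46 on a little-endian host such as x86. Function names are illustrative.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Text form: the number is spelled out as digit characters, e.g. "46".
// atoi() converts exactly this form and stops at the first non-digit.
int fromText(const char* digits)
{
    return std::atoi(digits);
}

// Unformatted form: the bytes are the integer's in-memory representation.
// memcpy copies them unchanged, so the result uses the host's byte order.
std::int32_t fromRawBytes(const char bytes[4])
{
    std::int32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```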

We fix this by creating an int-sized variable and reading the appropriate number of bytes from the file, into the region of memory that the variable represents. This is what KulSeran's example does.
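A sketch of that approach for the whole header, reading each field straight into its own variable. The `FileHeader` layout (a 4-byte magic followed by a 32-bit version in the host's byte order) is an assumption for illustration; reading member by member also means any padding the compiler adds to the struct never matters, so no `#pragma pack` is needed.

```cpp
#include <cstdint>
#include <istream>
#include <sstream> // std::istringstream, for trying it without a real file
#include <string>

// Hypothetical header layout: a 4-byte magic number followed by a 32-bit
// version, both written in the host's byte order.
struct FileHeader
{
    char         magic[4];
    std::int32_t version;
};

// read() copies bytes straight into each member's memory. Reading member
// by member means padding inside FileHeader is never read from the file.
// Returns false if the stream ran out before the header was complete.
bool readHeader(std::istream& in, FileHeader& h)
{
    in.read(h.magic, sizeof h.magic);
    in.read(reinterpret_cast<char*>(&h.version), sizeof h.version);
    return static_cast<bool>(in);
}
```

With the `file` from the original post: `FileHeader h; if (readHeader(file, h)) { /* use h.version */ }`. If the format fixes a byte order that differs from the host's, decode the four version bytes explicitly instead of reading them into the int directly.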

@KulSeran: A char is one byte, by definition. But a byte is not necessarily an octet. :)
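In code, the number of bits in a byte is `CHAR_BIT` from `<climits>`. The standard guarantees `sizeof(char) == 1` and `CHAR_BIT >= 8`; mainstream desktop platforms use 8, but some DSP toolchains use larger bytes. A binary-file reader that assumes octets can state that assumption at compile time, as in this sketch:

```cpp
#include <climits>

// sizeof counts in units of char, so this holds on every conforming compiler.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// A byte (char) has CHAR_BIT bits; the standard only promises at least 8.
static_assert(CHAR_BIT >= 8, "guaranteed by the standard");

// A reader that assumes octets can make that explicit, so it fails to
// compile instead of misreading files on an unusual platform.
static_assert(CHAR_BIT == 8, "this reader assumes 8-bit bytes");
```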
