binary file question

9 comments, last by Sync Views 15 years, 10 months ago
I want to store several blocks of data in a binary file but I'm not sure if this is a good design:

First byte: stores a file version number so I can identify older files (which may have a different format)
Lookup table: each entry is a null-terminated string followed by 4 bytes representing the actual data offset from the end of the lookup table
2 null chars to mark the end of the lookup table
The data for each item

So I might have:

0x01 - version 1
"ent_world\0" 0x00000000
"ent_spawn\0" 0x00020510
"ent_win\0"   0x00020530
0x0000 <- end of lookup table

ent_world data
...
ent_spawn data
...
ent_win data
...
The problem I see with this idea is that I have to be careful when changing things in the middle of the file to update the lookup table accordingly...
Quote:Original post by Sync Views
I want to store several blocks of data in a binary file but I'm not sure if this is a good design:

Looks fine to me. It's a very common approach, although often the string identifiers tend to be integer 'codes'.

Quote:The problem I see with this idea is that I have to be careful when changing things in the middle of the file to update the lookup table accordingly...

I would imagine you don't use a text editor to tamper with your file, but you generate it in code. So that code should ensure that the lookup table isn't corrupted by regenerating it from the several chunks.

By using a human-readable format it'll be easier to spot errors, but even then you or the application has to make sure everything validates. There's no way around that.
This seems like it would work out pretty well. Possibly write a script that could update your files if you add stuff to the middle of the file.
ok just found a problem... It's quite common for the "pointers" to have one or more null bytes at the start, so I need another quick way to detect the end of the lookup table...
Quote:Original post by Sync Views
ok just found a problem... It's quite common for the "pointers" to have one or more null bytes at the start, so I need another quick way to detect the end of the lookup table...

Add an additional integer to the header that indicates the number of entries/chunks. That way you don't need to detect the end of the lookup table.
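For illustration, a rough sketch of reading a header like that (the struct and function names here are my own invention, and it ignores endianness for the moment):

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// One entry in the lookup table: a name plus an offset to its data.
struct Entry {
    std::string name;
    std::uint32_t offset;
};

std::vector<Entry> readLookupTable(std::ifstream& file)
{
    std::uint8_t version = 0;
    std::uint32_t entryCount = 0;
    file.read(reinterpret_cast<char*>(&version), sizeof(version));
    file.read(reinterpret_cast<char*>(&entryCount), sizeof(entryCount));

    // The count says exactly how many entries follow, so no terminator
    // is needed and null bytes inside the offsets don't matter.
    std::vector<Entry> table(entryCount);
    for (std::uint32_t i = 0; i < entryCount; ++i) {
        std::getline(file, table[i].name, '\0');   // null-terminated name
        file.read(reinterpret_cast<char*>(&table[i].offset), sizeof(table[i].offset));
    }
    return table;
}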
I would precede each string with its length and pad out to a 4-byte boundary. Don't rely on null-terminated strings: if your file gets corrupted, you could get a buffer overflow when reading it in.

I would also pad data sections to 4-byte alignments and make your pointers relative to the beginning of the file.
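Something along these lines, as a sketch (the function name is made up):

#include <cstdint>
#include <fstream>
#include <string>

// Write a length-prefixed string and pad it out to a 4-byte boundary.
void writePaddedString(std::ofstream& file, const std::string& s)
{
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    file.write(reinterpret_cast<const char*>(&len), sizeof(len));
    file.write(s.data(), len);

    // Pad with zero bytes so the next field starts on a 4-byte boundary.
    static const char zeros[4] = {0, 0, 0, 0};
    file.write(zeros, (4 - (len % 4)) % 4);
}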
Quote:Original post by Sync Views
ok just found a problem... It's quite common for the "pointers" to have one or more null bytes at the start, so I need another quick way to detect the end of the lookup table...


I don't get why this is a problem - after you've read a null terminated string, you are always reading the next four bytes (or whatever) as a pointer, surely? I don't see that it would matter whether these bytes were zeros or not.
Quote:Original post by Sync Views
I want to store several blocks of data in a binary file but I'm not sure if this is a good design:


It might not be a good design simply because it's binary. What are you hoping to accomplish this way? Have you thought about it?

Quote:
first byte: stores a file version number so I can identify older files (which may have a different format)


Usually a good idea.

Quote:Each entry is a null-terminated string


I strongly recommend you not do that. Put a length count at the beginning instead. Then you don't have to worry about embedded nulls or escape sequences or any of that nonsense. It also makes the reading-in process much easier: you don't have to allocate a buffer and keep reading in until you find the terminator, possibly re-allocating continuously. Instead, you just read the length count, allocate space and read string data.
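A minimal sketch of that reading process, assuming a 32-bit length followed by the raw characters (no terminator):

#include <cstdint>
#include <fstream>
#include <string>

std::string readPrefixedString(std::ifstream& file)
{
    std::uint32_t len = 0;
    file.read(reinterpret_cast<char*>(&len), sizeof(len));

    // Allocate exactly once, then read the character data straight in.
    std::string s(len, '\0');
    file.read(&s[0], len);
    return s;
}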

Quote:The problem I see with this idea is that I have to be careful when changing things in the middle of the file to update the lookup table accordingly... (and figure out exactly where the table ends)


Doctor, it hurts when I do that.

Why create that indirection, and then burden yourself with maintaining it? Why not just write the data for each item immediately after its name? You can prefix each data-set with its size as well, and everything is consistent and easy to deal with.

0x01 - version 1
0x00000009 "ent_world" 0x00000100 data for ent_world
0x00000009 "ent_spawn" 0x00000080 data for ent_spawn
0x00000007 "ent_win"   0x00000300 data for ent_win
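Reading that back is then just a loop over the records until the end of the file. A sketch (it assumes the version byte has already been read; the names and container choice are mine, not anything from the original post):

#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

std::map<std::string, std::vector<char>> readRecords(std::ifstream& file)
{
    std::map<std::string, std::vector<char>> records;
    std::uint32_t nameLen = 0;
    while (file.read(reinterpret_cast<char*>(&nameLen), sizeof(nameLen))) {
        // Length-prefixed name.
        std::string name(nameLen, '\0');
        file.read(&name[0], nameLen);

        // Size-prefixed data block immediately after the name.
        std::uint32_t dataSize = 0;
        file.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
        std::vector<char> data(dataSize);
        file.read(data.data(), dataSize);

        records[name] = std::move(data);
    }
    return records;
}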


Quote:The problem I see with this idea is that I have to be careful when changing things in the middle of the file to update the lookup table accordingly...


Files are not random-access anyway. Don't think of your file as a chunk of data which you "edit". Instead, just load data into the objects of your program, manipulate the objects, and then save data corresponding to the new object state.
Ok so I've gone with storing the file just as a series of chunks:

First byte -> version

4 bytes -> total size of chunk
2 bytes -> size of name
x bytes -> name
x bytes -> data
next

I was wondering though if this is an unsafe way of writing/loading files, because different computers might handle this differently (e.g. a map I made to go with the game might not load on some people's computers):

file->write((char*)&val, sizeof(short));
and
file->read((char*)&val, sizeof(short));


EDIT: Also, is there a fairly foolproof way to easily validate a file? Right now someone might ask it to load, say, a bitmap, and as long as that first byte is a valid version I've got no idea what would happen.
Quote:Original post by Sync Views
Ok so I've gone with storing the file just as a series of chunks:

First byte -> version

4 bytes -> total size of chunk
2 bytes -> size of name
x bytes -> name
x bytes -> data
next

I was wondering though if this is an unsafe way of writing/loading files, because different computers might handle this differently (e.g. a map I made to go with the game might not load on some people's computers):

file->write((char*)&val, sizeof(short));
and
file->read((char*)&val, sizeof(short));


If you need the maps to be cross-platform, control the endianness yourself. There are numerous ways to do this; if you're this far along already, you should be able to think of a fairly good one easily.
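One common way, as a sketch: always write multi-byte integers a byte at a time in a fixed order (little-endian here), so every platform reads them back the same regardless of its native byte order. The function names are made up:

#include <cstdint>
#include <fstream>

// Write a 16-bit value as two bytes, low byte first.
void writeU16LE(std::ofstream& file, std::uint16_t v)
{
    unsigned char bytes[2] = {
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF)
    };
    file.write(reinterpret_cast<const char*>(bytes), 2);
}

// Read it back and reassemble in the same fixed order.
std::uint16_t readU16LE(std::ifstream& file)
{
    unsigned char bytes[2] = {0, 0};
    file.read(reinterpret_cast<char*>(bytes), 2);
    return static_cast<std::uint16_t>(bytes[0]) |
           (static_cast<std::uint16_t>(bytes[1]) << 8);
}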

Quote:EDIT: Also, is there a fairly foolproof way to easily validate a file? Right now someone might ask it to load, say, a bitmap, and as long as that first byte is a valid version I've got no idea what would happen.


You can add a tag to the beginning of the file indicating its type, just as the bitmap files do. Good luck being really sure it's not the same one anyone else is using, though. You're better off making sure your loading code doesn't have any buffer overruns (whenever you read a size from a file, use some kind of dynamically allocated memory such as a std::vector to hold the following sized data), and design things in such a way that, as soon as you encounter a problem, you can easily just throw away anything loaded so far and report an error.
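As a sketch, a type check up front might look like this (the "SVMP" tag is made up purely for illustration):

#include <cstring>
#include <fstream>
#include <stdexcept>

// Check the file-type tag at the very start of the file.
void checkFileTag(std::ifstream& file)
{
    char tag[4] = {0};
    file.read(tag, 4);
    if (!file || std::memcmp(tag, "SVMP", 4) != 0) {
        // Not one of our files (or truncated): bail out before reading
        // any sizes or allocating anything based on its contents.
        throw std::runtime_error("Not a valid map file");
    }
}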

