chunked data files

This topic is 4403 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi, I need to save some large data files containing various kinds of information. The problem is that some blocks may be removed from the file and others added in the future. Someone told me the best way to do this is to use a chunked data file. However, I am not very clear on how it works. Can anyone explain the logic behind chunked data files? Any tips or known problems are also appreciated. Thanks in advance.

Mark each block with its type in front. Choose, e.g., a unique 32-bit value for that purpose, one for each kind of chunk. You may also add some version information and the chunk length. Then, when reading a chunk, you first read the type and look up whether the reader is able to handle it. If so, read the chunk in; otherwise skip it (which is possible because the chunk's length is known).

So a chunk may have this general structure:
CHUNK_TYPE // one of the unique numbers denoting the type
CHUNK_VERSION // could be used, e.g. as a kind of "sub-type"
CHUNK_LENGTH // how many bytes follow: just the byte-size of data
chunk_data // the data itself

Chunks normally carry the type and length fields. The version field is used less often. Some implementations add a checksum field (e.g. at the end of the chunk), but that is usually not necessary.
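To make the read-or-skip logic concrete, here is a minimal sketch in Python. The header layout (three little-endian 32-bit fields), the type IDs, and the function names are assumptions chosen for illustration, not a fixed format.

```python
import io
import struct

# Assumed header layout: type, version, length as little-endian uint32.
HEADER = struct.Struct("<III")

# Hypothetical chunk type IDs, picked only for this example.
TYPE_MESH = 0x4D455348
TYPE_TEXT = 0x54455854


def write_chunk(f, chunk_type, version, data):
    """Write one chunk: header first, then the raw payload bytes."""
    f.write(HEADER.pack(chunk_type, version, len(data)))
    f.write(data)


def read_chunks(f, known_types):
    """Yield (type, version, data) for known chunks and skip the rest."""
    while True:
        header = f.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated chunk header")
        chunk_type, version, length = HEADER.unpack(header)
        if chunk_type in known_types:
            data = f.read(length)
            if len(data) < length:
                raise ValueError("truncated chunk data")
            yield chunk_type, version, data
        else:
            # Unknown type: the length field tells us how far to jump.
            f.seek(length, io.SEEK_CUR)


buf = io.BytesIO()
write_chunk(buf, TYPE_MESH, 1, b"vertices")
write_chunk(buf, TYPE_TEXT, 1, b"hello")
buf.seek(0)

# This reader only understands TEXT chunks, so the MESH chunk is skipped.
chunks = list(read_chunks(buf, known_types={TYPE_TEXT}))
```

Because every chunk states its own length, an older reader can step over chunk types that were added later, which is what makes adding and removing blocks safe.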

Moreover, it is often useful to start the file with a table of contents listing all the chunks with their respective offsets from the start of the file. This way you can seek directly to the right chunk instead of reading through all of them until you encounter it.
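A rough sketch of such a table in Python might look like the following. The entry layout (type, absolute offset, length) and all names are assumptions for illustration; here the table entries carry the type and length, so the payloads are stored without per-chunk headers.

```python
import io
import struct

COUNT = struct.Struct("<I")    # assumed: number of table entries
ENTRY = struct.Struct("<IQI")  # assumed: type, absolute offset, length


def write_with_toc(f, chunks):
    """Write the table first, then the payloads. chunks: list of (type, data)."""
    # The table size must be known up front to compute the first offset.
    offset = COUNT.size + ENTRY.size * len(chunks)
    f.write(COUNT.pack(len(chunks)))
    for chunk_type, data in chunks:
        f.write(ENTRY.pack(chunk_type, offset, len(data)))
        offset += len(data)
    for _, data in chunks:
        f.write(data)


def read_chunk(f, wanted_type):
    """Scan the table only, then seek straight to the wanted payload."""
    f.seek(0)
    (count,) = COUNT.unpack(f.read(COUNT.size))
    for _ in range(count):
        chunk_type, offset, length = ENTRY.unpack(f.read(ENTRY.size))
        if chunk_type == wanted_type:
            f.seek(offset)
            return f.read(length)
    return None  # no chunk of that type in the file


buf = io.BytesIO()
write_with_toc(buf, [(1, b"aaaa"), (2, b"bb"), (3, b"cccccc")])
found = read_chunk(buf, 2)
missing = read_chunk(buf, 9)
```

Only the small table is read sequentially; the payload of every other chunk is never touched.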

Greetz,

Illco

Oh, and if you often add chunks to (large) files, it may be beneficial to put the table of contents Illco mentioned at the end of the file. That way, only the last part needs to be rewritten.

That's true, and I considered adding it, but you beat me to it. However, the reason you mention is not the main one, if it is one at all. Writing to the start of the file is just as efficient as writing to the end. The real problem is this:

If the table is at the front, you need to reserve space for it first, then write all the chunks, and then return to write the table into the reserved space. This requires knowledge of what follows (you need to know the exact size of the table), whereas it is often more convenient to just process the chunks and keep track of what you have in a (temporary, in-memory) index table. Therefore the table should come at the end. You can put an offset at the top of the file indicating where the index table starts.
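As a sketch of that approach in Python: write a fixed-size placeholder at the top, stream the chunks while collecting an in-memory index, append the index, and then patch the placeholder. The field layouts and function names are assumptions for illustration, and the stream is assumed to be seekable.

```python
import io
import struct

INDEX_POS = struct.Struct("<Q")  # assumed: 8-byte offset of the index, at file start
COUNT = struct.Struct("<I")      # assumed: number of index entries
ENTRY = struct.Struct("<IQI")    # assumed: type, absolute offset, length


def write_file(f, chunks):
    """chunks may be any iterable of (type, data); its length need not be known."""
    f.write(INDEX_POS.pack(0))  # placeholder, patched once the index is written
    index = []                  # temporary in-memory index table
    for chunk_type, data in chunks:
        index.append((chunk_type, f.tell(), len(data)))
        f.write(data)
    index_start = f.tell()
    f.write(COUNT.pack(len(index)))
    for entry in index:
        f.write(ENTRY.pack(*entry))
    f.seek(0)
    f.write(INDEX_POS.pack(index_start))  # fixed size, so nothing else moves
    f.seek(0, io.SEEK_END)


def read_index(f):
    """Follow the offset at the top of the file to the index at the end."""
    f.seek(0)
    (index_start,) = INDEX_POS.unpack(f.read(INDEX_POS.size))
    f.seek(index_start)
    (count,) = COUNT.unpack(f.read(COUNT.size))
    return [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(count)]


buf = io.BytesIO()
# A generator stands in for chunks produced one at a time.
write_file(buf, iter([(1, b"abc"), (2, b"defgh")]))
index = read_index(buf)
```

The only thing reserved in advance is the 8-byte offset, whose size does not depend on how many chunks follow.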

Illco
