Archived

This topic is now archived and is closed to further replies.

bilsa

updating files in a PAK file? - resizing :)


Hey guys! I was wondering if there is some way to insert data into a file stream with the standard io library? I know I could just seek to the offset of the file I have just updated, insert the new file, and write the rest of the file back. But I want to avoid rewriting the rest of the file in case it happens to be a couple of hundred megs. No, that is not acceptable... So is there any way with the standard io, and if not, maybe with Windows-specific code? (Though Windows-specific code only as a last resort...)

If not, then here is an idea I have: I was also thinking that I could keep a list of pointers to 4 KB chunks of data. Then I could break this list and insert new 4 KB chunks. This way there would be 0-4 KB of storage lost for each file that I write to the pack file. The pack file would then contain a list in the header with the files it contains, along with the start chunk / end chunk for each file. That way I can easily check which list iterator to break, which chunk of data to push back, and let the new data be inserted in between. If the lost storage turned out to be a big problem, I could eventually add a compression algorithm that removes the "0"s of the last chunk in every "file" in the pack file, just after the new data was inserted... For appending files there would of course be a normal algorithm...

Questions:
Do you think my solution would be slower than rewriting the rest of the pack file after the inserted data?
What about adding a compression algorithm that checks the header list containing the files and their start/end chunks and removes the "lost" space in each end chunk? Would it be worth it, or would that slow things down too much?
What size do you think each chunk should be? I thought 4 KB isn't that much data lost per file. I mean, even if I have 10000 resource files, that would "only" be 40 MB of loss. (Hopefully 10000 files would be a gig or two anyway...) Also I thought of the paging system of the HDD (or Windows, whatever...); I think it uses 4 KB pages, so that would speed things up a bit?

Would appreciate any response.

[edited by - bilsa on April 15, 2004 9:46:36 AM]

[edited by - bilsa on April 15, 2004 9:48:26 AM]
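To make it a bit more concrete, this is roughly the bookkeeping I have in mind (just a sketch, untested, and all the names are made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Fixed chunk size; the last chunk of each file may be partly unused ("lost" storage).
const std::size_t CHUNK_SIZE = 4096;

struct Chunk {
    char        data[CHUNK_SIZE];
    std::size_t used;   // bytes actually used in this chunk
};

// One header record per file stored in the pack.
struct PakEntry {
    std::string name;
    std::list<Chunk>::iterator firstChunk;  // where the file starts in the chunk list
    std::list<Chunk>::iterator lastChunk;   // where the file ends
    std::uint64_t size;                     // real size, so the slack can be ignored on read
};

// Inserting a file means breaking the chunk list after some entry's lastChunk and
// splicing new chunks in; nothing after the insertion point has to be rewritten in memory.
struct PakFile {
    std::list<Chunk>      chunks;
    std::vector<PakEntry> entries;          // written out as the header
};
```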

For my current PAK archive, I just rewrite the entire archive whenever new data is added. I have gone over this several times and thought it over a lot.

One idea is to make it multithreaded, but that doesn't take away the long waiting time. The other idea I had was to write an additional 4/8/16 KB of nothing but empty data. This would bring a few good things with it:

1) You can add new data without rewriting the entire PAK archive. You just add a new header and the new data.
2) A speed-up.

You would still need to recalculate the CRC32 value of the headers. No big deal.
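A rough sketch of what I mean (not my actual archive code; I'm assuming here that the empty padding sits right after the header table, and the CRC routine is just a plain bitwise CRC-32 standing in for whatever you already use):

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Plain bitwise CRC-32 (IEEE polynomial); swap in your own checksum routine.
std::uint32_t crc32(const char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}

// Try to drop a new header record into the empty padding that was reserved when the
// archive was written. Returns false when the padding is used up, which means a full
// rewrite is needed after all. headerEnd/paddingLeft come from your own header layout.
bool addHeaderIntoPadding(std::fstream& pak, std::streamoff headerEnd,
                          std::streamsize paddingLeft, const std::vector<char>& record)
{
    if (static_cast<std::streamsize>(record.size()) > paddingLeft)
        return false;

    pak.seekp(headerEnd);
    pak.write(record.data(), static_cast<std::streamsize>(record.size()));

    // Recalculate the CRC32 of the new header record and store it right after it.
    std::uint32_t crc = crc32(record.data(), record.size());
    pak.write(reinterpret_cast<const char*>(&crc), sizeof crc);
    return true;
}
```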

This still doesn't fix the problem of removing data from the archive, though; that is another story. You would need to shift ALL the data behind it to the left. If you delete the first file, it would be faster to just rewrite the whole archive. However, if you delete a file in the middle, you can keep the current archive and move all the data after it to the left.

There is no standard Win32 function that lets you say "move file block X to position A"; you need to write that yourself.
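With plain buffered reads and writes it is not much code anyway; a sketch (no error handling):

```cpp
#include <algorithm>
#include <fstream>
#include <vector>

// Move `count` bytes that currently start at `from` so they start at `to` (to < from),
// copying through a buffer because there is no single "move this file block" call.
void shiftBlockLeft(std::fstream& pak, std::streamoff from, std::streamoff to,
                    std::streamoff count)
{
    std::vector<char> buffer(64 * 1024);
    std::streamoff moved = 0;

    while (moved < count) {
        std::streamoff todo = std::min<std::streamoff>(
            static_cast<std::streamoff>(buffer.size()), count - moved);
        pak.seekg(from + moved);
        pak.read(buffer.data(), todo);
        pak.seekp(to + moved);
        pak.write(buffer.data(), todo);
        moved += todo;
    }
    // The file is still its old length afterwards; truncate the stale tail with
    // whatever the platform gives you (SetEndOfFile on Win32, for example).
}
```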

Oh, and if you are interested, my PAK archive can be found here

Toolmaker



My site
/* -Earth is 98% full. Please delete anybody you can.*/

[edited by - toolmaker on April 15, 2004 10:54:39 AM]

Maybe I'm missing the point of what you're trying to do, but I don't see the point in worrying about something like that.

I, myself, have some compiler directives that tell the code whether or not to use the PAK archive. During development I just tell it not to bother with the PAK, and it uses the files as they are found on the file system.
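Something along these lines; USE_PAK and the loader names are only placeholders for whatever you call them:

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

std::vector<char> loadFromPak(const std::string& pakName,
                              const std::string& file);   // your PAK reader goes here

// USE_PAK is only defined by the distribution build; a debug build reads loose files.
std::vector<char> loadResource(const std::string& name)
{
#ifdef USE_PAK
    return loadFromPak("data.pak", name);
#else
    std::ifstream in(("data/" + name).c_str(), std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
#endif
}
```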

When I come to build a distribution I have a build script that first creates the PAK file (from scratch) using all the directories I normally use and also builds the game telling it to use the PAK file.

So I need not worry about the PAK file at all until I am ready to build a distribution... and since that doesn't happen very often, working out how to insert new data isn't an issue, because speed isn't a concern.

That's probably not what you meant at all though, is it?

Is the problem wasted space or performance? If performance is the requirement, I think new data would just be added at the end so that it is sequential in the file. You would have some type of directory at the beginning that would be updated when a new file was added. The old data would just be lost.
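In that case an update is basically an append plus a directory fix-up; a quick sketch (names made up):

```cpp
#include <fstream>
#include <vector>

// Append the new copy of a file at the end of the archive and report where it went;
// the directory entry at the front is then rewritten to point at the new offset, and
// the old copy's bytes are simply abandoned inside the file.
std::streamoff appendData(std::fstream& pak, const std::vector<char>& data)
{
    pak.seekp(0, std::ios::end);
    std::streamoff offset = pak.tellp();
    pak.write(data.data(), static_cast<std::streamsize>(data.size()));
    return offset;   // store this (and data.size()) in the directory entry
}
```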

I wouldn't think wasted space would be a problem considering the size of hard drives now. But if it is, you would have to create some type of file system within the file. Create a directory in the header and have each entry point to a block. You could use any block size, but 4K sounds about right. The pointer would be a block number marking the first 4K of the file. If the file is greater than 4K, the pointer to the next block could be stored at the beginning of each block, so a block actually holds 4K - sizeof(block pointer) of data. The actual size of the file would be stored in the directory header. You would read blocks, skipping to the next, until you got all the data.
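So the on-disk layout would be something like this (a sketch; the field sizes and the "block 0 means none" convention are just assumptions):

```cpp
#include <cstdint>
#include <ios>

const std::uint32_t BLOCK_SIZE = 4096;

// One directory entry in the header: name, real file size, and the first block number.
struct DirEntry {
    char          name[56];     // fixed-size name field, size picked arbitrarily
    std::uint32_t fileSize;     // actual bytes, since the last block is only partly used
    std::uint32_t firstBlock;   // 0 = file has no blocks
};

// Each block starts with the number of the next block in the chain (0 = last block),
// so the payload per block is BLOCK_SIZE - sizeof(block pointer).
struct Block {
    std::uint32_t next;
    char          payload[BLOCK_SIZE - sizeof(std::uint32_t)];
};

// Block n lives at dataStart + (n - 1) * BLOCK_SIZE in the archive; block numbers
// start at 1 so that 0 can mean "none".
std::streamoff blockOffset(std::streamoff dataStart, std::uint32_t n)
{
    return dataStart + static_cast<std::streamoff>(n - 1) * BLOCK_SIZE;
}
```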

To delete a file, all of its blocks would be added to a deleted-block list. In the header, you would have a block pointer to the first deleted block. If it is zero, new blocks are added to the end of the file. If it is not zero, it points to a free block, and that block holds a pointer to the next deleted block, which gets copied back to the directory header when the block is reused.
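And the free-list handling would look roughly like this (a sketch; the next-block pointers are kept in an in-memory table here to stand in for the 4 bytes at the start of each block on disk):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory stand-in for the "next deleted block" value stored at the start of each
// block on disk; index 0 is unused so block numbers can start at 1.
typedef std::vector<std::uint32_t> NextTable;

struct PakHeader {
    std::uint32_t firstFreeBlock;  // 0 = no free blocks, so new blocks go at the end
    std::uint32_t blockCount;
};

// Allocate a block: reuse the head of the deleted-block list if there is one,
// otherwise grow the file by one block.
std::uint32_t allocBlock(PakHeader& hdr, NextTable& next)
{
    if (hdr.firstFreeBlock == 0) {
        next.push_back(0);
        return ++hdr.blockCount;              // append a brand new block
    }
    std::uint32_t block = hdr.firstFreeBlock;
    hdr.firstFreeBlock = next[block];         // pop it off the free list
    return block;
}

// Delete a file: push each of its blocks onto the front of the deleted-block list.
void freeBlocks(PakHeader& hdr, const std::vector<std::uint32_t>& blocks, NextTable& next)
{
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        next[blocks[i]] = hdr.firstFreeBlock;
        hdr.firstFreeBlock = blocks[i];
    }
}
```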

I guess you get the idea. Not very efficient, but not much wasted space. Let me know if you have any questions; I might be able to come up with some pseudocode. I used this technique 20+ years ago when disk space was a problem.


- onebeer

