zackriggle

File archiving - [no compression]

Is there an easy way to archive several text files? I know I would need a header for the file, specifying how many files there are, where each file starts and ends, etc. What else might I need to do this?
==================
My (soon-to-be) Cherished Cookie of Appreciation:

-- MattB - for WinSock advice --

Yep. You should do fine.

To be more specific, yes, you need a header with file names, info, and sizes/locations.

You can decide whether to have a fixed-size header, which can be read into a homebrew struct with one swell foop (fell swoop) but limits how much you can store per archive, or a variable-size one, which makes the file smaller (though negligibly so) and storage more flexible, but takes quite a bit longer to implement.
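As a rough sketch of the fixed-size option, the header below is a single struct written and read with one fwrite()/fread() call. The names (ArchiveHeader, ArchiveEntry, MAX_FILES, NAME_LEN) and the limits are made up for illustration, and dumping a raw struct like this assumes the archive is read back by a program built with the same compiler and byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 32 /* hypothetical per-archive limit */
#define NAME_LEN  48 /* hypothetical maximum name length, including the NUL */

/* One directory entry: where a stored file lives inside the archive. */
typedef struct {
    char     name[NAME_LEN]; /* NUL-terminated file name */
    uint32_t offset;         /* byte position of the file data */
    uint32_t size;           /* length of the file data in bytes */
} ArchiveEntry;

/* The whole header has a fixed size, so it can be read in one call. */
typedef struct {
    uint32_t     file_count;
    ArchiveEntry entries[MAX_FILES];
} ArchiveHeader;

/* Fills slot i of the header; long names are truncated to fit. */
void set_entry(ArchiveHeader *hdr, uint32_t i, const char *name,
               uint32_t offset, uint32_t size)
{
    assert(i < MAX_FILES);
    strncpy(hdr->entries[i].name, name, NAME_LEN - 1);
    hdr->entries[i].name[NAME_LEN - 1] = '\0';
    hdr->entries[i].offset = offset;
    hdr->entries[i].size = size;
}

/* Writes the header at the start of the stream. Returns 1 on success. */
int write_header(FILE *fp, const ArchiveHeader *hdr)
{
    assert(hdr->file_count <= MAX_FILES);
    if (fseek(fp, 0L, SEEK_SET) != 0) return 0;
    return fwrite(hdr, sizeof *hdr, 1, fp) == 1;
}

/* Reads the header back in one fread(). Returns 1 on success. */
int read_header(FILE *fp, ArchiveHeader *hdr)
{
    if (fseek(fp, 0L, SEEK_SET) != 0) return 0;
    if (fread(hdr, sizeof *hdr, 1, fp) != 1) return 0;
    return hdr->file_count <= MAX_FILES;
}
```

The trade-off is the one described above: MAX_FILES and NAME_LEN are baked into the format, so raising either limit changes the header size.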

You can decide whether to locate files by byte position, or by converting from size, or by special characters.
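One way to read "converting from size" is to store only each file's size and derive the start positions with a running total. The sketch below assumes that reading; offsets_from_sizes and its parameters are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Given each file's size in storage order, fills offsets[] with the byte
   position where each file starts. data_start is where the first file
   begins (just past the header). Returns the position one past the end
   of the last file. */
size_t offsets_from_sizes(const size_t sizes[], size_t offsets[],
                          size_t count, size_t data_start)
{
    size_t pos = data_start;
    assert(count == 0 || (sizes != NULL && offsets != NULL));
    for (size_t i = 0; i < count; i++) {
        offsets[i] = pos;  /* this file starts where the previous one ended */
        pos += sizes[i];
    }
    return pos;
}
```

Storing sizes only keeps the header a little smaller, at the cost of walking the list to find any one file.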

You can decide whether to go with no compression like you originally planned, or to use the incredibly-easy-to-use zlib compression library (its gzread() and gzwrite() functions work much like fread() and fwrite(), but compress the data).


Have fun!

Twilight Dragon

Okay, that kinda spooked me. I was going to ask that EXACT question! Christmas is great, isn't it? [Written 12:30 AM, 12/25/02]. My New Year's resolution -- actually finish this project!!!

I'll have a 2 KB header -- that leaves me with a max of 52-60 files per archive -- enough for what I need it for. Also, I will have an archive limit of 10 MB. That gives me about a 174-200k average file size. Note that these will all be text files, and will probably never reach more than 10k at the absolute most (~1,700 English words, given a 5-character average word plus a space). That's 25,000 words supposing they use all 200k, with a 7-letter average word plus a space.

The max size can be lifted to 16 MB if need be, without changing the headers at all (roughly a 279-322k average).
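For anyone who wants to check the arithmetic, here is a small sketch. It assumes 1 MB = 1024 * 1024 bytes, ignores the 2 KB header, and counts a word as its letters plus one space; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Average bytes per file when archive_bytes is shared evenly by
   file_count files (integer division, header ignored). */
unsigned long average_file_bytes(unsigned long archive_bytes,
                                 unsigned long file_count)
{
    assert(file_count > 0);
    return archive_bytes / file_count;
}

/* Rough word count for text_bytes of text, counting each word as
   word_letters letters plus one space. */
unsigned long approx_words(unsigned long text_bytes,
                           unsigned long word_letters)
{
    return text_bytes / (word_letters + 1);
}

/* Prints the per-file average for 60 and for 52 files at a given limit. */
void print_averages(unsigned long limit_mb)
{
    unsigned long bytes = limit_mb * 1024UL * 1024UL;
    printf("%lu MB: %lu to %lu bytes per file\n", limit_mb,
           average_file_bytes(bytes, 60), average_file_bytes(bytes, 52));
}
```

With these assumptions, 10 MB works out to 174,762 bytes per file for 60 files and 201,649 for 52; 16 MB works out to 279,620 and 322,638.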

[edited by - zackriggle on December 25, 2002 1:14:37 AM]

Here's an alternative to having a fixed-size header:

1. Start the file out with 4 bytes of data which is an unsigned long, but just put in all zeros for now.

2. Then write one text file after another to the file, keeping track of where the text data starts in the pack file and how long it is.

3. Once all the files are written, use ftell() or what have you to find the position in the file. Store that position for later.

4. Then write the table of contents at the end, including the number of files archived, each file's original name, the offset into the data file of where the file starts, and the size.

5. Once written, fseek() back to the START of the file, and replace those 4 bytes ( which are zeros ) with the position of where the table of contents began (in step 3).

That way the header can be any size; it just lives at the very end of the file. This also makes it possible to add more files to a pack file without rewriting the whole thing.

When reading, read the first 4 bytes in as an unsigned long, then fseek() to that position in the file, read the number of files from the TOC, and then read each file's TOC entry. Once that's done, iterate through the TOC and read each file. Make sense?

I like this system since it removes the need to predefine a header size, plus it lets you add more files to a pack file without rebuilding the whole thing: parse the TOC, write the new file's data over it, write a new TOC that includes the new file, and update the 4 bytes at the start to point to the new TOC.
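The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the "files" are in-memory strings, names are stored in a fixed 64-byte field, and uint32_t stands in for the 4-byte unsigned long (on many current systems unsigned long is 8 bytes). TocEntry, pack_write, pack_read_toc and pack_read_file are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64 /* hypothetical fixed width for stored names */

typedef struct {
    char     name[NAME_LEN]; /* original file name, NUL-terminated */
    uint32_t offset;         /* where the data starts in the pack file */
    uint32_t size;           /* length of the data in bytes */
} TocEntry;

/* Packs count text buffers into an empty stream. Returns 1 on success. */
int pack_write(FILE *fp, const char *names[], const char *texts[],
               uint32_t count)
{
    uint32_t toc_pos = 0;
    long pos;
    int ok;
    TocEntry *toc = calloc(count ? count : 1, sizeof *toc);
    assert(fp != NULL);
    if (toc == NULL) return 0;

    /* Step 1: reserve 4 bytes of zeros for the TOC position. */
    ok = fwrite(&toc_pos, sizeof toc_pos, 1, fp) == 1;

    /* Step 2: write each file, recording where it starts and its length. */
    for (uint32_t i = 0; ok && i < count; i++) {
        size_t len = strlen(texts[i]);
        pos = ftell(fp);
        strncpy(toc[i].name, names[i], NAME_LEN - 1);
        toc[i].offset = (uint32_t)pos;
        toc[i].size = (uint32_t)len;
        ok = pos >= 0 && fwrite(texts[i], 1, len, fp) == len;
    }

    /* Step 3: remember where the TOC begins. */
    pos = ftell(fp);
    ok = ok && pos >= 0;
    toc_pos = (uint32_t)pos;

    /* Step 4: write the file count, then one entry per file. */
    ok = ok && fwrite(&count, sizeof count, 1, fp) == 1
            && fwrite(toc, sizeof *toc, count, fp) == count;

    /* Step 5: go back to the start and overwrite the placeholder. */
    ok = ok && fseek(fp, 0L, SEEK_SET) == 0
            && fwrite(&toc_pos, sizeof toc_pos, 1, fp) == 1;

    free(toc);
    return ok;
}

/* Reads the TOC into a malloc'd array (caller frees). NULL on failure. */
TocEntry *pack_read_toc(FILE *fp, uint32_t *count)
{
    uint32_t toc_pos;
    TocEntry *toc;
    if (fseek(fp, 0L, SEEK_SET) != 0) return NULL;
    if (fread(&toc_pos, sizeof toc_pos, 1, fp) != 1) return NULL;
    if (fseek(fp, (long)toc_pos, SEEK_SET) != 0) return NULL;
    if (fread(count, sizeof *count, 1, fp) != 1) return NULL;
    toc = calloc(*count ? *count : 1, sizeof *toc);
    if (toc == NULL) return NULL;
    if (fread(toc, sizeof *toc, *count, fp) != *count) {
        free(toc);
        return NULL;
    }
    return toc;
}

/* Reads one file into a NUL-terminated buffer (caller frees). */
char *pack_read_file(FILE *fp, const TocEntry *e)
{
    char *buf = malloc((size_t)e->size + 1);
    if (buf == NULL) return NULL;
    if (fseek(fp, (long)e->offset, SEEK_SET) != 0 ||
        fread(buf, 1, e->size, fp) != e->size) {
        free(buf);
        return NULL;
    }
    buf[e->size] = '\0';
    return buf;
}
```

A real tool would open the pack file in binary mode and should sanity-check the count and offsets it reads before trusting them.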



Good idea TDragon... I should have thought of that... lol...
A bit too late, though -- I've already spent about two hours writing the read/write functions for each of these. About all I have to do now is find any memory leaks (if I wrote the code, there WILL be some....), and fix any pointer problems.
--If you want the source (51k - most from self-extractor, only 3 source files...), get it at:
http://members.fortunecity.com/mqonline/


Here's how I did it:

--==--==--==--==--==--
FILE_TYPE -> stored as HEX
VERSION -> stored as HEX
NUM_OF_FILES -> stored as HEX
HEADER_END -> stored as 2 HEX
ARCHIVE_NAME

FILE_1_NAME
FILE_1_START -> as HEX
FILE_1_END -> as HEX

FILE_2_NAME
FILE_2_START -> as HEX
FILE_2_END -> as HEX

###### Put pounds until we get to byte 2046 ######
###### Then put a newline (byte 2047) ############
###### Actual Data starts on byte 2048 ##########
FILE_1
FILE_2
--==--==--==--==--==--

By the way, this is almost directly from the I'm-so-dumb-I-need-to-write-a-readme-for-myself-file.
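For illustration only, here is a sketch of how the padding part of that layout might be built. It assumes "stored as HEX" means ASCII hexadecimal digits on their own lines, which the layout does not actually spell out; format_entry and build_header_block are made-up names, not the functions from the linked source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HEADER_SIZE 2048 /* actual data starts on byte 2048 */

/* Formats one entry as a name line followed by start and end in hex.
   ASSUMPTION: values are stored as ASCII hex digits.
   Returns 1 if the text fit in out. */
int format_entry(char *out, size_t cap, const char *name,
                 unsigned long start, unsigned long end)
{
    int n = snprintf(out, cap, "%s\n%lX\n%lX\n", name, start, end);
    return n >= 0 && (size_t)n < cap;
}

/* Copies header_text into block, fills the rest with '#' through byte
   2046, and puts a newline at byte 2047. Returns 0 if the text is too
   long to leave room for the final newline. */
int build_header_block(char block[HEADER_SIZE], const char *header_text)
{
    size_t len;
    assert(block != NULL && header_text != NULL);
    len = strlen(header_text);
    if (len > HEADER_SIZE - 1) return 0;
    memcpy(block, header_text, len);
    memset(block + len, '#', HEADER_SIZE - 1 - len);
    block[HEADER_SIZE - 1] = '\n';
    return 1;
}
```

The finished block can then be written with a single fwrite() of HEADER_SIZE bytes, followed by the file data.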


[edited by - zackriggle on December 25, 2002 3:14:04 AM]

I've got a dead simple and actually very powerful, backwards-and-forwards compatible format that I designed to be a personal replacement for TAR and such. I call them TIR archives. No compression yet, but that could be added without too much hassle.

By the way, I _want_ it to be open-source, as long as it stays controlled-format (meaning I control what features get added to the format standard and such, not to be a controlling bastard, but so that I can maintain quality and compatibility in the file format).

So, if anyone wants the (well-documented) C source, a working Win32 EXE, and the documentation, I am more than happy to start distributing it!
