Package File Format

9 comments, last by Norman Barrows 10 years ago

Hi all,

Packages are needed because they give faster load times: you open one big file instead of a lot of small ones.

When you work on content, you work without packages, and then you pack the folder at the end.

When you pack, you split every 4 GB; I think that number gives a reasonable package size.

Now the question is how to make a good package file format and what compression to use.

Is zlib the best choice for compressing data in a package file format?

Is a good package file format just:

- Header
- File table
- File data

Since the content on disk is folder-based, the file table should contain the full path of each file, using this folder as the root.

Is it better to store it like that, or to have a hierarchical file table?

Thanks

In most that I've seen, the header is the file table ;)
This part is read into RAM on startup and kept there. It lets you perform a lookup by filename and retrieve the offset and size of a file within the archive.
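
A minimal sketch of that idea, not any particular engine's format. The byte layout here (a u32 entry count, then a u16 name length, the name, and a u64 offset and u64 size per entry) and the function names are made up for illustration:

```python
import io
import struct

# Hypothetical layout: u32 entry count, then per entry
# (u16 name length, UTF-8 name, u64 offset, u64 size), then the file data.

def build_archive(files):
    """Pack {name: bytes} into one blob: file table first, data after."""
    names = sorted(files)
    table_size = 4 + sum(2 + len(n.encode("utf-8")) + 16 for n in names)
    table = io.BytesIO()
    table.write(struct.pack("<I", len(names)))
    offset = table_size  # the first file starts right after the table
    for name in names:
        raw = name.encode("utf-8")
        table.write(struct.pack("<H", len(raw)) + raw)
        table.write(struct.pack("<QQ", offset, len(files[name])))
        offset += len(files[name])
    return table.getvalue() + b"".join(files[n] for n in names)

def read_table(stream):
    """Read the whole table once; the returned dict is what stays in RAM."""
    (count,) = struct.unpack("<I", stream.read(4))
    table = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        table[name] = struct.unpack("<QQ", stream.read(16))  # (offset, size)
    return table

def read_file(stream, table, name):
    """Look up by filename, then seek and read exactly that file's bytes."""
    offset, size = table[name]
    stream.seek(offset)
    return stream.read(size)
```

The stream can be an open file handle on the archive; after `read_table` runs once, every later load is one dictionary lookup plus one seek and read.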

The last engine that I used put paths within filenames (e.g. an asset might be called "foo/bar/baz.type.platform") - so just a flat/non-hierarchical table containing these long names.

On my current engine, I actually ignore paths completely, basically moving all assets into a single directory when building an archive (e.g. the above would just be "baz.type.platform"). This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)
During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game. Also, any folder structure can be used during development (during dev, that file might be stored in the "foo/bar" directory, the game doesn't care).

We used to mount a file as a file system; it gave massive speed increases on Windows.

We used to call it a Virtual File System. Here is an example:

http://www.flipcode.com/archives/Programming_a_Virtual_File_System-Part_I.shtml

Maybe that can do the sort of thing you want to do.

> This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)

How can you differentiate them if they have the same name?

In Unreal Engine you can't have two assets with the same name; maybe they flatten everything and use the name only.

EDIT: I just tried, and you can have the same name as long as the assets are not in the same folder, so it is surely a flat table keyed by the full path.

> The last engine that I used, used paths within filenames (e.g. An asset might be called "foo/bar/baz.type.platform") - so just a flat/non-hierarchical table containing these long names.

That doesn't look bad; it avoids recursion when searching for file data.

> During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game.

This is better, yeah. Unity works like that, and Unreal Engine switched to this method in version 4 as well.

A lot of Unreal Engine users were not happy about having to work with packages all the time.

> We used to mount a file as a file system, it gave you massive speed increases on windows.
> We used to call it a Virtual File System, here is an example.

Is it really needed?

-----

What about compression? That is still an open question.

> What's about the compression ? That still a question.

You can build that in if you want to. Zlib is pretty fast, but it's not as fast as just loading a file so you will have to make a judgement call to see if it is worthwhile.

I use zipped file systems all the time for mobile devices, but it's not always a good idea.


> This part is read into Ram on startup and kept there.

No split?

> > This part is read into Ram on startup and kept there.
>
> No split ?

The header/table should only be a few KB. It's easy to justify 'wasting' RAM on storing that whole table, since it makes loading files easier.

> How can you differentiate if it has the same name ?

I don't. If the artists have "level1/concrete.png" and "level2/concrete.png", then the engine tools give an error, asking them to delete or rename one of them.
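
Roughly, the build step could do something like the following. This is only a sketch of the check described above; `flatten_asset_names` is a made-up name, not an actual engine tool:

```python
import posixpath

def flatten_asset_names(paths):
    """Map each flat asset name to its source path; reject duplicate names."""
    flat = {}
    for path in paths:
        name = posixpath.basename(path)  # drop the folder part entirely
        if name in flat:
            raise ValueError(
                f"duplicate asset name {name!r}: {flat[name]} and {path}; "
                "delete or rename one of them"
            )
        flat[name] = path
    return flat
```

Calling it with both "level1/concrete.png" and "level2/concrete.png" raises the error, while "foo/bar/baz.type.platform" on its own simply becomes "baz.type.platform".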

> What's about the compression ? That still a question.

Using zlib or LZMA SDK is pretty common. Use it to compress each individual file, then in the header/table you can store the offset, compressed size and uncompressed size of each file. When loading a file, malloc the uncompressed size, then stream the compressed data off disk and through your decompression library into the malloc'ed buffer.
Unless the user has an SSD or RAM-disk, this should actually be a lot faster than loading uncompressed files! (As long as you've got the spare CPU time to do the decompression)
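
A small sketch of that scheme using Python's `zlib` module. The table layout and the names `pack_compressed` and `load_file` are assumptions for illustration; the preallocated `bytearray` stands in for the malloc'ed buffer:

```python
import io
import zlib

def pack_compressed(files):
    """Compress each file on its own; return (table, data blob).

    table maps name -> (offset, compressed_size, uncompressed_size).
    """
    table, blob = {}, bytearray()
    for name, data in files.items():
        packed = zlib.compress(data, 6)
        table[name] = (len(blob), len(packed), len(data))
        blob += packed
    return table, bytes(blob)

def load_file(stream, table, name, chunk_size=64 * 1024):
    """Stream compressed bytes through zlib into a preallocated buffer."""
    offset, packed_size, raw_size = table[name]
    out = bytearray(raw_size)  # stands in for malloc(uncompressed size)
    written = 0
    inflater = zlib.decompressobj()
    stream.seek(offset)
    remaining = packed_size
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("archive truncated")
        remaining -= len(chunk)
        piece = inflater.decompress(chunk)
        out[written:written + len(piece)] = piece
        written += len(piece)
    tail = inflater.flush()
    out[written:written + len(tail)] = tail
    written += len(tail)
    if written != raw_size:
        raise ValueError("size mismatch: table entry is corrupt")
    return bytes(out)
```

Because each file is compressed independently, any single asset can be loaded without touching the rest of the archive.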

We call our packages "archives". The format is basically a table of contents containing a map of symbols (hashed string asset names) to a struct containing the offset + size of the actual asset data. All of our asset IDs are flat like in Hodgman's setup. The whole archive is compressed using Oodle (compression middleware by RAD Game Tools), and when we load an archive we stream it in chunk by chunk asynchronously and pipeline the decompression in parallel. Once that's done we have to do a quick initialization step, where we mostly just fix up pointers in the data structures (on Windows we also create D3D resources in this step, because you have to do this at runtime). Once this is done the users of the assets can load assets individually by asset ID, which basically just amounts to a binary search through the map and then returning a pointer once the asset is found.
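
To illustrate just the lookup part, here is a hedged sketch of a sorted table of hashed names with a binary search. CRC-32 is only a stand-in, since the hash function is not named above, and `build_toc` / `find_asset` are hypothetical names:

```python
import bisect
import zlib

def asset_id(name):
    # CRC-32 is an assumption for illustration, not the hash actually used.
    return zlib.crc32(name.encode("utf-8"))

def build_toc(entries):
    """entries: {name: (offset, size)} -> (sorted ids, parallel records)."""
    pairs = sorted((asset_id(n), rec) for n, rec in entries.items())
    ids = [p[0] for p in pairs]
    if len(set(ids)) != len(ids):
        raise ValueError("hash collision between asset names")
    return ids, [p[1] for p in pairs]

def find_asset(ids, records, name):
    """Binary search by hashed ID; returns (offset, size) or None."""
    key = asset_id(name)
    i = bisect.bisect_left(ids, key)
    if i < len(ids) and ids[i] == key:
        return records[i]
    return None
```

Since only hashes are stored, a collision between two asset names has to be caught when the archive is built, which is what the check in `build_toc` is for.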

As for loose files vs. packages, we support both for development builds. Building a level always triggers packaging an archive, but when we load an archive we check the current status of the individual assets and load them off disk if we determine that the version on disk is newer. That way you get fast loads by default, but you can still iterate on individual assets if you want to do that.
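
One simple way to make that decision is to compare file modification times. That is an assumption for this sketch; how "newer" is actually determined is not stated above, and `pick_source` is a made-up helper:

```python
import os

def pick_source(loose_path, archive_path):
    """Return "loose" if the loose file is newer than the archive, else "archive"."""
    try:
        loose_mtime = os.path.getmtime(loose_path)
    except OSError:
        return "archive"  # no loose copy on disk, so use the packaged one
    if loose_mtime > os.path.getmtime(archive_path):
        return "loose"
    return "archive"
```

A content hash or a build-system version stamp would be more robust than timestamps, at the cost of extra bookkeeping.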

> Once this is done the users of the assets can load assets individually by asset ID, which basically just amounts to a binary search through the map and then returning a pointer once the asset is found.

Do you keep the file loaded, or do you call fopen whenever you want a file's data?

Is it bad to fopen/fclose for each file in a GetMemoryFile( const int EntryIndex ) call?

Same as MJP here, but we don't store flat-list hashes. We either use a hash of the full path or a hash per directory/file, depending on complexity, and use a minimal perfect hash for lookup.
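
For anyone unfamiliar with the term, here is a toy sketch of a minimal perfect hash: it brute-forces a seed so that every known path lands in its own slot of a table with exactly as many slots as keys. This is an illustration under stated assumptions, not the scheme described above; salted BLAKE2b and the names `slot`, `build_mph` and `lookup` are made up for the example, and brute force only works for small key sets:

```python
import hashlib

def slot(key, seed, n):
    """Hash `key` with a seed-derived salt and reduce it to a slot in [0, n)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n

def build_mph(entries, max_seed=100_000):
    """entries: {path: record}. Returns (seed, table), len(table) == len(entries)."""
    keys = list(entries)
    n = len(keys)
    for seed in range(max_seed):
        if len({slot(k, seed, n) for k in keys}) == n:  # no two keys share a slot
            table = [None] * n
            for k in keys:
                table[slot(k, seed, n)] = (k, entries[k])
            return seed, table
    raise RuntimeError("no seed found; too many keys for brute force")

def lookup(seed, table, key):
    """One hash, one compare. Unknown keys also land in a slot, so verify the key."""
    stored_key, record = table[slot(key, seed, len(table))]
    return record if stored_key == key else None
```

The payoff is that a lookup is a single hash and a single comparison, with no search and no empty slots in the table.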

This topic is closed to new replies.
