Package File Format



#1 Alundra   Members   -  Reputation: 867


Posted 20 April 2014 - 07:47 PM

Hi all,

Packages are needed because they give faster load times: you open one big file instead of a lot of small files.

When you work on content, you work without packages, and then you pack the folder at the end.

When you pack, you split every 4 GB; I think this number gives a nice package size.

Now the question is how to make a good package file format and what compression to use.

Is ZLIB the best choice to compress data for a package file format?

Is a good package file format only this:

- Header
- File table
- File data

Since the content on disk is folder-based, the file table should contain the full path, using this folder as the root.

Is it better to store it like that, or to have a hierarchical file table?
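
Something like this is what I have in mind, as a rough sketch (the struct names and field sizes here are arbitrary, just for illustration):

// Rough sketch of the on-disk layout (all offsets relative to the start of the package).
#include <cstdint>

#pragma pack(push, 1)
struct PackageHeader
{
    char     Magic[4];        // e.g. "PAK1"
    uint32_t Version;
    uint32_t FileCount;
    uint64_t FileTableOffset; // where the file table starts
};

struct FileTableEntry
{
    char     Path[260];       // full path, relative to the packed folder root
    uint64_t DataOffset;      // where this file's data starts
    uint64_t CompressedSize;
    uint64_t UncompressedSize;
};
#pragma pack(pop)
// The file data blobs follow, one per entry, at the offsets stored above.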

Thanks


Edited by Alundra, 20 April 2014 - 07:48 PM.



#2 Hodgman   Moderators   -  Reputation: 30385


Posted 21 April 2014 - 12:25 AM

In most that I've seen, the header is the file table ;)
This part is read into RAM on startup and kept there. It lets you perform a lookup by filename and retrieve the offset and size of a file within the archive.

The last engine that I used, used paths within filenames (e.g. An asset might be called "foo/bar/baz.type.platform") - so just a flat/non-hierarchical table containing these long names.

On my current engine, I actually ignore paths completely, basically moving all assets into a single directory when building an archive (e.g. The above would just be "baz.type.platform"). This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)
During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game. Also, any folder structure can be used during development (during dev, that file might be stored in the "foo/bar" directory, the game doesn't care).
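
As a rough illustration of the idea (a sketch, not real engine code), the in-memory table can be as simple as:

// Flat (non-hierarchical) file table kept in RAM after reading the archive header.
#include <cstdint>
#include <string>
#include <unordered_map>

struct FileEntry
{
    uint64_t offset; // where the file's data starts inside the archive
    uint64_t size;   // size of the file's data
};

// The key is the whole name, e.g. "foo/bar/baz.type.platform",
// or just "baz.type.platform" if paths are stripped when building the archive.
using FileTable = std::unordered_map<std::string, FileEntry>;

bool Lookup(const FileTable& table, const std::string& name, FileEntry& out)
{
    auto it = table.find(name);
    if (it == table.end())
        return false;
    out = it->second;
    return true;
}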

#3 Stainless   Members   -  Reputation: 941


Posted 21 April 2014 - 02:29 AM

We used to mount a file as a file system; it gave you massive speed increases on Windows.

 

We used to call it a Virtual File System; here is an example.

 

http://www.flipcode.com/archives/Programming_a_Virtual_File_System-Part_I.shtml

 

Maybe that can do the sort of thing you want to do.



#4 Alundra   Members   -  Reputation: 867


Posted 21 April 2014 - 05:34 AM

This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)

How can you differentiate them if they have the same name?

In Unreal Engine you can't have two assets using the same name; maybe they flatten everything, using the name only.

EDIT: I just tried, and you can have the same name if it's not in the same folder, so surely the flat name includes the folder hierarchy.

 

The last engine that I used, used paths within filenames (e.g. An asset might be called "foo/bar/baz.type.platform") - so just a flat/non-hierarchical table containing these long names.

That doesn't look bad; it avoids recursion when searching for file data.

 

During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game.

This is better, yeah; Unity works like that, and Unreal Engine changed to this method as well in version 4.

A lot of Unreal Engine users were not happy having to work with packages all the time.

 

We used to mount a file as a file system; it gave you massive speed increases on Windows.
We used to call it a Virtual File System; here is an example.

Is it really needed?

 

-----

 

What about the compression? That's still a question.


Edited by Alundra, 21 April 2014 - 05:40 AM.


#5 Stainless   Members   -  Reputation: 941


Posted 21 April 2014 - 09:17 AM

 

What about the compression? That's still a question.

 

 

You can build that in if you want to. Zlib is pretty fast, but it's not as fast as just loading a file, so you will have to make a judgement call to see if it is worthwhile.

 

I use zipped file systems all the time for mobile devices, but it's not always a good idea.



#6 Alundra   Members   -  Reputation: 867


Posted 21 April 2014 - 01:37 PM


This part is read into RAM on startup and kept there.

No split?



#7 Hodgman   Moderators   -  Reputation: 30385


Posted 21 April 2014 - 10:40 PM

This part is read into RAM on startup and kept there.

No split?
The header/table should only be a few KB. It's easy to 'waste' RAM on storing that whole table to make loading files easier.

How can you differentiate them if they have the same name?

I don't. If the artists have "level1/concrete.png" and "level2/concrete.png", then the engine tools give an error, asking them to delete or rename one of them.

What about the compression? That's still a question.

Using zlib or LZMA SDK is pretty common. Use it to compress each individual file, then in the header/table you can store the offset, compressed size and uncompressed size of each file. When loading a file, malloc the uncompressed size, then stream the compressed data off disk and through your decompression library into the malloc'ed buffer.
Unless the user has an SSD or RAM-disk, this should actually be a lot faster than loading uncompressed files! (As long as you've got the spare CPU time to do the decompression)
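
A minimal sketch of that load path using zlib's one-shot API (a real loader would stream chunks through inflate() instead, and this hand-waves all error handling):

// Load one compressed file out of the archive. Assumes the header/table stores
// offset, compressedSize and uncompressedSize for the entry.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <zlib.h>

void* LoadFile(FILE* archive, uint64_t offset, uint64_t compressedSize, uint64_t uncompressedSize)
{
    std::vector<unsigned char> compressed(compressedSize);
    fseek(archive, (long)offset, SEEK_SET);   // use _fseeki64/fseeko for archives > 2 GB
    fread(compressed.data(), 1, compressedSize, archive);

    void* buffer = malloc(uncompressedSize);  // the final, uncompressed destination
    uLongf destLen = (uLongf)uncompressedSize;
    uncompress((Bytef*)buffer, &destLen, compressed.data(), (uLong)compressedSize);
    return buffer;                            // caller frees it when done
}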

#8 MJP   Moderators   -  Reputation: 11363


Posted 21 April 2014 - 11:04 PM

We call our packages "archives". The format is basically a table of contents containing a map of symbols (hashed string asset names) to a struct containing the offset + size of the actual asset data. All of our asset IDs are flat like in Hodgman's setup. The whole archive is compressed using Oodle (compression middleware by RAD Game Tools), and when we load an archive we stream it in chunk by chunk asynchronously and pipeline the decompression in parallel. Once that's done we have to do a quick initialization step, where we mostly just fix up pointers in the data structures (on Windows we also create D3D resources in this step, because you have to do this at runtime). Once this is done the users of the assets can load assets individually by asset ID, which basically just amounts to a binary search through the map and then returning a pointer once the asset is found.
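
In simplified form (a sketch of the idea, not our actual code), the lookup side of that boils down to:

// Table of contents sorted by hashed asset name; lookup is a binary search.
#include <algorithm>
#include <cstdint>
#include <vector>

struct TocEntry
{
    uint64_t assetHash; // hashed string asset name ("symbol")
    uint64_t offset;    // offset of the asset data within the archive
    uint64_t size;
};

struct Archive
{
    std::vector<TocEntry> toc;  // sorted by assetHash at build time
    std::vector<uint8_t>  data; // decompressed archive contents
};

const void* FindAsset(const Archive& a, uint64_t assetHash)
{
    auto it = std::lower_bound(a.toc.begin(), a.toc.end(), assetHash,
        [](const TocEntry& e, uint64_t h) { return e.assetHash < h; });
    if (it == a.toc.end() || it->assetHash != assetHash)
        return nullptr;
    return a.data.data() + it->offset; // pointer into the already-initialized data
}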

 

As for loose files vs. packages, we support both for development builds. Building a level always triggers packaging an archive, but when we load an archive we check the current status of the individual assets and load them off disk if we determine that the version on disk is newer. That way you get fast loads by default, but you can still iterate on individual assets if you want to do that.


Edited by MJP, 22 April 2014 - 08:12 PM.


#9 Alundra   Members   -  Reputation: 867


Posted 22 April 2014 - 11:17 AM

Once this is done the users of the assets can load assets individually by asset ID, which basically just amounts to a binary search through the map and then returning a pointer once the asset is found.

Do you keep the file loaded, or use fopen each time you want to get a file's data?

Is it bad to fopen/fclose for each file in a GetMemoryFile( const int EntryIndex )?


Edited by Alundra, 22 April 2014 - 11:18 AM.


#10 Nikster   Members   -  Reputation: 178


Posted 22 April 2014 - 06:32 PM

Same as MJP here, but we don't store flat-list hashes; we either use a hash of the full path or a hash per directory/file, depending on complexity, and use a minimal perfect hash for lookup.

#11 Norman Barrows   Crossbones+   -  Reputation: 2134


Posted 22 April 2014 - 09:00 PM

back in the day, they were called a WAD file or resource file. used to do them myself for GAMMA Wing (circa 1995). i did an in-house implementation that supported all the basic file formats used by the company's titles, and included things like on-the-fly decompression from the resource file into ram. one unusual feature was separate resource and index files. the index file was opened, used to read all resources from the resource file at program start, and then both files were closed.

 

nowadays, i keep everything out in the open for easy modding by fans.


Norm Barrows

Rockland Software Productions

"Building PC games since 1988"

 

rocklandsoftware.net

 


#12 samoth   Crossbones+   -  Reputation: 4783


Posted 23 April 2014 - 05:23 AM


Using zlib or LZMA SDK is pretty common. Use it to compress each individual file, [...] Unless the user has an SSD or RAM-disk, this should actually be a lot faster than loading uncompressed files!

This is very true, but the fact that it is commonly done does not mean it is necessarily correct. My guess is that a lot of people simply use ZLib "because it works", and because it has been used in some well-known virtual filesystem libraries where it has proven to work, too. That was at a time when the goal of compression was slightly different, too (reduce storage space).

 

I would dearly recommend LZ4/HC over ZLib.

 

You compress to gain speed, not to save space. This is rather obvious, but it should still be re-iterated, so there is no misunderstanding. Disk space is abundant and cheap, and hardly anyone will notice half a gigabyte more or less, but long load times suck.

 

You gain speed if, and only if, loading the compressed data and decompressing is faster than just loading raw data. Again, this is actually pretty obvious.

 

ZLib decompression speed is around 60 MB/s, bzip2 around 15 MB/s, and LZMA around 20 MB/s. Your timings may vary by 1-2 MB/s depending on the data and depending on your CPU, but more or less, it's about that.

 

My not-very-special 7200RPM Seagate disk, which I use for backups, delivers 100 MB/s (+/- 2 MB/s) for non-small-non-random reads (about ½ that for many small files). Both my OCZ and Samsung SSDs deliver 280 (+/- 0) MB/s no matter what you want to read; they'd probably perform even a bit better if I had them plugged into SATA-600 rather than SATA-300 (which the disks support just fine, only the motherboard doesn't). The unknown SSD in my Windows 8 tablet delivers 120-140 MB/s on sequential, and 8-10 MB/s on small files.

My elderly close-to-fail (according to SMART) Samsung 7200RPM disk in my old computer is much worse than the Seagate, but it still delivers 85-90 MB/s on large reads ("large" means requesting anything upwards of 200-300 kB, as opposed to reading 4 kB at a time).

 

Which means that none of ZLib, bzip2, or LZMA are able to gain anything on any disk that I have (and likely on any disk that any user will have). Most likely, they're indeed serious anti-optimizations which in addition to being slower overall also consume CPU.

 

Speed-optimized compressors such as LZF, FastLZ, or LZ4 (which also has a slow, high-quality compressor module) can decompress at upwards of 1.5 GB/s. Note that we're talking giga, not mega.

 

Of course a faster compressor/decompressor will generally compress somewhat worse than the optimum (though not so much, really). LZ4HC is about 10-15% worse in compressed size, compared to ZLib (on large data). However, instead of running slower than the disk, it outruns the disk by a factor of 15, which is very significant.

If decompression is 15 times faster than reading from disk, then you are gaining as soon as compression saves you 1/15 (6.6%).
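
As a rough back-of-envelope check (the figures are just the assumptions from above):

#include <cstdio>

int main()
{
    // Time to obtain 1 MB of usable (uncompressed) data, reading sequentially.
    const double diskMBps   = 100.0;   // sequential read speed
    const double decompMBps = 1500.0;  // LZ4 decompression speed, ~15x the disk
    const double ratio      = 0.90;    // compressed size / uncompressed size

    const double rawTime    = 1.0 / diskMBps;                       // just read it uncompressed
    const double packedTime = ratio / diskMBps + 1.0 / decompMBps;  // read less, then decompress

    // packedTime < rawTime whenever ratio < 1 - diskMBps/decompMBps, i.e. as soon
    // as compression saves more than 1/15; overlapping the read and the
    // decompression lowers the bar even further.
    printf("raw: %.2f ms/MB, packed: %.2f ms/MB\n", rawTime * 1000.0, packedTime * 1000.0);
    return 0;
}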

 

Also, given that you compress individual resources or small sub-blocks of the archive (again, for speed, otherwise you would have to sequentially decompress the whole archive from the beginning), your compression rates are very sub-optimal anyway, and you may find that the "best" compressors do not compress much better than the others.

For example, a compressor that does some complicated near-perfect modelling on 5-10 MB of context is simply worth nothing if you compress chunks of 200-400 kB. It doesn't perform significantly better than a compressor that only looks at a 64kB context (a little maybe, but not that much!).

 

LZ4 compresses pretty poorly in comparison with better compressors (but then again, given the speed at which this happens, it's actually not that bad) but LZ4/HC is very competitive in compression size. Of course, compression is slow, but who cares. You do that offline, once, and never again. Decompression, which is what matters, is the same speed as "normal LZ4" because it is in fact "normal LZ4".
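
For reference, the API involved is tiny; a sketch of the split between offline packing and runtime loading might look like this (error checks omitted):

// Compress once offline with LZ4HC, decompress at load time with plain LZ4.
#include <lz4.h>
#include <lz4hc.h>
#include <vector>

// Offline, at package build time: slow, high-ratio compression (nobody cares how long it takes).
std::vector<char> Pack(const char* src, int srcSize)
{
    std::vector<char> dst(LZ4_compressBound(srcSize));
    int packedSize = LZ4_compress_HC(src, dst.data(), srcSize, (int)dst.size(), 9 /* HC level */);
    dst.resize(packedSize);
    return dst;
}

// At load time: identical to decompressing "normal LZ4", i.e. very fast.
bool Unpack(const char* src, int packedSize, char* dst, int originalSize)
{
    return LZ4_decompress_safe(src, dst, packedSize, originalSize) == originalSize;
}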


Edited by samoth, 23 April 2014 - 05:24 AM.




