How do game engineers pack their data?



Can you name file formats that match this "optimized format" description?

They are usually specific to individual games, or at least game engines.

Generalising such a format to be an open standard would sort of defeat the purpose of a highly-specific format optimised to the needs of the individual game.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


One more question. Let's say I have this one big .dat file in my game directory where data is present sequentially, and I load it.
Will I need to have tons of "#define X_ASSET_OFFSET" (and perhaps X_ASSET_SIZE) and then call ifstream::seekg() when I load every model?

The actual implementation is pretty specific to the platform/library, but you definitely want to avoid manually specifying offsets. Usually, on initialisation you walk the file's contents and generate a content map (resource ID with file offset). Then when you want to load a given resource, you look up the ID and read from that offset. (Map generation could potentially be something performed offline.)
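A minimal sketch of that idea in C++, assuming a hypothetical pack layout where the file starts with an entry count followed by {id, offset, size} records written by the same toolchain (so endianness matches) - the real header format is whatever your packer decides:

```cpp
#include <cstdint>
#include <fstream>
#include <unordered_map>
#include <vector>

// Hypothetical index record; the actual layout depends on your packer.
struct PackEntry {
    std::uint32_t id;     // resource id
    std::uint64_t offset; // byte offset of the asset within the pack
    std::uint64_t size;   // size of the asset in bytes
};

class PackFile {
public:
    bool open(const char* path) {
        file.open(path, std::ios::binary);
        if (!file) return false;

        // Walk the pack's index once on initialisation and build the content map.
        std::uint32_t count = 0;
        file.read(reinterpret_cast<char*>(&count), sizeof(count));
        for (std::uint32_t i = 0; i < count; ++i) {
            PackEntry e{};
            file.read(reinterpret_cast<char*>(&e.id), sizeof(e.id));
            file.read(reinterpret_cast<char*>(&e.offset), sizeof(e.offset));
            file.read(reinterpret_cast<char*>(&e.size), sizeof(e.size));
            index[e.id] = e;
        }
        return static_cast<bool>(file);
    }

    // Look up a resource id and read its bytes from the stored offset.
    bool load(std::uint32_t id, std::vector<char>& out) {
        auto it = index.find(id);
        if (it == index.end()) return false;
        out.resize(it->second.size);
        file.seekg(static_cast<std::streamoff>(it->second.offset));
        file.read(out.data(), static_cast<std::streamsize>(out.size()));
        return static_cast<bool>(file);
    }

private:
    std::ifstream file;
    std::unordered_map<std::uint32_t, PackEntry> index;
};
```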


Can you name file formats that match this "optimized format" description?

They are usually specific to individual games, or at least game engines.

Generalising such a format to be an open standard would sort of defeat the purpose of a highly-specific format optimised to the needs of the individual game.

Then the preferable format should be some kind of binary format (unlike COLLADA, which just points to the assets on disk) that encapsulates the assets' binaries? Am I grasping this wrong?


Then the preferable format should be some kind of binary format (unlike COLLADA, which just points to the assets on disk) that encapsulates the assets' binaries? Am I grasping this wrong?

Yes, that's the idea.

Theory runs that a packed file format contains 'chunks' of raw data (vertex data, audio buffers, etc.) which are laid out exactly as they will be needed in memory. Loading these chunks is a straightforward read/mmap operation, with no further processing required.

In order to know what chunks you need to read/mmap, you also need metadata (basically, an index to the packed file). These are stored in their own chunks, which you read in, process, and then use to load the remaining chunks. The metadata chunks should generally be very small compared to the data chunks, so these are not always stored in binary - I've seen systems that store metadata in JSON.
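As a rough illustration of the chunk idea (the header fields and tag values here are made up, not any particular engine's format, and it assumes the pack was written by the same toolchain so the byte layout already matches memory):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical chunk header; a real engine would pick its own tags/fields.
struct ChunkHeader {
    std::uint32_t type; // e.g. a four-character code for vertex/audio/metadata
    std::uint32_t size; // payload size in bytes
};

// Read every chunk of the requested type into a contiguous buffer.
// The payload bytes are assumed to already be in their in-memory layout,
// so no per-element parsing or conversion happens here.
std::vector<char> loadChunks(std::ifstream& pack, std::uint32_t wantedType) {
    std::vector<char> payload;
    ChunkHeader header{};
    while (pack.read(reinterpret_cast<char*>(&header), sizeof(header))) {
        if (header.type == wantedType) {
            std::size_t old = payload.size();
            payload.resize(old + header.size);
            pack.read(payload.data() + old, header.size);
        } else {
            pack.seekg(header.size, std::ios::cur); // skip chunks we don't need
        }
    }
    return payload;
}
```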

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

In your theory, should the assets of each level be stored in a separate file?

What if there's this AK47 model that can be displayed throughout the entire game? Should there be a "general" data file that contains that kind of asset?

I'm sorry but I want to fully understand this subject. A link to a nice article on this will suffice too if you don't feel like answering again.


In your theory, should the assets of each level be stored in a separate file?

Some systems place all of the game's assets in a single packed file. Others use common packs, plus packs for sets of levels.

Still others can load levels either from packed assets or from individual files. Many of Blizzard's games work this way - the campaign levels are all packed into the main archive, but add-on multiplayer levels are downloaded and stored individually...

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


In your theory, should the assets of each level be stored in a separate file?

They do not need to be.

The actual structure you go for is very dependent upon the design of the game. It may make sense to have a set of common packages that are shared across the whole game, and then individual packages that are only used for certain levels (so levels that don't use a given package don't care about it).

You might have each package (a binary data file with a set of assets) store only one type [so common_audio.pkg only contains audio data, common_mesh.pkg only contains mesh data], or you might group things logically so that a package holds many different types that are used together [a weapons.pkg that contains the mesh, audio, texture and material data for your weapons].

So I have been wondering the same thing for a while now, and I have a few questions.

So let's say you compress all your game assets into a single zip file. Doesn't it take longer to decompress the zip file onto your hard drive and then read all its contents into the game? Or do you not decompress it to the hard drive? Is there a way to read the game assets without decompressing them to the hard drive? How does it really work?

[Update]

Well that was a stupid question. I totally forgot that you can load files into a MemoryStream. So you would just load the zip file contents into a MemoryStream without having to decompress it to the hard drive.


Well that was a stupid question. I totally forgot that you can load files into a MemoryStream. So you would just load the zip file contents into a MemoryStream without having to decompress it to the hard drive.
Exactly - the whole point of adding compression is that sometimes reading uncompressed data takes longer than reading the compressed data into memory and decompressing it on the fly.

Although ZIP is kind of heavyweight for decompression AFAIK, you could try other, faster methods like LZ4 (which don't give such good compression ratios, though). As with anything, you'd need to test it out.
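For example, a sketch of reading a compressed asset straight into memory and decompressing it there, never writing the uncompressed bytes to disk. It uses LZ4's block API (LZ4_decompress_safe) and assumes the archive's index supplies both the compressed and uncompressed sizes, since plain LZ4 blocks don't record the decompressed size themselves:

```cpp
#include <lz4.h>

#include <fstream>
#include <stdexcept>
#include <vector>

// Decompress one LZ4-compressed asset entirely in memory.
// 'compressedSize' and 'uncompressedSize' are assumed to come from the
// archive's index/content map.
std::vector<char> loadCompressedAsset(std::ifstream& archive,
                                      std::streamoff offset,
                                      int compressedSize,
                                      int uncompressedSize) {
    // Read the compressed bytes from the archive into a temporary buffer.
    std::vector<char> compressed(compressedSize);
    archive.seekg(offset);
    archive.read(compressed.data(), compressedSize);

    // Decompress on the fly into the final buffer.
    std::vector<char> uncompressed(uncompressedSize);
    int written = LZ4_decompress_safe(compressed.data(), uncompressed.data(),
                                      compressedSize, uncompressedSize);
    if (written < 0)
        throw std::runtime_error("corrupt LZ4 data");
    return uncompressed;
}
```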

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

On a lot of the games I've worked on, each game file was individually compressed using LZMA. An archive is then built by appending all the compressed files end-to-end into one giant mega-file, and also building a look-up table/dictionary from filenames to offsets/sizes within the archive.
To load a file, you look it up in the dictionary, then stream 'size' bytes (starting from 'offset') from the archive file into an LZMA decompressor.

Compression algorithms have a lot of settings, letting you balance time taken vs compression ratio. Assuming you're loading stuff on a loading screen, you want to balance those settings so that the decompression CPU work takes about the same amount of time as the file-reading IO - this way you can queue up a lot of files and keep the IO busy at 100% while getting the best compression ratio possible.
On DVD games, this means a high compression setting. On Blu-ray, even higher. On HDD, low to none, as these are way faster - it can be faster to load uncompressed data! If targeting SSDs, compression will most likely just waste time :lol:
Many console games will keep assets compressed on disk, but will decompress and cache them on the HDD.

