Are there any standards for organizing game files and data chunks?

6 comments, last by _swx_ 10 years, 8 months ago

Hi

I want to know how game studios organize their game data and how the game accesses that data.

For example, a game wants to load a specific model which has an ID. The game has to figure out which file to open, so it has to look up a table to find out which directory the needed file is located in. Is there a standard for how these tables work?

Another example: let's say a character animation needs to blend between 3 animation tracks. The game needs a "Schedule" data structure which holds information on how to blend these tracks and which files they are located in. So the game will probably have a table file that holds all the schedules. Then, if the game wants to play an animation, it would have to look inside the schedule and find out which files the animation tracks are in.

So, are there any common ways of doing this? Typical header formats, etc.?


I doubt there's anything "typical" about those processes. Back in the day (talking about 2001) it was acceptable to just scan the various files at startup for resource names.

The "animation tracks" you describe are typically not in different files: more often than not they go along with the model they animate in the same resource.

I strongly suggest just buying a game and taking a look at its files and archives, and maybe at its SDK... it will help.

Previously "Krohm"

Yeah, there's nothing typical across all games studios ;)

I can tell you what I do in my engine.

There is a Lua file (or a collection of them) that lists every asset available to the game. If some game code wants to load a model, its filename has to be listed in one of these Lua files first. If a file isn't listed, it isn't compiled.


assets = {}
AddModel( assets, "test" ) -- shorthand for: assets["test.mdl"] = hash("test.mdl")

The game wants to load a test model:


model = load(assets["test.mdl"]);
--this is the same as load(hash("test.mdl"))

i.e. filenames are not used by the file loading system. Filenames are hashed to produce a 32-bit integer name for each file.

When the game starts up, it loads the header of its data archive file. This header contains a count (number of files in the archive), an array of filename-hashes, and then an array of file information (offset to each file in the archive, and its size). These arrays are sorted by filename-hash, so they can be binary-searched.
When asked to load a file, the requested filename-hash is found in the array of filename-hashes (using binary search), and the corresponding size/offset is then fetched. These are then used to queue up an asynchronous read of that data in the archive.
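
The header layout and lookup described above can be sketched roughly like this. It is only an illustration under stated assumptions: the post does not say which hash function or field widths are used, so CRC32 (via `zlib.crc32`), little-endian 32-bit fields, and the names `name_hash`, `build_archive` and `Archive` are all made up here. The real system also queues an asynchronous read, whereas this sketch just slices an in-memory buffer.

```python
import bisect
import struct
import zlib


def name_hash(filename: str) -> int:
    """Hash a filename to a 32-bit integer (CRC32 is an assumption)."""
    return zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF


def build_archive(files: dict[str, bytes]) -> bytes:
    """Pack files as: count, sorted hashes, (offset, size) pairs, then data."""
    by_hash = {}
    for name, data in files.items():
        h = name_hash(name)
        if h in by_hash:
            raise ValueError(f"hash collision on {name!r}")
        by_hash[h] = data
    hashes = sorted(by_hash)
    count = len(hashes)
    offset = 4 + count * 4 + count * 8  # first byte after the header
    header = struct.pack("<I", count) + struct.pack(f"<{count}I", *hashes)
    for h in hashes:
        header += struct.pack("<II", offset, len(by_hash[h]))
        offset += len(by_hash[h])
    return header + b"".join(by_hash[h] for h in hashes)


class Archive:
    """Reads the header once, then finds files by binary search on the hash."""

    def __init__(self, blob: bytes) -> None:
        (count,) = struct.unpack_from("<I", blob, 0)
        self.hashes = list(struct.unpack_from(f"<{count}I", blob, 4))
        info_start = 4 + count * 4
        self.infos = [
            struct.unpack_from("<II", blob, info_start + i * 8)
            for i in range(count)
        ]
        self.blob = blob

    def load(self, file_hash: int) -> bytes:
        i = bisect.bisect_left(self.hashes, file_hash)
        if i == len(self.hashes) or self.hashes[i] != file_hash:
            raise KeyError(file_hash)
        offset, size = self.infos[i]
        # A real engine would queue an asynchronous read here instead.
        return self.blob[offset:offset + size]
```

With this, `Archive(blob).load(name_hash("test.mdl"))` plays the role of `load(hash("test.mdl"))` from the Lua snippet.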

The tools then of course need to be able to build this big data archive file.

The content build system then runs through a tonne of rules to figure out what data to build, and how.
It starts with the Lua tables above, e.g. it would see that test.mdl is required by the game.

It scans through its "build rules" and finds one that says:
If you want to produce [foo].mdl, pass [foo].daegeo to a ModelBuilder.
Then it realizes that it needs the test.daegeo file, so it again scans its rules and finds one that says:
If you want to produce [foo].daegeo or [foo].daemat, pass [foo].dae to a DaeParser.

Now it realizes that it needs test.dae, but there's no rule to produce it, so it assumes it must be a content file.

These rules look like:


local ModelBuilder = Builder("ModelBuilder")
Rule(ModelBuilder, "data/(.*).mdl", "temp/$1.daegeo")

local DaeParser = Builder("DaeParser")
Rule(DaeParser, {"temp/(.*).(daegeo|daemat)", "temp/$1.daegeo", "temp/$1.daemat"}, "$1.dae")
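
To make the rule-chasing concrete, here is a rough Python sketch of how a target could be walked back to its content file. This is not the actual build system: `RULES` and `plan_build` are hypothetical names, regex backreferences (`\1`) stand in for `$1`, and each step is simplified to a single output even though the DaeParser rule above produces two files.

```python
import re

# (builder name, output pattern, input template); "\1" plays the role of "$1".
RULES = [
    ("ModelBuilder", r"data/(.*)\.mdl", r"temp/\1.daegeo"),
    ("DaeParser", r"temp/(.*)\.(?:daegeo|daemat)", r"\1.dae"),
]


def plan_build(target: str) -> list[tuple[str, str, str]]:
    """Return (builder, input, output) steps, ordered so inputs come first."""
    for builder, pattern, template in RULES:
        match = re.fullmatch(pattern, target)
        if match:
            source = match.expand(template)
            # Work out how to produce the input before this step can run.
            return plan_build(source) + [(builder, source, target)]
    # No rule produces this file, so it is assumed to be a content file.
    return []
```

Calling `plan_build("data/test.mdl")` yields the DaeParser step (test.dae to temp/test.daegeo) followed by the ModelBuilder step (temp/test.daegeo to data/test.mdl).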


For "content files" like test.dae, they can be arranged in any way that the artists/designers feel like, with one rule -- no duplicate file names, because directories are ignored!

This may seem crazy, strange or stupid to some people... but we've found that using full paths for assets, and allowing multiple assets with the same name, is often a pitfall during production.
e.g.
* during production, someone moves level1/textures/concrete.png over to common/textures/concrete.png because it's used in multiple levels.
-- if full paths are used, this action breaks level #1, and someone has to go in and replace the old path with the new path.
-- in our system, the file is just called concrete.png no matter where it is stored, so artists/designers can move files around and organize them however/whenever they want.

* during production, someone clones level1/textures/concrete.png over to level2/textures/concrete.png because they want to use it in their level.
-- if full paths are used, everything works, but we end up with two copies of the same file shipping on our final DVD!
-- in our system, the artists are presented with a warning/error, saying that level2/textures/concrete.png is being ignored due to a duplicate file name.

Anyway, getting back to the above model example, the build system now wants to find test.dae to use as an input.

When the build system starts up, it scans the entire content directory tree to build a map of file-names to full paths. It also subscribes to Windows notifications of any changes within the content directory, so it knows when new files are created, files are moved, deleted and modified.
Using this map, it discovers that test.dae is stored at content/test_level/test.dae. It loads this file and passes it to the DaeParser, which outputs temp/test.daemat and temp/test.daegeo. It then loads temp/test.daegeo and passes it to the model builder, which outputs data/test.mdl.
After building all the required files, it then loads up all the files inside the data directory, and writes them into the big archive mentioned before, which is used by the game.
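
The start-up scan and the duplicate-name rule could look something like the sketch below. `index_content` is a made-up name, and the choice to keep the first path found and warn about later ones is an assumption. It only covers the initial scan, not the Windows change notifications.

```python
import os


def index_content(root: str) -> tuple[dict[str, str], list[str]]:
    """Map bare file names to full paths; report duplicate names as warnings."""
    index: dict[str, str] = {}
    warnings: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-directories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if name in index:
                # Directories are ignored, so a second file with this name loses.
                warnings.append(f"{path} ignored: duplicate of {index[name]}")
            else:
                index[name] = path
    return index, warnings
```

Looking up `index["test.dae"]` then gives the full path no matter which directory the artist put the file in.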

If an artist modifies content/test_level/test.dae at any time, then the content build system receives a notification from Windows, and can recompile the test.mdl file automatically.

http://www.bitsquid.se/files/resource_management.html

http://www.altdevblogaday.com/2012/06/04/read-my-lips-no-more-loading-screens/

Don't know if it helps, but in my experience it's best to set something up yourself, with the following remarks:

- like Hodgman said, I would also advise using relative paths; in my engine they are relative to the folder where the main executable is located (more or less free of risks)
- make sure you reuse assets where possible, i.e. save/load meshes, render instances of the meshes
- take good care of error handling when checking folders, loading files, etc.

Hope this helps

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

(was writing something... then FireFox crashed... started over...)

but, yeah:

in my case it works mostly like this:

there is a VFS (Virtual FileSystem) which wraps over most of the file-IO.

most code uses a file API similar to that of C stdio + some POSIX like features.

the VFS basically makes all the game contents look like a single unified directory tree.

everything inside the VFS is relative to a single virtual root directory (known as "/").

most of the contents are made visible in this VFS via "mounts", where a particular OS directory or archive is mounted to a location inside the VFS.

on engine start-up, an initial script is loaded which basically tells the engine what directories and similar to mount (basically, where it will find its game data, ...). this happens prior to most of the rest of the engine bringing itself up.

basically, if you have seen Linux or Cygwin, it is sort of a similar idea...
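
a minimal sketch of the mount idea, in Python for brevity. everything here is hypothetical (the `VFS` class, its method names, and the rule that later mounts take priority over earlier ones); it only handles OS directories, not archives, and does no sanitizing of ".." in paths.

```python
import os


class VFS:
    """Resolve virtual paths under "/" to real OS paths through mount points."""

    def __init__(self) -> None:
        self.mounts: list[tuple[str, str]] = []  # (virtual prefix, OS directory)

    def mount(self, virtual_dir: str, os_dir: str) -> None:
        prefix = "/" + virtual_dir.strip("/")
        # Assumption: later mounts are searched first.
        self.mounts.insert(0, (prefix, os_dir))

    def resolve(self, virtual_path: str) -> str:
        for prefix, os_dir in self.mounts:
            if prefix == "/":
                relative = virtual_path.lstrip("/")
            elif virtual_path == prefix or virtual_path.startswith(prefix + "/"):
                relative = virtual_path[len(prefix):].lstrip("/")
            else:
                continue  # this mount does not cover the requested path
            candidate = os.path.join(os_dir, *relative.split("/"))
            if os.path.exists(candidate):
                return candidate
        raise FileNotFoundError(virtual_path)

    def open(self, virtual_path: str, mode: str = "rb"):
        return open(self.resolve(virtual_path), mode)
```

a start-up script would then boil down to a handful of `mount()` calls before anything else asks for a file.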

usually, resource loading works by reading them in as files, then doing whatever decoding/... is needed.

beyond this, data may be either located directly in OS directories (for the most part, this is what I am currently doing), or located inside archive files.

thus far, I have been using formats like ZIP (with a ".pk" extension), PAK (".pak"), and ExWAD (embedded inside DLL or EXE files, may potentially use ".exw" for standalone files).

PAK is a slight variant of the PAK format used by Quake 1/2 and Half-Life (but with a hack to allow escape-coding longer path-names), and ExWAD is loosely descended from the WAD2 format (used by the same games, but is structurally different/incompatible).
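
for reference, the stock Quake PAK layout is small enough to sketch: a 12-byte header ("PACK", directory offset, directory length) and 64-byte directory entries (56-byte NUL-padded name, file offset, file size). the code below covers only that stock layout as I understand it; the long-path-name hack mentioned above is not described here, so it is not modeled, and the function names are made up.

```python
import struct

HEADER = struct.Struct("<4sii")  # magic, directory offset, directory length
ENTRY = struct.Struct("<56sii")  # NUL-padded name, file offset, file size


def write_pak(files: dict[str, bytes]) -> bytes:
    """Build a PAK image: header, file data, then the directory."""
    body = b""
    entries = b""
    for name, data in files.items():
        raw_name = name.encode("ascii")
        if len(raw_name) > 55:  # leave room for the terminating NUL
            raise ValueError(f"name too long for stock PAK: {name!r}")
        entries += ENTRY.pack(raw_name, HEADER.size + len(body), len(data))
        body += data
    header = HEADER.pack(b"PACK", HEADER.size + len(body), len(entries))
    return header + body + entries


def read_pak_directory(blob: bytes) -> dict[str, tuple[int, int]]:
    """Return {name: (offset, size)} for every entry in the directory."""
    magic, dir_offset, dir_length = HEADER.unpack_from(blob, 0)
    if magic != b"PACK":
        raise ValueError("not a PAK archive")
    directory = {}
    for pos in range(dir_offset, dir_offset + dir_length, ENTRY.size):
        raw_name, offset, size = ENTRY.unpack_from(blob, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        directory[name] = (offset, size)
    return directory
```

the 56-byte name field is exactly the limit that makes some kind of escape-coding necessary for longer paths.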

typically, the PAK files are currently produced by special bundling tools, which generally take a small collection of files and encode them into a custom format. these tools are fairly special-purpose, such as one tool to deal with sound-effects and another to deal with textures, each converting the contents of a specific directory-tree and storing them into an output PAK.

as-is, they are basically manually-run batch-tools, but may be later integrated into the build-process, or maybe a more unified packaging tool might eventually be considered.

part of the reason for this packaging+conversion is that it is most convenient to work with files as unpacked directory trees of stuff, generally using raw formats like WAV for sounds, PNG for images, ...

however, these formats are not necessarily ideal for storage and distribution, or for loading, so it may make sense to convert them into something more specialized. this also helps separate the formats used for content creation from those used for loading or distribution (for example, one can quickly change the output format without needing to mess around with the input files, and it allows for more use of lossy compression formats, ...).

Hi Hodgman, I'm currently writing my builder in a similar way using Python, so this is really helpful! I was just wondering what kind of intermediate format you use to export your scene information from a program like Maya: for example, do you export a scene graph or just the object positions?

I use JSON for pretty much all my resource files except textures, models and other very large resource types. This allows my resource compiler to automatically find dependencies between the resources.
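
As a rough illustration of why JSON makes dependency discovery easy, a compiler can walk the parsed document and collect anything that looks like a resource reference. The convention used below (any string value ending in a known extension counts as a reference) is purely an assumption for the sketch, as are `RESOURCE_EXTENSIONS` and `find_dependencies`; the actual format is not described in this thread.

```python
import json

# Hypothetical list of extensions that mark a string as a resource reference.
RESOURCE_EXTENSIONS = (".mdl", ".png", ".json")


def find_dependencies(text: str) -> set[str]:
    """Collect every string value in a JSON document that names a resource."""
    found: set[str] = set()

    def visit(value) -> None:
        if isinstance(value, dict):
            for child in value.values():
                visit(child)
        elif isinstance(value, list):
            for child in value:
                visit(child)
        elif isinstance(value, str) and value.endswith(RESOURCE_EXTENSIONS):
            found.add(value)

    visit(json.loads(text))
    return found
```

Because the walk is generic, new resource types get dependency tracking without any format-specific parsing code.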
