is there a better way to refer to assets in a game?

32 comments, last by Norman Barrows 9 years, 4 months ago

is there a better way to refer to assets in a game?

something like:

playwav("some_sound.wav");

is high on readability, but runs slow.

whereas something like:

playwav(36); // wav ID # 36 = some_sound.wav

runs fast, but is low on readability (without the comments), and is worse from a write-ability standpoint, because you have to look up the wav ID # for some_sound.wav in order to code it in.

one possibility is to #define some mnemonic name for each asset, such as:

#define some_sound_wav 36

or possibly use enums, perhaps one enumerated type for each type of asset. but enums would get evaluated at runtime, not compile time, right? and they have type check overhead too, correct?

so another possibility is you make the whole thing data driven, with internal or external editors. in the editors, assets are referred to by filename (for example), but are saved in the data files as ID numbers. but this would not handle non-data driven code such as:

ZeroMemory(drawinfo)
drawinfo.mesh=SPHERE
drawinfo.tex=GRANITE1
drawinfo.scale=...
drawinfo.location=...
drawinfo.flags=...
draw_it(drawinfo)

right now, i use ID numbers, so the above code might look like:

drawinfo.mesh=3;
drawinfo.tex=27;

for example.
but using ID numbers means i have to look up ID numbers for assets all the time. and if i don't add comments, from a readability and maintainability standpoint, the ID number is as mysterious as a "magic number". in fact, it basically IS a "magic number" - some magic number representing some asset.
so, is there a way to get the best of both worlds?
fast runtime like one gets with ID numbers, but ease of readability and write-ability like file names or name strings of some sort, preferably without having to do something extreme like #defining the ID number for every asset, or making every draw call data driven from files requiring a custom editor?
what about some sort of fixup or conversion at load time? but i don't see how that's possible...
some sort of macro processor that converts asset string names to ID numbers, and you run the code through the macro processor before compiling? should work - kinda extreme though....

has somebody already figured out a good solution to this, and i've just never heard of it?

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php



playwav("some_sound.wav");

is high on readability, but runs slow.

Do you have evidence to back up that claim? How is passing a string to a sound playing function noticeably slower than passing an integer?

Are you trying to say that loading the sound asset from disk by filename is slow when compared to pulling the sound asset from a cache by its integer id? If so, then yes that would be noticeably slower.

You can, however, get the best of both worlds by using an associative array (or dictionary) to look up the assets by some unique string, such as the filename.

Currently working on an open world survival RPG - For info check out my Development blog: ByteWrangler


You can get the best of both worlds however by using an associative array (or dictionary) to allow you to look up the assets by some unique string... such as the filename.

This would still require iteration, which, depending on the size of the associative array, could become a slow operation, at least compared to some other alternatives.

I would suggest using a hash table consisting of hashed strings as values and strings as keys, which would allow a search time of O(1) instead of the inconsistent O(n) that you would get with an associative array.

You could read more about it here http://molecularmusings.wordpress.com/2011/06/24/hashed-strings/


Which is probably why he mentioned a dictionary (in the parentheses), as it's typically a hash-based container and thus has amortized O(1) average lookup time.

Additionally, there's no reason you cannot preprocess this kind of information to generate a unique ID per asset which also happens to correspond to an integer, and has a nice unique name.
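A minimal sketch of that preprocessing step, assuming a small offline C++ tool (the names `toIdentifier` and `generateAssetHeader` are hypothetical) that turns a list of asset file names into a header of named integer IDs, which the game then includes instead of hand-maintained #defines:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Turns "some_sound.wav" into "some_sound_wav": anything that is not a letter
// or digit becomes '_', so every file name yields a legal C++ identifier.
std::string toIdentifier(const std::string& fileName) {
    std::string id;
    for (unsigned char c : fileName) {
        id += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    // Identifiers may not start with a digit (e.g. "3d.mesh"), so prefix those.
    if (id.empty() || std::isdigit(static_cast<unsigned char>(id[0]))) {
        id.insert(0, "asset_");
    }
    return id;
}

// Produces the text of a header; each ID is the asset's position in the list,
// so the list order must match the order in which the game loads the assets.
std::string generateAssetHeader(const std::vector<std::string>& fileNames) {
    std::string out = "// Generated file; do not edit.\nenum AssetID {\n";
    for (std::size_t i = 0; i < fileNames.size(); ++i) {
        out += "    " + toIdentifier(fileNames[i]) + " = " + std::to_string(i) + ",\n";
    }
    out += "};\n";
    return out;
}
```

A real tool would also need to reject two file names that map to the same identifier (for example `a.wav` and `a_wav`); this sketch does not check for that.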

or possibly use enums, perhaps one enumerated type for each type of asset. but enums would get evaluated at runtime, not compile time, right? and they have type check overhead too, correct?

No, enumeration values are NOT generated at runtime. They're compile-time constants. The difference between these two is basically nothing:

#include <cstdint>

enum class AudioAsset : uint32_t {
    SomeSound = 36,
};

void PlaySound(AudioAsset asset);

PlaySound(AudioAsset::SomeSound);
PlaySound(static_cast<AudioAsset>(36)); // same value, spelled as a cast

and

void PlaySound(uint32_t asset);

PlaySound(36);
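A quick way to convince yourself of this, as a sketch (the helper `AssetIndex` is hypothetical): `static_assert` only accepts constant expressions, so it can check enumerator values, and an enum class with an explicit underlying type has exactly the size of that integer type.

```cpp
#include <cstdint>
#include <type_traits>

enum class AudioAsset : uint32_t {
    SomeSound = 36,
};

// This only compiles because the enumerator's value is fixed at compile time.
static_assert(static_cast<uint32_t>(AudioAsset::SomeSound) == 36, "compile-time value");

// Same size and underlying representation as a plain uint32_t.
static_assert(sizeof(AudioAsset) == sizeof(uint32_t), "same size as uint32_t");
static_assert(std::is_same<std::underlying_type<AudioAsset>::type, uint32_t>::value,
              "underlying type is uint32_t");

// Hypothetical helper: converting back to the raw index is a no-op cast.
constexpr uint32_t AssetIndex(AudioAsset asset) {
    return static_cast<uint32_t>(asset);
}
```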

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

I just hash the strings into a 32bit integer. This lets you get rid of all your strings ahead of time, and just use ints everywhere. If for some reason you're stuck with a string at runtime, you can just quickly hash it at runtime too.
n.b. it's possible to implement string hashing at compile time with some macro magic, so you can use strings in your source, which get compiled and used at runtime as ints.

Yeah, you can get hash collisions, where two different strings map to the same int. In practice, I'm still waiting for it to happen to me (the data build tools check for this case offline), and when it does I'll just increment the seed used by the hash until it doesn't happen.
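For a rough sense of how often that happens: assuming the hash spreads names uniformly over $2^{b}$ values, the standard birthday-bound approximation gives the chance of at least one collision among $n$ distinct names as

$$P_{\text{collision}}(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{b}}\right)$$

With $b = 32$ this is about 1.2% for 10,000 names, reaches 50% near 77,000 names, and is about 69% at 100,000. With $b = 64$ it is only around $3 \times 10^{-10}$ even at 100,000 names.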

ATM, I'm using the 32-bit FNV-1a hash, which is super cheap if required at runtime, easy to turn into a compile-time version, and seems to give good distribution:


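#include <cstdint>

typedef uint32_t u32;
typedef uint8_t  u8;

// 32-bit FNV-1a; seed 0x811C9DC5 gives the standard FNV-1a hash
u32 hash(const char* str, u32 seed)
{
    u32 h = seed;
    const u8* s = reinterpret_cast<const u8*>(str);
    while (*s) {
        h ^= *s++;
        h *= 0x01000193; // 32-bit FNV prime (16777619)
    }
    return h;
}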
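The offline collision check and seed bump described above might look something like this minimal sketch. `hashesAreUnique` and `findCollisionFreeSeed` are hypothetical names, and the hash is the same 32-bit FNV-1a loop rewritten over `std::string`:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Same 32-bit FNV-1a as above; the seed plays the role of the offset basis.
inline uint32_t fnv1a32(const std::string& str, uint32_t seed) {
    uint32_t h = seed;
    for (unsigned char c : str) {
        h ^= c;
        h *= 0x01000193u;
    }
    return h;
}

// True if every distinct name hashes to a distinct value under this seed.
bool hashesAreUnique(const std::vector<std::string>& names, uint32_t seed) {
    std::unordered_map<uint32_t, const std::string*> seen;
    for (const std::string& name : names) {
        auto result = seen.emplace(fnv1a32(name, seed), &name);
        // Hash already taken by a *different* name: that's a collision.
        if (!result.second && *result.first->second != name) {
            return false;
        }
    }
    return true;
}

// Data-build step: start at the standard FNV-1a offset basis and bump the
// seed until the whole asset list is collision-free.
uint32_t findCollisionFreeSeed(const std::vector<std::string>& names) {
    uint32_t seed = 2166136261u;
    while (!hashesAreUnique(names, seed)) {
        ++seed;
    }
    return seed;
}
```

Whatever seed the build picks would then have to be shared by every place that hashes names (runtime and compile-time), so all of them produce the same IDs.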

When loading files -
For retail builds: all the assets get packed into an archive with 32-bit filenames. They're no longer using textual filenames at all. The archive has a lookup from a 32-bit file-name-hash to an offset/size within the archive.
For development builds: Assets are kept as loose files on disk. The data build tool produces a dictionary for converting from a 32-bit file-name-hash to a Windows file name.
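A retail-build table of contents along those lines could be as simple as an array of (hash, offset, size) records sorted by hash. The sketch below assumes that layout and uses a binary search; `ArchiveEntry` and `ArchiveIndex` are hypothetical, and it needs C++17 for `std::optional`:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// One record in a hypothetical archive table of contents.
struct ArchiveEntry {
    uint32_t nameHash;  // 32-bit hash of the original file name
    uint32_t offset;    // where the file's bytes start inside the archive
    uint32_t size;      // how many bytes the file occupies
};

class ArchiveIndex {
public:
    // Sorts once at load time so that find() can binary-search by hash.
    explicit ArchiveIndex(std::vector<ArchiveEntry> entries) : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end(),
                  [](const ArchiveEntry& a, const ArchiveEntry& b) { return a.nameHash < b.nameHash; });
    }

    // Returns the record for a file-name hash, or std::nullopt if it is not in the archive.
    std::optional<ArchiveEntry> find(uint32_t nameHash) const {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), nameHash,
                                   [](const ArchiveEntry& e, uint32_t h) { return e.nameHash < h; });
        if (it != entries_.end() && it->nameHash == nameHash) {
            return *it;
        }
        return std::nullopt;
    }

private:
    std::vector<ArchiveEntry> entries_;
};
```

Sorting once and binary-searching keeps lookups at O(log n) without storing any file names; an `std::unordered_map<uint32_t, ArchiveEntry>` would be an equally valid choice.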

[edit]
In any case, you shouldn't have code like playwav("some_sound.wav"); though -- more like:
Sound* some_sound = loadwav("some_sound.wav"); //filename processing paid once, pointer obtained
...
playwav(some_sound); // no details of filesystem involved per frame




This.
1000x this.

Load sound once; obtain handle to sound; play via handle either playwav(handle) or soundObj->play().

At BEST I might have a playprecached("some_sound"); which would do the name hashing and lookup in the resource container there and then, but still with no disk I/O at that point.
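A minimal sketch of the load-once, play-by-handle pattern, assuming a hypothetical `SoundCache` that hands out array indices as handles. The `std::unordered_map` is the hash-based dictionary mentioned earlier in the thread, and it is only consulted at load time or in a `playprecached`-style lookup:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sound record; a real engine would hold decoded samples or an API buffer.
struct Sound {
    std::string fileName;
};

using SoundHandle = uint32_t;  // index into SoundCache::sounds_

class SoundCache {
public:
    // Load time: pays the file-name cost once and returns a cheap handle.
    SoundHandle load(const std::string& fileName) {
        auto it = byName_.find(fileName);
        if (it != byName_.end()) {
            return it->second;  // already loaded: reuse the existing handle
        }
        // Disk I/O and decoding would happen here; this sketch only records the name.
        SoundHandle handle = static_cast<SoundHandle>(sounds_.size());
        sounds_.push_back(Sound{fileName});
        byName_.emplace(fileName, handle);
        return handle;
    }

    // Per-frame path: a plain array index, no strings (no bounds check in this sketch).
    const Sound& get(SoundHandle handle) const { return sounds_[handle]; }

    // "playprecached"-style lookup: hashes the name but never touches the disk.
    // Returns false if the sound was not loaded beforehand.
    bool findPrecached(const std::string& fileName, SoundHandle& out) const {
        auto it = byName_.find(fileName);
        if (it == byName_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    std::vector<Sound> sounds_;
    std::unordered_map<std::string, SoundHandle> byName_;
};
```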

so another possibility is you make the whole thing data driven, with internal or external editors. in the editors, assets are referred to by filename (for example), but are saved in the data files as ID numbers.

This. Very rarely should you refer to specific assets in code. All that your code should be doing is routing the asset references from data to the appropriate subsystem at the right time, i.e.:


void Player::handleDamaged()
{
   // ... other stuff ...
   audio->playSound(getDamagedSound());
   // ... more stuff ...
}

In which case it doesn't matter whether that reference is an integer, string, handle/pointer, etc., the code is still readable and entirely data-driven.

but this would not handle non-data driven code such as:

ZeroMemory(drawinfo)
drawinfo.mesh=SPHERE
drawinfo.tex=GRANITE1
drawinfo.scale=...
drawinfo.location=...
drawinfo.flags=...
draw_it(drawinfo)

right now, i use ID numbers, so the above code might look like:
drawinfo.mesh=3;
drawinfo.tex=27;
for example.

but using ID numbers means i have to look up ID numbers for assets all the time. and if i don't add comments, from a readability and maintainability standpoint, the ID number is as mysterious as a "magic number". in fact, it basically IS a "magic number" - some magic number representing some asset.

In those special cases where you need to refer to a specific asset, you should use an enumeration. They're highly readable, become simple integers at compile-time, and if you use enum classes, entirely type-safe (so you can't accidentally specify a sound asset when a model asset is expected, etc.). Under the hood, a factory would map those enumerations to an actual asset (which can also be made data-driven). Ideally your enumeration names would reflect the context and purpose of the asset, as opposed to what the asset actually is, to make them even easier to maintain. So instead of:

drawinfo.mesh = SPHERE;
drawinfo.tex = GRANITE1;

You'd have:

drawinfo.mesh = SpecialMeshes::THE_MESH_USED_FOR_GRANITE;
drawinfo.tex = SpecialTextures::THE_TEXTURE_USED_FOR_GRANITE;

So in the future, if granite ever changes from a sphere into a triangle, no one has to go through and rename all those references. They just update the factory and all the code still works and makes sense.
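A sketch of the enum-class-plus-factory idea, assuming the factory simply maps each purpose to a file name. The file names `sphere.mesh` and `granite1.tex`, and the names `DrawInfo`, `meshFileFor` and `textureFileFor`, are hypothetical:

```cpp
#include <cstdint>
#include <string>

// Named by purpose, not by content.
enum class SpecialMeshes : uint32_t { THE_MESH_USED_FOR_GRANITE };
enum class SpecialTextures : uint32_t { THE_TEXTURE_USED_FOR_GRANITE };

// Hypothetical draw descriptor. Because the two enum classes are distinct
// types, assigning a texture enumerator to the mesh field will not compile.
struct DrawInfo {
    SpecialMeshes mesh;
    SpecialTextures tex;
};

// The "factory": the only place that knows which file backs each purpose.
// Swapping granite's mesh is a one-line change here, not a rename in call sites.
inline std::string meshFileFor(SpecialMeshes id) {
    switch (id) {
        case SpecialMeshes::THE_MESH_USED_FOR_GRANITE: return "sphere.mesh";
    }
    return "";
}

inline std::string textureFileFor(SpecialTextures id) {
    switch (id) {
        case SpecialTextures::THE_TEXTURE_USED_FOR_GRANITE: return "granite1.tex";
    }
    return "";
}
```

The mapping could just as well return an array index or be read from a data file; a switch just keeps the sketch self-contained.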

If you have a reasonably modern C++ compiler, you can use a constexpr function to hash the strings (and, if you can be bothered, a user-defined literal). That's more readable than most other solutions (such as macros) and less trouble than maintaining defines or enums. Collisions can happen, but they are rare until you have tens of thousands of resources; somewhere around 50,000 to 100,000 they become likely, and then you should use 64 bits instead of 32 to be on the safe side again.
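A minimal sketch of a compile-time hash with a user-defined literal, assuming C++14 or later and using 32-bit FNV-1a (the `_id` suffix is a hypothetical choice):

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time 32-bit FNV-1a. Written recursively so it also satisfies
// C++11's single-return-statement constexpr rules.
constexpr uint32_t fnv1a32(const char* str, std::size_t len, uint32_t h = 2166136261u) {
    return len == 0 ? h
                    : fnv1a32(str + 1, len - 1,
                              (h ^ static_cast<uint8_t>(*str)) * 16777619u);
}

// User-defined literal: "some_sound.wav"_id becomes a uint32_t constant.
constexpr uint32_t operator""_id(const char* str, std::size_t len) {
    return fnv1a32(str, len);
}

// Evaluated entirely by the compiler (standard FNV-1a test vector for "a").
static_assert("a"_id == 0xe40c292cu, "FNV-1a of \"a\"");
```

Usage would look like `playwav("some_sound.wav"_id);`, which passes a plain integer at runtime. Very long strings could hit the compiler's constexpr recursion limit, which is not a concern for typical asset names.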

Then again, coding resource names into the game executable is somewhat inelegant.

I'm packing my resources as a build step according to what some XML file tells the pack script to include (most projects with more than 5-10 assets use a variation of that method). The resource packer translates names to sequential numbers (incrementing an integer and maintaining a map while it runs, no collisions possible) and refers to other resources within the binary by these numbers only. No hash in the executable at all, and no occurrence of any names or magic numbers in the program. Well, almost, that is.
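The name-to-number step of such a packer can be sketched as a map plus a counter. `SequentialIdAssigner` is a hypothetical name; a real packer would also write the assets themselves and replace every cross-reference with the returned ID:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Each distinct resource name gets the next integer, so IDs are dense and
// collision-free by construction (no hashing involved).
class SequentialIdAssigner {
public:
    uint32_t idFor(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) {
            return it->second;  // name seen before: reuse its ID
        }
        uint32_t id = static_cast<uint32_t>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // ID -> name listing, useful for packer debug output; the game never needs it.
    const std::vector<std::string>& names() const { return names_; }

private:
    std::unordered_map<std::string, uint32_t> ids_;
    std::vector<std::string> names_;
};
```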

Ideally, your program will do something like LoadLevel(config->saved_level_id()); or even LoadLevel(1);. Then, once that level is loaded, you have the IDs of the meshes and textures etc which this level uses (they're stored as part of the level in the binary resource file). No need for the program to know any of these at all. In fact, there isn't just "no need", but even "no desire". Your program couldn't care less whether it has to load rock.mesh or tree.mesh, and it probably shouldn't even know what either of them is, or where it occurs (the datafile will tell you just fine).

I've been tempted for a while to replace the sequential IDs with offsets into the datafile. The advantage would be that you could access an asset directly by its ID (map the file, add the ID to the base address, and read that location), and the identifiers would also be guaranteed to be collision-free. There would be no need for an extra indirection via a lookup table either.

I am not sure, however, to what extent that creates a maintenance nightmare when content is added later. Probably it will "just work" since the resource packer generates a completely new archive anyway, but I have no evidence from practical experience here (the "will just work" assumption might not be true).

Also, it's arguably not as robust since any number within the mapped range would be a "valid" ID whereas in a lookup table you can verify that an ID doesn't exist (But... does that even matter? You should be able to rely 100% that your resource packer doesn't put garbage IDs into your binary files, or you're in trouble anyway!).
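For illustration, the offset-as-ID idea and its robustness trade-off could be sketched like this, with a `std::vector` standing in for the memory-mapped file (`OffsetArchive` is hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// The resource ID *is* the byte offset of the asset inside one loaded blob,
// so resolving an ID is a single pointer add with no lookup table.
class OffsetArchive {
public:
    explicit OffsetArchive(std::vector<uint8_t> blob) : blob_(std::move(blob)) {}

    // Unchecked: trusts the packer to have written only valid IDs.
    const uint8_t* resolve(uint32_t id) const { return blob_.data() + id; }

    // The only cheap check available: rejects out-of-range IDs, but any
    // in-range offset still "looks valid", unlike a lookup-table miss.
    bool inRange(uint32_t id) const { return id < blob_.size(); }

private:
    std::vector<uint8_t> blob_;
};
```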

ok, a bit more detail as to my situation:

i have libraries (Spiro) / databases (barrows) / pools (Adams) of shared resources: meshes, textures, models, animations, and wavs.

they are implemented as static arrays.

assets are referred to by array index:

drawinfo.meshID=3;
drawinfo.texID=27;
playwav(7);
aniplayerID=start_ani(3,50); // model=3 animation=50

and so on...

load order defines what filename (and therefore asset) is associated with which array index.

all assets are loaded at game start, so which asset is which index never changes. assets and file names may change during development, but the index associated with a particular asset (sphere mesh, granite texture, etc) stays the same.

what i want to be able to do is refer to an asset with something more human readable than an array index.

something like: playwav(gigantopithicus_attack_wav);

instead of: playwav(126); // 126= gigantopithicus_attack.wav


The resource packer translates names to sequential numbers

so the resources have human readable names in a level editor of some sort, which get translated into ID #'s for use by the binary?

so that would be an example of the data driven with editor approach, requiring data driven code, and an editor, right?

it appears that one must go data driven and bind human readable to id number in an editor, or use human readable enums for non-data driven code.


At the very least, an enum would allow you to change "126" into "gigantopithicus_attack_wav" without causing any other ripple effects in your code, while also being human-readable.
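Since load order defines the index, one way to get that enum without maintaining two lists by hand is an X-macro: a single list drives both the enum and the file-name array, so they cannot drift apart. A sketch, with the list contents hypothetical apart from the names used in this thread:

```cpp
// Single list of wavs in load order; the identifier and file name sit side by side.
#define WAV_LIST(X)                                   \
    X(some_sound_wav,             "some_sound.wav")   \
    X(gigantopithicus_attack_wav, "gigantopithicus_attack.wav")

// Expansion 1: an enum whose values are the array indices, in load order.
enum WavID {
#define X(name, file) name,
    WAV_LIST(X)
#undef X
    WAV_COUNT
};

// Expansion 2: the file names, in exactly the same order.
static const char* const kWavFiles[WAV_COUNT] = {
#define X(name, file) file,
    WAV_LIST(X)
#undef X
};
```

Because `WavID` is a plain (unscoped) enum, it converts implicitly to int, so assuming `playwav` takes an int index, `playwav(gigantopithicus_attack_wav);` works unchanged, and the loader can walk `kWavFiles` to load the wavs in the matching order.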

so the resources have human readable names in a level editor of some sort, which get translated into ID #'s for use by the binary?

so that would be an example of the data driven with editor approach, requiring data driven code, and an editor, right?

it appears that one must go data driven and bind human readable to id number in an editor, or use human readable enums for non-data driven code.

There's no reason why you couldn't use human-readable strings everywhere, from content creation to run-time. IDs are just an optimization. The representation isn't really the issue when it comes to data-driven code. It's more about being able to adapt to changes in data without requiring changes or maintenance to code. So instead of hard-coding the call to play a particular sound:

playwav(126); // 126 = gigantopithicus_attack.wav

The object understands the context in which the sound is being played, and can use that to reference the correct piece of data:

playwav(getAttackSound());

If you then set up a data file that mapped the attack sound context, to the attack sound asset:

{"gigantopithicus" : {"attackSound" : "gigantopithicus_attack.wav"}}

Then updating the data would be sufficient to change the sound, without requiring any changes to code.
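A sketch of the lookup side, assuming the data file has already been parsed into nested maps (JSON parsing is left to whatever library the project uses; `Creature` and `CreatureSounds` are hypothetical names):

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the parsed data file: creature type -> (context key -> asset name).
using CreatureSounds =
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>>;

class Creature {
public:
    // Keeps a reference to the shared data, which must outlive the creature.
    Creature(std::string type, const CreatureSounds& data)
        : type_(std::move(type)), data_(data) {}

    // Code names only the context ("attackSound"); the data decides the file.
    std::string getAttackSound() const { return lookup("attackSound"); }

private:
    // Returns an empty string when the creature or context is missing from the data.
    std::string lookup(const std::string& context) const {
        auto creature = data_.find(type_);
        if (creature == data_.end()) return "";
        auto sound = creature->second.find(context);
        return sound == creature->second.end() ? "" : sound->second;
    }

    std::string type_;
    const CreatureSounds& data_;
};
```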

This topic is closed to new replies.
