Help with String Hashing

Started by
14 comments, last by dougbinks 11 years, 5 months ago
That trick of only looking at a maximum of 10 characters seems like a cheat to do better at some benchmark, but it's kind of dangerous for a general string hash. Note that IMG_0001.JPG, IMG_0002.JPG etc. will all collide. You may also think of using a hash to check if a file has changed, but that will fail all the time.

I have used CRC32 in the past as a string hash, and it works great. I think it was suggested in "The Art of Computer Programming".
Advertisement

std::hash will skip characters of the string if the string is longer then 10 characters, at least the VS2010 implementation does this.

Definitely not a feature of std::hash in general. This was fixed in MSVC 2012.

[quote name='NightCreature83' timestamp='1353923257' post='5004163']
std::hash will skip characters of the string if the string is longer then 10 characters, at least the VS2010 implementation does this.

Definitely not a feature of std::hash in general. This was fixed in MSVC 2012.
[/quote]
Yeah I saw they fixed it, it's also a FNV1a hash function in VS2012 so there is no reason to write your own anymore.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

You might still want to write your own if you're supporting pre C++11 compilers, or if you need to ensure that your hashing algorithm exactly matches some external specification or data-set.

I am trying to implement a user-friendly way to label and identify my in-game assets and entities. I have been using an enumeration system where each class of entity or asset will have an enumeration entry, making the program easier to understand. For me, this works. However, I have read about string hashing and how this can be used to give designers and level editors (i.e. myself in the future) the ability to search for and create entities and assets using strings, but still maintains the run-time performance of integer identification.

If by "enumeration" you mean code like

...
STANDING_SOLDIER_SPRITE= "soldier1.png",
CROUCHING_SOLDIER_SPRITE= "soldier2.png",
...

or (with objects)

...
/*spritesheet, frame count, mode*/
STANDING_SOLDIER_SPRITE= AnimatedSprite("soldier1.png",5,AnimatedSprite.PING_PONG),
CROUCHING_SOLDIER_SPRITE= AnimatedSprite("soldier2.png",3,AnimatedSprite.ONCE_FORWARD),
...

the inflexibility in case you want to edit levels and make mods is obvious; you need to move almost all references to identifiers of any kind (symbols in code, strings, arbitrary numbers, file positions, etc.) from code to data files.

But in a level editor and in data files, human-friendly strings are an optional attribute of assets. When you create an asset, you can give it a unique ID (usually a short string), used to reference the asset from game levels and other assets, and prompt the user to enter a friendly name and possibly tags and arbitrary attributes (which will only be used in debug output and in the level editor).

Hashing has no place in generating unique asset IDs: in a mod-oriented architecture, with multiple archives partly replacing each other's content, you need to use the "standard" names without being clever (given a set of archives, tools can index asset IDs to tell the user what assets are being intentionally or unintentionally shadowed by another with the same ID, whether a referenced asset ID is missing, and whether the added IDs of added assets are actually unique as they should); if instead everything is compiled and consolidated, you have a finite and closed set of asset IDs in use and if you want integers for efficiency reasons you can simply number them in any arbitrary order.

Omae Wa Mou Shindeiru

You may want to check out: http://www.altdevblogaday.com/2011/10/27/quasi-compile-time-string-hashing/ for quasi compile time hashing and some further information. Also the murmur hash from Bitsquid's open source foundation may be useful https://bitbucket.org/bitsquid/foundation/src (also see the blog about this here: http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/ ).

This topic is closed to new replies.

Advertisement