KingofNoobs

Help with String Hashing

15 posts in this topic

Hello,

I am trying to implement a user-friendly way to label and identify my in-game assets and entities. I have been using an enumeration system where each class of entity or asset will have an enumeration entry, making the program easier to understand. For me, this works. However, I have read about string hashing and how this can be used to give designers and level editors (i.e. myself in the future) the ability to search for and create entities and assets using strings, but still maintains the run-time performance of integer identification.

However, I have not yet been able to find a suitable example of how (and when) hashing could be used in a video-game case to provide this functionality. I assume that the hashing would have to be done at compile time to make the system as fast as integer identification, but then, how would that enable the level editor and other offline tools to use strings?

If someone could provide a concise example of how string hashing works with regard to asset and entity identification I would be greatly appreciative.

Thank you as always.

- Dave Ottley
1

You probably don't want a 'real' hash function for this - just associate each string with an integer. For example, you could start at zero and increment for each string added, putting them as key and value in a std::map. Things get more complicated if you want to remove strings and reuse integers, but you probably won't need to.

Edit: turns out this is called string interning. Thanks Hodgman!
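
As a rough illustration of the interning idea described above, here is a minimal sketch. The names `StringTable`, `intern` and `nameOf` are made up for this example, and the reverse-lookup vector is an addition that the post does not mention; it is only there so tools can get the string back from the integer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical interning table (illustrative names, not from the thread).
class StringTable {
public:
    // Returns the existing ID for `name`, or assigns the next free integer.
    std::uint32_t intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::uint32_t id = static_cast<std::uint32_t>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // Reverse lookup, e.g. for a level editor or debug output.
    const std::string& nameOf(std::uint32_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::map<std::string, std::uint32_t> ids_;  // string -> integer
    std::vector<std::string> names_;            // integer -> string
};
```

IDs handed out this way depend on insertion order, so they are only stable if the strings are always added in the same order (or if the table itself is saved with the data).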
1

Hodgman and MrBastard,

Is there any advantage to a hash function over interning? It would seem that interning is always faster. Is it just the memory usage for the string container?

Best,

Dave
0

Use hashing when you would otherwise need to do lots of string comparisons. When you get a string (say an object name), you hash it and store the hash with the object (usually in addition to the name). Then, when you need to look up an object by name, you hash the string you're searching for and compare hashes with a single integer comparison per object. Different strings can produce the same hash, so once you find a matching hash [i]then[/i] you do a string comparison to make sure you have the right one.

Say you have 10,000 objects with an average name length of 9 characters, and the worst case where the object you're after is the last one. You could do up to about 100,000 char comparisons (10 per name, including the null terminator), or you could do 10,000 int comparisons and maybe 1,000 char comparisons. ~11,000 < ~100,000, so I know what I'd choose.

The hashing function I've been considering using is the [url="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function"]Fowler–Noll–Vo[/url] 1a variant, though you can look at some others to see what might best fit your needs.

[CODE]
#include <stddef.h>
#include <stdint.h>

uint32_t getHash(const char *str)
{
    // Implementation of the Fowler–Noll–Vo-1a hash function
    uint32_t hash = 0x811C9DC5; // Actually I use a defined constant here, but this is the 32-bit FNV offset basis
    for(size_t i = 0; str[i]; ++i)
    {
        // The FNV-1a variation: XOR the byte in first, then multiply
        hash ^= static_cast<unsigned char>(str[i]); // cast so chars above 127 are not sign-extended
        hash *= 0x01000193; // Same for this, this is the 32-bit FNV prime
    }
    // Wow that was hard.
    return hash;
}
[/CODE]

I treat it basically like a one-function library: I don't try to understand how it works as long as it does what it says it should. There's a lot of math behind the structure and the constants used, and I write games, not hashing functions.
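
To make the "compare hashes first, then confirm with a string compare" lookup from this post concrete, here is a minimal sketch. `GameObject`, `findByName` and `fnv1a` are hypothetical names chosen for the example, and the linear search over a vector is just an assumption about how the objects are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a over a null-terminated string.
inline std::uint32_t fnv1a(const char* str) {
    std::uint32_t hash = 0x811C9DC5u;
    for (std::size_t i = 0; str[i]; ++i) {
        hash ^= static_cast<unsigned char>(str[i]);
        hash *= 0x01000193u;
    }
    return hash;
}

// Hypothetical game object that caches the hash of its name at creation.
struct GameObject {
    std::string name;
    std::uint32_t nameHash;
    explicit GameObject(const std::string& n)
        : name(n), nameHash(fnv1a(n.c_str())) {}
};

// One integer compare per object; the string compare only runs on a hash
// match, which guards against two different names sharing a hash.
inline const GameObject* findByName(const std::vector<GameObject>& objects,
                                    const char* name) {
    const std::uint32_t wanted = fnv1a(name);
    for (const GameObject& obj : objects) {
        if (obj.nameHash == wanted && obj.name == name) return &obj;
    }
    return nullptr;
}
```

A call such as `findByName(objects, "crate_01")` hashes the query once and then touches only the cached integers until it finds a candidate.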
0

I've used the Dan Bernstein hash for stuff like this in the past, but as pointed out, the number of bits you use will dictate how frequently collisions occur. If you get too many collisions you lose the O(1) access time and degrade toward O(n), plus the cost of computing the hash.

If you use STL, you could try something like this:

std::map< std::string, DWORD > identMap;

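This maps a std::string to a DWORD. Note that std::map is an ordered tree, not a hash table: each lookup costs O(log n) string comparisons, and std::unordered_map is the hash-based equivalent. Going the other way, from DWORD back to string, needs a second container. Edited by Steve_Segreto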
0

Is there any reason not to use std::hash<std::string> ? Is it slower than the other hash functions mentioned above? Edited by Nanook
0

[quote name='Nanook' timestamp='1353917336' post='5004144']
Is there any reason not to use std::hash<std::string> ? Is it slower than the other hash functions mentioned above?
[/quote]
std::hash will skip characters of the string once the string is 10 characters or longer, at least the VS2010 implementation does this. Other than that it won't be any slower or faster than your own implementation; the actual implementation looks like FNV anyway.

[code]
template<>
class hash<_STD string>
    : public unary_function<_STD string, size_t>
{   // hash functor
public:
    typedef _STD string _Kty;

    size_t operator()(const _Kty& _Keyval) const
    {   // hash _Keyval to size_t value by pseudorandomizing transform
        size_t _Val = 2166136261U;
        size_t _First = 0;
        size_t _Last = _Keyval.size();
        size_t _Stride = 1 + _Last / 10;

        for(; _First < _Last; _First += _Stride)
            _Val = 16777619U * _Val ^ (size_t)_Keyval[_First];
        return (_Val);
    }
};
[/code]
0

That trick of only sampling roughly 10 characters seems like a cheat to do better at some benchmark, but it's kind of dangerous for a general string hash. Note that names like IMG_0001.JPG and IMG_0002.JPG will collide, because at 12 characters the stride is 2 and the last digit is never read. You may also think of using a hash to check if a file has changed, but with this scheme any change that lands on a skipped byte goes undetected.
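
The collision is easy to reproduce. The sketch below is modelled on the VS2010 listing quoted earlier in the thread, but it is not that code: it uses 32-bit arithmetic and an unsigned char cast for clarity, and the function names `stridedHash` and `fullHash` are made up. For a 12-character name the stride is 1 + 12 / 10 = 2, so only indices 0, 2, 4, 6, 8 and 10 are read, and the final digit at index 7 is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Strided hash in the style of the VS2010 listing: samples every
// `stride`-th character instead of reading the whole string.
inline std::uint32_t stridedHash(const std::string& key) {
    std::uint32_t val = 2166136261u;
    const std::size_t last = key.size();
    const std::size_t stride = 1 + last / 10;  // 12 chars -> stride 2
    for (std::size_t i = 0; i < last; i += stride)
        val = (16777619u * val) ^ static_cast<unsigned char>(key[i]);
    return val;
}

// Full 32-bit FNV-1a that reads every character, for comparison.
inline std::uint32_t fullHash(const std::string& key) {
    std::uint32_t val = 2166136261u;
    for (unsigned char c : key) {
        val ^= c;
        val *= 16777619u;
    }
    return val;
}
```

With these two functions, `stridedHash("IMG_0001.JPG")` equals `stridedHash("IMG_0002.JPG")`, while `fullHash` gives the two names different values.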

I have used CRC32 in the past as a string hash, and it works great. I think it was suggested in "The Art of Computer Programming". Edited by Álvaro
0

[quote name='NightCreature83' timestamp='1353923257' post='5004163']
std::hash will skip characters of the string if the string is longer than 10 characters, at least the VS2010 implementation does this.
[/quote]
Definitely not a feature of std::hash in general. This was fixed in MSVC 2012.
0

[quote name='SiCrane' timestamp='1353943466' post='5004208']
[quote name='NightCreature83' timestamp='1353923257' post='5004163']
std::hash will skip characters of the string if the string is longer than 10 characters, at least the VS2010 implementation does this.
[/quote]
Definitely not a feature of std::hash in general. This was fixed in MSVC 2012.
[/quote]
Yeah, I saw they fixed it. It's also an FNV-1a hash function in VS2012, so there is no reason to write your own anymore.
0

[quote name='KingofNoobs' timestamp='1353585283' post='5003207']
I am trying to implement a user-friendly way to label and identify my in-game assets and entities. I have been using an enumeration system where each class of entity or asset will have an enumeration entry, making the program easier to understand. For me, this works. However, I have read about string hashing and how this can be used to give designers and level editors (i.e. myself in the future) the ability to search for and create entities and assets using strings, but still maintains the run-time performance of integer identification.
[/quote]
If by "enumeration" you mean code like
[CODE]
...
STANDING_SOLDIER_SPRITE= "soldier1.png",
CROUCHING_SOLDIER_SPRITE= "soldier2.png",
...
[/CODE]
or (with objects)
[CODE]
...
/*spritesheet, frame count, mode*/
STANDING_SOLDIER_SPRITE= AnimatedSprite("soldier1.png",5,AnimatedSprite.PING_PONG),
CROUCHING_SOLDIER_SPRITE= AnimatedSprite("soldier2.png",3,AnimatedSprite.ONCE_FORWARD),
...
[/CODE]
the inflexibility in case you want to edit levels and make mods is obvious: you need to move almost all references to identifiers [i]of any kind[/i] (symbols in code, strings, arbitrary numbers, file positions, etc.) from code to data files.

But in a level editor and in data files, human-friendly strings are an optional attribute of assets. When you create an asset, you can give it a unique ID (usually a short string), used to reference the asset from game levels and other assets, and prompt the user to enter a friendly name and possibly tags and arbitrary attributes (which will only be used in debug output and in the level editor).

Hashing has no place in generating unique asset IDs. In a mod-oriented architecture, with multiple archives partly replacing each other's content, you need to use the "standard" names without being clever: given a set of archives, tools can index asset IDs to tell the user which assets are being intentionally or unintentionally shadowed by another with the same ID, whether a referenced asset ID is missing, and whether the IDs of added assets are actually unique, as they should be. If instead everything is compiled and consolidated, you have a finite and closed set of asset IDs in use, and if you want integers for efficiency reasons you can simply number them in any arbitrary order.
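
A small sketch of the kind of tool-side indexing described above: walk the archives in load order, give every distinct asset ID the next integer, and record IDs that show up more than once as shadowed. `Archive`, `AssetIndex` and `buildIndex` are hypothetical names, and real archives would of course carry more than a list of ID strings.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical archive description: a name plus the asset IDs it provides.
struct Archive {
    std::string name;
    std::vector<std::string> assetIds;
};

struct AssetIndex {
    std::map<std::string, std::uint32_t> numberOf;  // asset ID -> dense integer
    std::vector<std::string> shadowed;              // IDs seen in more than one place
};

// Archives are visited in load order; an ID that was already numbered is
// reported as shadowed instead of being given a second integer.
inline AssetIndex buildIndex(const std::vector<Archive>& archives) {
    AssetIndex index;
    for (const Archive& archive : archives) {
        for (const std::string& id : archive.assetIds) {
            const std::uint32_t next =
                static_cast<std::uint32_t>(index.numberOf.size());
            const bool inserted = index.numberOf.emplace(id, next).second;
            if (!inserted) index.shadowed.push_back(id);
        }
    }
    return index;
}
```

The integers come out in first-seen order, which matches the point that the numbering can be arbitrary as long as the set of IDs is closed.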
0

You may want to check out: http://www.altdevblogaday.com/2011/10/27/quasi-compile-time-string-hashing/ for quasi compile time hashing and some further information. Also the murmur hash from Bitsquid's open source foundation may be useful https://bitbucket.org/bitsquid/foundation/src (also see the blog about this here: http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/ ). Edited by dougbinks
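
For a rough idea of what compile-time hashing can look like in plain C++11, here is a minimal sketch using a recursive constexpr FNV-1a. It is not taken from the linked articles; `hashId`, `AssetKind` and `classify` are hypothetical names. String literals used as case labels are hashed by the compiler, while strings read from data files go through the same function at run time.

```cpp
#include <cassert>
#include <cstdint>

// Recursive 32-bit FNV-1a, usable in constant expressions from C++11 on.
constexpr std::uint32_t hashId(const char* str,
                               std::uint32_t hash = 0x811C9DC5u) {
    return *str == '\0'
        ? hash
        : hashId(str + 1,
                 (hash ^ static_cast<unsigned char>(*str)) * 0x01000193u);
}

// Evaluated by the compiler: the empty string hashes to the offset basis.
static_assert(hashId("") == 0x811C9DC5u, "unexpected FNV-1a offset basis");

// Hypothetical asset kinds, for illustration only.
enum class AssetKind { Soldier, Tank, Unknown };

inline AssetKind classify(const char* nameFromDataFile) {
    switch (hashId(nameFromDataFile)) {                     // hashed at run time
        case hashId("soldier"): return AssetKind::Soldier;  // hashed at compile time
        case hashId("tank"):    return AssetKind::Tank;
        default:                return AssetKind::Unknown;
    }
}
```

If two literals ever hashed to the same value, the duplicate case labels would fail to compile, so collisions among the known names are caught at build time. An unknown run-time string can still collide with a known name, so this sketch skips the confirming string compare that a lookup table would do.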
0
