Jump to content
  • Advertisement
Sign in to follow this  
Mizmoon

Using strings as dictionary keys

This topic is 432 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hello! I'm working on a hobby game project where i have a static resource manager class which loads and maps all game resources into dictionaries, using their mod name (the game is moddable) and file name. For instance, for all meshes, you might have an entry like this:

meshes[@"core/MyDebugMesh"] = LoadMesh(path);

Where "core" is the default mod name of internal files. The string "core/MyDebugMesh" then occurs inside JSON files under the field "mesh" (or similar). I like this format because it is easy to work with and has acceptable guarantees in terms of key uniqueness (two files of the same type can't occupy the same path). The system automatically navigates through specified folders and figures out what to do with the files there.

However, i read in the MSDN documentation that two different strings can have the same hash code, and also that the hash function produces different results on 32-bit and 64-bit machines. Is this something i need to be concerned about? Aside from this, are there any implications on performance using my solution?

Share this post


Link to post
Share on other sites
Advertisement

You don't need to worry about string hash collisions screwing up equality unless you explicitly tell the Dictionary class to use an IEqualityComparer that cares only about hash value. The default Comparer will behave sanely with string keys.

Share this post


Link to post
Share on other sites

Adding to what SiCrane said...

The hash code is only used for coarse sorting... if there is a hash collision there is still a true comparison used to sort/place the value. But this is helpful as a hash compare (comparing two integers) is faster then a string compare... so this way it only needs to do the expensive compare when there is a hash collision rather than not being able to take advantage of the faster hash comparison at all.

Share this post


Link to post
Share on other sites
The internal workings of Dictionary are fast, but it will still need to do at least one .Equals call per lookup (it needs to ensure that the key you're asking for is in fact the one in the dictionary, even if there is only one string with that hash in the dictionary).

If there are hash collisions, Dictionary has an index-linked-list between all entries with the same hash, and will call .Equals until it finds a match.

string.Equals does the following checks for speed:
- ReferenceEquals (i.e. if the memory address of the string arguments are identical then it knows the strings are equal)
- Length check (it's highly likely for unequal strings to also be unequal in length so it does this before comparing the characters in the hopes that it will fail fast)
- A fast fixed-pointer string comparison.


If you're particularly worried, you can Intern the string values that you use to access the dictionary, which will cause them to all become ReferenceEquals to each other. NOTE: You have to do this BEFORE adding the key to the dictionary, and do it a single time for all of the strings in the other data files. The cost of Interning is worse than a single dictionary lookup, so if you repeatedly intern the same strings over and over it will be WORSE performance. I wouldn't bother with this approach unless you're absolutely sure you need it. It should be fast enough unless you're doing something crazy.

Another tip:

Always use TryGetValue instead of calling ContainsKey and then dictionary[key]. ContainsKey needs to do all of the same checks that TryGetValue does, EXCEPT it doesn't return you the value, then dictionary[key] does the same work a second time. Edited by Nypyren

Share this post


Link to post
Share on other sites

That's a great response Nypyren!! I didn't know about those early-exits in string.Equals, that's very interesting. Doesn't help much when the value will (ideally) always exist, though.

Another quick question regarding hashing: let's say that i have an object where i'm not interested in knowing whether the objects are actually equal other than by reference (imagine hashing the bricks of a building - they are all identical, you just want to know which is which). Should i override GetHashCode() to only consider the reference (with some xor to distribute it), or is this an anti-pattern? Am i more likely to end up with a worse distribution than the default implementation?

Share this post


Link to post
Share on other sites

Another quick question regarding hashing: let's say that i have an object where i'm not interested in knowing whether the objects are actually equal other than by reference (imagine hashing the bricks of a building - they are all identical, you just want to know which is which). Should i override GetHashCode() to only consider the reference (with some xor to distribute it), or is this an anti-pattern? Am i more likely to end up with a worse distribution than the default implementation?


If you *don't* override Equals or GetHashCode, they will both use the reference only, and it works great!

I usually design my code this way since it's less of a hassle than making custom Equals and GetHashCode everywhere. Edited by Nypyren

Share this post


Link to post
Share on other sites

Thanks! I knew this was the case for Equals, just wasn't sure about GetHashCode. Good to know!

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!