• Advertisement
Sign in to follow this  

Using strings as dictionary keys

Recommended Posts

Hello! I'm working on a hobby game project where i have a static resource manager class which loads and maps all game resources into dictionaries, using their mod name (the game is moddable) and file name. For instance, for all meshes, you might have an entry like this:

meshes[@"core/MyDebugMesh"] = LoadMesh(path);

Where "core" is the default mod name of internal files. The string "core/MyDebugMesh" then occurs inside JSON files under the field "mesh" (or similar). I like this format because it is easy to work with and has acceptable guarantees in terms of key uniqueness (two files of the same type can't occupy the same path). The system automatically navigates through specified folders and figures out what to do with the files there.

However, i read in the MSDN documentation that two different strings can have the same hash code, and also that the hash function produces different results on 32-bit and 64-bit machines. Is this something i need to be concerned about? Aside from this, are there any implications on performance using my solution?

Share this post


Link to post
Share on other sites
Advertisement

You don't need to worry about string hash collisions screwing up equality unless you explicitly tell the Dictionary class to use an IEqualityComparer that cares only about hash value. The default Comparer will behave sanely with string keys.

Share this post


Link to post
Share on other sites

Adding to what SiCrane said...

The hash code is only used for coarse sorting... if there is a hash collision there is still a true comparison used to sort/place the value. But this is helpful as a hash compare (comparing two integers) is faster then a string compare... so this way it only needs to do the expensive compare when there is a hash collision rather than not being able to take advantage of the faster hash comparison at all.

Share this post


Link to post
Share on other sites
The internal workings of Dictionary are fast, but it will still need to do at least one .Equals call per lookup (it needs to ensure that the key you're asking for is in fact the one in the dictionary, even if there is only one string with that hash in the dictionary).

If there are hash collisions, Dictionary has an index-linked-list between all entries with the same hash, and will call .Equals until it finds a match.

string.Equals does the following checks for speed:
- ReferenceEquals (i.e. if the memory address of the string arguments are identical then it knows the strings are equal)
- Length check (it's highly likely for unequal strings to also be unequal in length so it does this before comparing the characters in the hopes that it will fail fast)
- A fast fixed-pointer string comparison.


If you're particularly worried, you can Intern the string values that you use to access the dictionary, which will cause them to all become ReferenceEquals to each other. NOTE: You have to do this BEFORE adding the key to the dictionary, and do it a single time for all of the strings in the other data files. The cost of Interning is worse than a single dictionary lookup, so if you repeatedly intern the same strings over and over it will be WORSE performance. I wouldn't bother with this approach unless you're absolutely sure you need it. It should be fast enough unless you're doing something crazy.

Another tip:

Always use TryGetValue instead of calling ContainsKey and then dictionary[key]. ContainsKey needs to do all of the same checks that TryGetValue does, EXCEPT it doesn't return you the value, then dictionary[key] does the same work a second time. Edited by Nypyren

Share this post


Link to post
Share on other sites

That's a great response Nypyren!! I didn't know about those early-exits in string.Equals, that's very interesting. Doesn't help much when the value will (ideally) always exist, though.

Another quick question regarding hashing: let's say that i have an object where i'm not interested in knowing whether the objects are actually equal other than by reference (imagine hashing the bricks of a building - they are all identical, you just want to know which is which). Should i override GetHashCode() to only consider the reference (with some xor to distribute it), or is this an anti-pattern? Am i more likely to end up with a worse distribution than the default implementation?

Share this post


Link to post
Share on other sites

Another quick question regarding hashing: let's say that i have an object where i'm not interested in knowing whether the objects are actually equal other than by reference (imagine hashing the bricks of a building - they are all identical, you just want to know which is which). Should i override GetHashCode() to only consider the reference (with some xor to distribute it), or is this an anti-pattern? Am i more likely to end up with a worse distribution than the default implementation?


If you *don't* override Equals or GetHashCode, they will both use the reference only, and it works great!

I usually design my code this way since it's less of a hassle than making custom Equals and GetHashCode everywhere. Edited by Nypyren

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Advertisement