Jump to content
  • Advertisement
Sign in to follow this  
choffstein

hash code for vector of integers

This topic is 4707 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have a vector of integers that I am trying to store in a std::map. After the first element, all of the elements are sorted (though, there are many repeats). An example vector is: 47,6,6,6,7,7,8,8,9,9 Currently, I am using a stringstream, reading in all of the ints, and exporting a string. However, this function runs A LOT. My profiler is fscked, so I can't tell if it is causing a bottleneck, but I was wondering if anyone can come up with any other hash code ideas for a vector of ints. Hash codes should be equal when one vector equals another. I was thinking of passing around char * instead of std::string, but then I realized that the char * should become invalid the second the string stream goes out of scope, so I don't think that is an option (unless the ss.str().c_str() would return a copy...) Thanks.

Share this post


Link to post
Share on other sites
Advertisement
Quote:
Original post by visage
Hash codes should be equal when one vector equals another.


This is impossible to guarantee without some kind of special knowledge about the data. Hash codes are usually fixed in length so obviously if your data can have more combinations than the hash code then the possibility of collisions must exist. I can't really see why you couldn't just put the vector into the map (although this might hit your performance because it would need to be copied once for every insertion operation). If you're just looking for a way to shrink the data some kind of run length encoding looks like it would be appropriate here.

Share this post


Link to post
Share on other sites
You need to use std::multimap if you plan on having a key being used more than once. Also, do you have a value you are storing in the map along with the key? If not maybe std::set is better for your needs.

I'm not very clear on exactly what you're doing to comment on the speed question.

Share this post


Link to post
Share on other sites
I have a class called "Configuration" which holds the vector. I cant just insert the vector, because Configuration has other data in it (parent configurations). However, configuration equality is based on the vector's data.

So I am passing in a hash based on the vector and a pointer to the configuration as the data.

Obviously I can't use a set on the pointer to the configuration (without some sort of comparator...)

Quote:

This is impossible to guarantee without some kind of special knowledge about the data.

I don't know why you say this. This is the exact specification that Java has (when obj.equals(obj2), obj.hashCode() = obj2.hashCode) -- I am just bringing it to my implementation.

I don't want to pass in the vector because this is a subclass of a configuration type. Other configuration sub classes don't use vectors, but are included in the same map. Thus, I have to figure out hashing for each type.

Though, I suppose the other configurations could easily be changed to vectors...

Thanks so far.

Share this post


Link to post
Share on other sites
Quote:
Original post by visage
Quote:

This is impossible to guarantee without some kind of special knowledge about the data.

I don't know why you say this. This is the exact specification that Java has (when obj.equals(obj2), obj.hashCode() = obj2.hashCode) -- I am just bringing it to my implementation.


Good point - sorry, not reading that carefully enough (I was thinking of the reverse implication which of course isn't true).

Share this post


Link to post
Share on other sites
So let me get this right. You're currently calculating a string from the vector and using that as the hash?

If so, that's quite a shocker for performance really. You've got memory allocation/deallocation of the strings. Resizing of strings as you build the hash. Presumably long strings to compare...

Why not calcluate the CRC of the integers in the vector, and use the result as an integer hash?

Share this post


Link to post
Share on other sites
Well, I recognize that iMalc -- hence why I asked ;)
In truth, the hash distribution is more important. It must be absolutely unique for each vector that is not equal (which is almost impossible to ask without doing the method I am using).

Ill try the CRC method. Thanks.

Share this post


Link to post
Share on other sites
CRC hash didn't work so well. I had to force my vector of ints into a vector of chars, then pass that along to the crc, which didn't do so great a job.

Maybe a fast function with a multimap?

Share this post


Link to post
Share on other sites
Why are you using hashes together with std::map or std::multimap? Lexicographic comparison always is faster than hashes...

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!