Sign in to follow this  

hash code for vector of integers

This topic is 4336 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have a vector of integers that I am trying to store in a std::map. After the first element, all of the elements are sorted (though, there are many repeats). An example vector is: 47,6,6,6,7,7,8,8,9,9 Currently, I am using a stringstream, reading in all of the ints, and exporting a string. However, this function runs A LOT. My profiler is fscked, so I can't tell if it is causing a bottleneck, but I was wondering if anyone can come up with any other hash code ideas for a vector of ints. Hash codes should be equal when one vector equals another. I was thinking of passing around char * instead of std::string, but then I realized that the char * should become invalid the second the string stream goes out of scope, so I don't think that is an option (unless the ss.str().c_str() would return a copy...) Thanks.

Share this post


Link to post
Share on other sites
Quote:
Original post by visage
Hash codes should be equal when one vector equals another.


This is impossible to guarantee without some kind of special knowledge about the data. Hash codes are usually fixed in length so obviously if your data can have more combinations than the hash code then the possibility of collisions must exist. I can't really see why you couldn't just put the vector into the map (although this might hit your performance because it would need to be copied once for every insertion operation). If you're just looking for a way to shrink the data some kind of run length encoding looks like it would be appropriate here.

Share this post


Link to post
Share on other sites
You need to use std::multimap if you plan on having a key being used more than once. Also, do you have a value you are storing in the map along with the key? If not maybe std::set is better for your needs.

I'm not very clear on exactly what you're doing to comment on the speed question.

Share this post


Link to post
Share on other sites
I have a class called "Configuration" which holds the vector. I cant just insert the vector, because Configuration has other data in it (parent configurations). However, configuration equality is based on the vector's data.

So I am passing in a hash based on the vector and a pointer to the configuration as the data.

Obviously I can't use a set on the pointer to the configuration (without some sort of comparator...)

Quote:

This is impossible to guarantee without some kind of special knowledge about the data.

I don't know why you say this. This is the exact specification that Java has (when obj.equals(obj2), obj.hashCode() = obj2.hashCode) -- I am just bringing it to my implementation.

I don't want to pass in the vector because this is a subclass of a configuration type. Other configuration sub classes don't use vectors, but are included in the same map. Thus, I have to figure out hashing for each type.

Though, I suppose the other configurations could easily be changed to vectors...

Thanks so far.

Share this post


Link to post
Share on other sites
Quote:
Original post by visage
Quote:

This is impossible to guarantee without some kind of special knowledge about the data.

I don't know why you say this. This is the exact specification that Java has (when obj.equals(obj2), obj.hashCode() = obj2.hashCode) -- I am just bringing it to my implementation.


Good point - sorry, not reading that carefully enough (I was thinking of the reverse implication which of course isn't true).

Share this post


Link to post
Share on other sites
So let me get this right. You're currently calculating a string from the vector and using that as the hash?

If so, that's quite a shocker for performance really. You've got memory allocation/deallocation of the strings. Resizing of strings as you build the hash. Presumably long strings to compare...

Why not calcluate the CRC of the integers in the vector, and use the result as an integer hash?

Share this post


Link to post
Share on other sites
Well, I recognize that iMalc -- hence why I asked ;)
In truth, the hash distribution is more important. It must be absolutely unique for each vector that is not equal (which is almost impossible to ask without doing the method I am using).

Ill try the CRC method. Thanks.

Share this post


Link to post
Share on other sites
CRC hash didn't work so well. I had to force my vector of ints into a vector of chars, then pass that along to the crc, which didn't do so great a job.

Maybe a fast function with a multimap?

Share this post


Link to post
Share on other sites
Because I am passing in a class that wraps the vector, not the vector itself. Except the map holds the parent class of the wrapped class. Not all the child classes use vectors (some just use ints to compare each other). Otherwise, I would use lexographical comparison.

Maybe I should just create unique comparators for each case?

Share this post


Link to post
Share on other sites
Quote:
Original post by visage
CRC hash didn't work so well. I had to force my vector of ints into a vector of chars, then pass that along to the crc, which didn't do so great a job.

Maybe a fast function with a multimap?
Really? It has always worked great for me. I hope you used at least a 16-bit CRC.
Well if that doesn't work very well, then I don't know what would.

Share this post


Link to post
Share on other sites
Dude, why forcing down into chars? Just do an int sum, and if you need more variety, multiply each item by its index in the array modulo (some power of two less than 65535). That works rather reliably for me.

Share this post


Link to post
Share on other sites
Can't do an int sum. All of the vectors sums are equivalent.

I will try the multiplication by the index location modulo some number.

Thanks.

EDIT: iMalc, I was using a crc32 algorithm...

Share this post


Link to post
Share on other sites

This topic is 4336 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this