lefthandman

Hashing to Multiple Unique Values


OK, so excuse me for being a noob, but I'm not having much luck searching Wikipedia or the web for an answer to my problem.
I've been trying to implement a generalized cuckoo hashing scheme with a scalable number of hash functions (this also applies to my existing Bloom filter code if I can get it working). The problem I'm having is "sampling without replacement," in stats terms. I would like the k hashes for a key to land on k distinct locations in my table, so that I don't waste time mapping twice to the same slot. AFAIK this should also increase the uniformity of my functions, since there are a triangular number fewer possibilities to choose from when sampling without replacement.

I've got a nice hash function already, and the ability to generate enough uniformly distributed bits to hash k times; I'm just having trouble with the uniqueness aspect.
Thanks!

Also, if any clarification is needed, I will most surely provide it.

What you're looking for is a perfect hash.

Some domains have perfect hashes; most do not. If you can provide more detail about the domain of your hash function, we might be able to help more. Also, if you have a known finite domain and can enumerate all possible inputs in that domain, you can generate a perfect hash using a tool like gperf.
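
For a small, fully enumerable key set, one way to picture what a perfect-hash generator does is a brute-force seed search: keep trying seeds until every key lands in its own slot. The sketch below is only an illustration, not how gperf works internally. The names `seeded_hash` and `find_perfect_seed` and the example key list are hypothetical, and BLAKE2b from Python's `hashlib` stands in for a seeded FNV or murmur hash.

```python
import hashlib


def seeded_hash(key, seed):
    """64-bit hash of a string; the seed is passed as the BLAKE2b salt.

    Assumption: BLAKE2b is a stand-in for any seeded string hash.
    """
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def find_perfect_seed(keys, table_size, max_seeds=100_000):
    """Return the first seed that maps every key to a different slot, or None."""
    for seed in range(max_seeds):
        slots = {seeded_hash(k, seed) % table_size for k in keys}
        if len(slots) == len(keys):  # no two keys share a slot
            return seed
    return None


keys = ["idle", "walking", "fighting", "dead"]  # hypothetical fixed key set
seed = find_perfect_seed(keys, table_size=8)
```

The search only works because the key set is known up front; a new key added later may collide and force a new seed.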

OK, well, the domain for the current use case is limited to strings. I've got a murmur hash for strings that vary (in-game PMs, etc.) and an FNV hash for those that don't (player status tuples in string form).

However, I've tried to keep the hash function a black-box set of bits for as long as possible, so that I don't have to rewrite my implementation for everything I use it for (not saying it's better than existing hash libraries; I'm just trying to get good at this aspect of CS, and jumping in headfirst seems reasonable).

What I've read in the Wikipedia article makes it seem like perfect hashing is a way to generate a hash function (such as FNV or murmur) for a predetermined set of keys, one that maps the set into your hash table without collisions. That's not exactly what I'm going for here, as I simply want to hash to k unique locations in my hash table (per string) so that I can implement cuckoo hashing.

Another way of phrasing what I'm looking for: given a perfectly uniform hash function (one in which every result is equally likely) that spits out a result n bits long, map that result to k values, each n/k bits long, with no repeats among them. Even if this ends up sacrificing uniformity, the no-repeats property seems useful from my perspective.
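
One common way to get k distinct slots out of a single wide hash is double hashing: split the digest into a start `h1` and a stride `h2`, then probe `h1 + i*h2`. A minimal sketch follows, assuming the table size is a power of two and forcing the stride odd so it is coprime with the table size, which is what guarantees no repeats for k up to the table size. The function name `k_distinct_slots` is hypothetical, and BLAKE2b stands in for the black-box hash.

```python
import hashlib


def k_distinct_slots(key, k, table_bits):
    """Return k pairwise-distinct slot indices in a table of 2**table_bits slots."""
    table_size = 1 << table_bits
    if k > table_size:
        raise ValueError("cannot pick more distinct slots than the table holds")
    # Assumption: any hash yielding 128 well-mixed bits would do here.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    # An odd stride is coprime with a power-of-two table size, so
    # i * h2 (mod table_size) takes a different value for each i < table_size.
    h2 = int.from_bytes(digest[8:], "little") | 1
    return [(h1 + i * h2) % table_size for i in range(k)]


slots = k_distinct_slots("hello world", k=4, table_bits=10)
```

The trade-off matches the one mentioned above: the k slots are fully determined by the pair (`h1`, `h2`), so they are distinct but not an independently uniform choice of k slots.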

If you have a digest space of size n, it can be encoded in ceil(log2(n)) = m bits. So what you're asking for is isomorphic to a perfect hash with a digest space of m bits... where my m == your n/k. Unless I'm seriously misinterpreting something.
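
To make the bit arithmetic concrete, here is a small sketch of both pieces: the number of bits needed to index a digest space of size n, and the split of an n-bit hash into k chunks of n/k bits. The helper names `bits_needed` and `split_chunks` are hypothetical. The last example shows that chunking by itself does not rule out repeated chunks.

```python
def bits_needed(n):
    """ceil(log2(n)) for n >= 1, computed with integer arithmetic."""
    return (n - 1).bit_length()


def split_chunks(h, n_bits, k):
    """Split an n_bits-wide integer into k chunks of n_bits // k bits, lowest chunk first."""
    width = n_bits // k
    mask = (1 << width) - 1
    return [(h >> (i * width)) & mask for i in range(k)]


print(bits_needed(1000))                 # 10
print(split_chunks(0xDEADBEEF, 32, 4))   # [239, 190, 173, 222], i.e. EF, BE, AD, DE
print(split_chunks(0x12341234, 32, 2))   # [4660, 4660]: two identical chunks
```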

Oh wait, is it that you take your n-bit hash function result, do the perfect hashing precomputation on the k chunks of the result (each n/k bits long), then immediately hash all k chunks to new locations, with the guarantee that those locations are unique? Genius! That would seem to work, assuming that perfect hashing doesn't decrease the uniformity of the chunks and works for even the most random of input sets (literally random, uniformly distributed bits).
Thanks ApochPiQ!
