
Hashing to Multiple Unique Values


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
4 replies to this topic

#1 lefthandman   Members   -  Reputation: 138


Posted 20 April 2012 - 11:44 PM

OK, so excuse me for being a noob, but I'm not having much luck searching Wikipedia and the web for an answer to my problem.
I've been trying to implement a generalized cuckoo hashing scheme with a scalable number of hash functions (this would also apply to my existing Bloom filter if I can get it working). The problem I'm having is how to "sample without replacement", in stats terms. I would like all k hashes of a key to land on distinct locations in my table, so that I don't waste time mapping twice to the same slot. AFAIK this should also increase the uniformity of my functions, since there are fewer possibilities to choose from when sampling without replacement.

I've got a nice hash function already, and the ability to generate enough uniformly distributed bits to hash k times. It's just the uniqueness aspect I'm having trouble with.
Thanks!

Also, if any clarification is needed, I will most surely provide it.


#2 ApochPiQ   Moderators   -  Reputation: 14103


Posted 21 April 2012 - 12:07 AM

What you're looking for is a perfect hash.

Some domains have perfect hashes; most do not. If you can provide more detail about the domain of your hash function, we might be able to help more. Also, if you have a known finite domain and can enumerate all possible inputs in that domain, you can generate a perfect hash using a tool like gperf.

#3 lefthandman   Members   -  Reputation: 138


Posted 21 April 2012 - 12:33 AM

OK, well, the domain for the current use case is limited to strings. I've got a murmur hash for strings that vary (in-game PMs etc.) and an FNV hash for those that don't (player status tuples in string form).

However, I've tried to keep the hash function a black-box set of bits for as long as possible, so that I don't have to rewrite my implementation for everything I use it for (not saying it's better than existing hash libraries, I'm just trying to get good at this aspect of CS and jumping in head first seems reasonable).

What I've read in the Wikipedia article makes it seem like perfect hashing is a way to generate a hash function (in the role that FNV or murmur plays) for a predetermined set of keys, so that the set maps into your hash table without collisions. That's not exactly what I'm going for here: I simply want to hash each string to k distinct locations in my hash table so that I can implement cuckoo hashing.
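One common way to get k distinct locations per key, sketched here as an aside and not something proposed in this thread, is double hashing: take two hash values and step through the table with an odd stride. This is a minimal sketch under two assumptions: the table size is a power of two, and `h1` and `h2` are two independent hash values of the same string (for instance a murmur result and an FNV result). The function name `double_hash_slots` and its parameters are hypothetical.

```python
def double_hash_slots(h1: int, h2: int, k: int, table_bits: int) -> list[int]:
    """Return k distinct slot indices in a table of size 2**table_bits.

    Assumption: h1 and h2 are two hash values of the same key.
    """
    size = 1 << table_bits
    if not 0 < k <= size:
        raise ValueError("k must be between 1 and the table size")
    # Forcing the stride to be odd makes it coprime with the power-of-two
    # table size, so the first `size` probes never repeat.
    step = h2 | 1
    return [(h1 + i * step) & (size - 1) for i in range(k)]


slots = double_hash_slots(h1=5, h2=4, k=4, table_bits=3)
print(slots)
```

The trade-off is that the k locations are no longer independent of each other: once the first slot and the stride are known, the rest are determined. Whether that matters depends on how the table is used.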

Another way of phrasing what I'm looking for: given a perfectly uniform hash function (one in which every result is equally likely) that spits out a result n bits long, map that result to k values, each n/k bits long, with no repeats allowed among them. Even if this ends up sacrificing some uniformity, the no-repeats property seems useful from my perspective.
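The "no repeats" requirement can be sketched directly as sampling without replacement: decode the hash as a mixed-radix number, picking the first slot from m candidates, the second from the m - 1 that remain, and so on. This is only an illustration under stated assumptions, not the method the thread settles on. The name `distinct_slots` is hypothetical, the hash is treated as a plain non-negative integer, and it consumes variable-width digits instead of fixed n/k-bit chunks. If the hash has only just enough bits, the modulo steps introduce a small bias.

```python
def distinct_slots(hash_value: int, k: int, table_size: int) -> list[int]:
    """Map one hash value to k distinct slots in range(table_size)."""
    if not 0 < k <= table_size:
        raise ValueError("k must be between 1 and table_size")
    chosen_sorted: list[int] = []  # slots picked so far, ascending
    slots: list[int] = []          # slots in the order they were picked
    h = hash_value
    for i in range(k):
        # Peel off one digit in base (table_size - i): the rank of the
        # next slot among the slots that are still unused.
        h, rank = divmod(h, table_size - i)
        slot = rank
        # Skip over every already-used slot at or below the candidate.
        for used in chosen_sorted:
            if used <= slot:
                slot += 1
            else:
                break
        chosen_sorted.append(slot)
        chosen_sorted.sort()
        slots.append(slot)
    return slots


print(distinct_slots(0, 3, 8))
```

The skip loop costs O(k) per slot, which is fine for the small k typical of cuckoo hashing or Bloom filters.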
ROFLMAO-GG-HF-GL-LOL-TTYL-BRB-GTG

#4 ApochPiQ   Moderators   -  Reputation: 14103


Posted 21 April 2012 - 12:48 AM

If you have a digest space of size n, it can be encoded in ceil(log2(n)) = m bits. So what you're asking for is isomorphic to a perfect hash with a digest space of m bits... where my m == your n/k. Unless I'm seriously misinterpreting something.
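To make the bit arithmetic concrete, here is a small sketch of the two quantities involved: the number of bits needed to address a digest space of a given size, and the splitting of an n-bit hash into k chunks of n/k bits each. The helper names `bits_needed` and `split_chunks` are hypothetical, and the sketch assumes n is a multiple of k.

```python
def bits_needed(space_size: int) -> int:
    """ceil(log2(space_size)) for space_size >= 1, using integer math only."""
    if space_size < 1:
        raise ValueError("space_size must be at least 1")
    return (space_size - 1).bit_length()


def split_chunks(hash_value: int, n_bits: int, k: int) -> list[int]:
    """Split an n_bits-wide hash into k chunks, lowest-order chunk first."""
    if n_bits % k != 0:
        raise ValueError("this sketch assumes n_bits is a multiple of k")
    width = n_bits // k
    mask = (1 << width) - 1
    return [(hash_value >> (i * width)) & mask for i in range(k)]


print(bits_needed(1000))
print([hex(c) for c in split_chunks(0x123456789ABCDEF0, 64, 4)])
```

Note that the chunks produced this way can still repeat, which is exactly the problem the original question is about.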

#5 lefthandman   Members   -  Reputation: 138


Posted 21 April 2012 - 01:16 AM

Oh wait, is it that you take your n-bit hash result, do the perfect hashing precomputation on its k chunks (each n/k bits long), then immediately hash all k chunks to new locations, with the guarantee that those new locations are distinct? Genius! That would seem to work, assuming that perfect hashing doesn't decrease the uniformity of the chunks and works even for the most random of input sets (literally random, uniformly distributed bits).
Thanks ApochPiQ!



