polyfrag

Local hash


Has anybody ever tried making a hash map that remaps the hash cipher key so that all previous and current hash keys map to consecutive slots? E.g., the first random hash key maps to slot 0, then 1, etc.

 

Are 32 32-bit masks, combined with XOR and NAND, enough to map any set of 32-bit numbers to any other? Like:

 
o = (i << 1) | (i >> 31)   // rotate i left by one bit
o = o ^ ~(i & m)           // XOR with the NAND of i and this level's mask m
i = o
 
This would be hard to update with new keys, but it could be the best local hash. I'm working out how to do this now.
 
I was thinking it would work backward from the expected result, and the set of 32 32-bit mask values would be advanced to the closest configuration ahead that satisfies the hash keys.
 
Seems kinda too complicated to be practical, but I'm giving it a try.
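Read literally, the transform above chained over 32 levels might look like this in C++. The mask values themselves would be whatever the search procedure settles on, so the zero-filled array below is purely a placeholder:

```cpp
#include <array>
#include <cstdint>

// One pass of the proposed scheme: for each of the 32 levels, rotate the
// word left by one bit, then XOR it with the NAND of the input and that
// level's mask.
uint32_t local_hash(uint32_t i, const std::array<uint32_t, 32>& masks) {
    for (uint32_t m : masks) {
        uint32_t o = (i << 1) | (i >> 31); // rotate left by 1
        o ^= ~(i & m);                     // XOR with NAND(i, m)
        i = o;
    }
    return i;
}
```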

Seems kinda too complicated to be practical

 

this is my basic take on hash tables.

 

uniform distribution hash functions can be hard to determine in some cases.

 

perfect hash functions are rare, so you usually end up dealing with collisions anyway.

 

i have yet to come across a lookup table that was too slow and required a hash table.

 

for large lookup tables, like the 83,000 caves, rockshelters, and huts in caveman, i find that index systems and a simple list work just fine.  the index system reduces a search of 83,000 records down to a search of about half a dozen records at most.
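a sketch of the kind of index-plus-flat-list setup described (the field names, grid size, and Site struct here are invented for illustration, not taken from caveman's actual code): records live in one plain list, and a coarse grid maps a query position to a short bucket of candidate indices, so a search over ~83,000 records only ever touches the handful in one cell.

```cpp
#include <cstdint>
#include <vector>

struct Site { float x, z; /* cave, rockshelter, or hut data */ };

struct SiteIndex {
    static constexpr int kCells = 128;      // grid resolution (assumed)
    float world_size;
    std::vector<Site> sites;                // the simple list
    std::vector<int> grid[kCells][kCells];  // record indices per cell

    explicit SiteIndex(float size) : world_size(size) {}

    int cell(float v) const {
        int c = static_cast<int>(v / world_size * kCells);
        return c < 0 ? 0 : (c >= kCells ? kCells - 1 : c);
    }

    void add(const Site& s) {
        grid[cell(s.x)][cell(s.z)].push_back(static_cast<int>(sites.size()));
        sites.push_back(s);
    }

    // Only the records in the query's cell need to be searched.
    const std::vector<int>& candidates(float x, float z) const {
        return grid[cell(x)][cell(z)];
    }
};
```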

 

i took my first programming class in 1977 as a sophomore in high school.  i eventually went on to take software engineering at OSU. and now, 39 years later, i still have yet to use a hash table in a real world app.


Are 32 32-bit masks, combined with XOR and NAND, enough to map any set of 32-bit numbers to any other? Like:


Strictly speaking, no. There is an enormous number of possible functions that map 32 bits to 32 bits. Think of it in truth-table form. Let's start with a simpler problem: say we were arbitrarily mapping 32-bit values to 1-bit values, true or false.

You may have had to fill in a truth table before. There is one row for each possible input value, which means we would have 2^32 rows. The output is 1 bit, so you need 2^32 bits, for each bit of output, to store an arbitrary one of all possible truth tables. You have 32 bits of output, so that's (32 * 2^32) bits, not (32 * 32) bits. Any redundancies in storing the function as a set of masks just reduce the number of functions you can represent further. There's no way to compress all possible functions into a smaller number of bits.

Note: I'm not saying that any particular one of those mappings takes that many bits; that depends on how you choose to represent it. You could implement an awful lot of them if you gave yourself up to a gigabyte of assembly code to work with. But even with all that space, many possible mappings would have to be left out.
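The counting argument can be checked directly at a toy size, say 2-bit words with 2 mask levels, using the same rotate/XOR/NAND step from earlier in the thread. There are 4^4 = 256 possible functions from 2 bits to 2 bits, but only 4^2 = 16 mask configurations, so most functions are unreachable no matter which masks you pick:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Enumerate every function reachable by two 2-bit mask levels and count
// the distinct ones. Each function is packed into an 8-bit signature:
// 2 bits of output for each of the 4 possible inputs.
size_t count_reachable() {
    std::set<uint32_t> reachable;
    for (uint32_t m0 = 0; m0 < 4; ++m0)
        for (uint32_t m1 = 0; m1 < 4; ++m1) {
            uint32_t sig = 0;
            for (uint32_t i = 0; i < 4; ++i) {
                uint32_t x = i;
                for (uint32_t m : {m0, m1}) {
                    uint32_t o = ((x << 1) | (x >> 1)) & 3; // 2-bit rotate
                    o = (o ^ ~(x & m)) & 3;                 // XOR with NAND
                    x = o;
                }
                sig = (sig << 2) | x;
            }
            reachable.insert(sig);
        }
    return reachable.size();
}
```

With 16 configurations, at most 16 of the 256 functions are reachable, and the gap only widens at 32 bits.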

Of course, you don't really care about the whole 2^32 input set. You only care about a small subset of keys. Depending on the actual number of items you need to store, I don't know, maybe this could work. But I also wonder whether any of this is necessary.

Edited by Pink Horror


I tried to make it work. I can now get any single mapping of any bit width and number of levels. I can get 3 mappings together with up to 4-bit x 8-level masks in 7 ms. It takes too long for anything like 30 mappings for 64x60. At most 2 extra levels are needed for each additional mapping. An n-bit level will just feed the input through if its mask is set to 0.

 

It's a kind of neural net too, because it learns patterns by back-propagation. You can also tell it to avoid a certain output given certain inputs, or to try some other combination that will satisfy the inputs and outputs.

 

I found another thing which might be useful, which is keeping track of numbers that have been tried. For an n-bit number, at most 2*n (+1?) entries with 2 n-bit numbers each are needed. One is a mask for bits that have been tried both ways, and the other is the value that's been tried once. E.g., for a 5-bit number, there will be at most 11 entries, because anything else is a 1-bit change away from one of these:

 

00000

00011

00101

01001

01010

01100

10001

10010

10100

11000

11011
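One way to read that bookkeeping (my own interpretation as a sketch, not the posted code): each entry is a (mask, value) pair, and two tried values differing in exactly one unmasked bit collapse into a single entry with that bit added to the mask:

```cpp
#include <cstdint>
#include <vector>

// Tried-numbers set: an entry covers every value that matches `value`
// on the bits outside `mask`. Inserting a value that differs from an
// entry in exactly one unmasked bit merges into that entry instead of
// adding a new one.
struct TriedSet {
    struct Entry { uint32_t mask, value; };
    std::vector<Entry> entries;

    bool contains(uint32_t x) const {
        for (const Entry& e : entries)
            if ((x & ~e.mask) == (e.value & ~e.mask)) return true;
        return false;
    }

    void insert(uint32_t x) {
        if (contains(x)) return;
        for (Entry& e : entries) {
            uint32_t diff = (x ^ e.value) & ~e.mask;
            // exactly one differing unmasked bit: merge instead of adding
            if (diff != 0 && (diff & (diff - 1)) == 0) {
                e.mask |= diff;
                return;
            }
        }
        entries.push_back({0, x});
    }
};
```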

 

https://dl.dropboxusercontent.com/u/109630018/temp/lm/localmap.h

https://dl.dropboxusercontent.com/u/109630018/temp/lm/localmap.cpp


This is anything but trivial to do, and it is one of those ideas which look totally ingenious at first, but which turn out to be really, really bad ideas. Unless you are doing it for curiosity and practice, I strongly advise against it.

 

Why am I saying this?
 

first random hash key maps to slot 0, then 1, etc.

You insert two random key/value pairs and want them placed in adjacent buckets. So, the problem is finding a function that maps these two values to adjacent buckets (more precisely, a function that maps the first key x to 0 and the second key y to 1).

 

Now, you have those two values in your hash table, and you insert the third random value, which should land in bucket 2. You now have the problem of finding a function that maps three values to exactly three other values. See where this is going? Come back to the problem when you insert the 5,000th key/value pair.

 

So much for what it costs you to do that, but what do you gain? Well... nothing. Exactly nothing. In theory, you have a perfect distribution, which reduces collisions to the absolute minimum (if the number of buckets is greater than the number of keys, that's zero collisions). In practice, you do a lot of work in order to get a guaranteed cache miss on every access.

 

This is very much the same story as it was with Cuckoo Hashing. When it came up a decade ago, the approach seemed simply ingenious, the best idea since the invention of the wheel. You not only get the same complexity as a normal hash table, but a guaranteed maximum number of accesses in the worst case. That's awesome.

Except the hash function is twice as much work, you have a guaranteed two cache misses for every insert and lookup, and you have a disproportionately high number of complete rehashes to do. Which means that regardless of how awesome it sounds, in practice you're much, much slower.
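A minimal sketch of the cuckoo scheme being described: two tables, two hash functions, and a bounded eviction loop. The multiplicative hash functions here are toy choices for illustration; a real table would pick new functions and rehash everything whenever insert() gives up.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <utility>

class CuckooTable {
    static constexpr size_t N = 64;      // buckets per table
    static constexpr int kMaxKicks = 32; // eviction bound before giving up
    std::array<std::optional<uint32_t>, N> t0{}, t1{};

    static size_t h0(uint32_t k) { return (k * 2654435761u) % N; }
    static size_t h1(uint32_t k) { return ((k ^ 0x9e3779b9u) * 40503u) % N; }

public:
    // Every key sits in exactly one of its two candidate slots,
    // hence the guaranteed maximum of two probes per lookup.
    bool contains(uint32_t k) const {
        return t0[h0(k)] == k || t1[h1(k)] == k;
    }

    bool insert(uint32_t k) {
        if (contains(k)) return true;
        for (int kick = 0; kick < kMaxKicks; ++kick) {
            auto& s0 = t0[h0(k)];
            if (!s0) { s0 = k; return true; }
            std::swap(k, *s0); // evict table 0's occupant, re-place it
            auto& s1 = t1[h1(k)];
            if (!s1) { s1 = k; return true; }
            std::swap(k, *s1); // evict table 1's occupant, keep going
        }
        return false; // probable cycle: this is where the full rehash hits
    }
};
```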

 

Now, let's say you do not insert random keys, but somewhat ordered or correlated keys, and you want to optimize for the case of looking up many keys with a common prefix or such. Well... don't use a hash table! Use a trie or crit-bit tree, or any similar, related thing.

 

And for anything "less than a thousand", just use a vector anyway. Naive linear search is not as bad as you think once cache-friendliness comes into play, and if you can afford sorting values after insert, two or at most three steps of a binary search (followed by a naive linear search from that point on) will do wonders. You do not even need to touch memory for those two steps of binary search if you cache the quartiles in the container...
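A hedged sketch of that flat-vector approach (here std::lower_bound supplies the binary-search steps; the cached-quartile trick is left out for brevity):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted flat map: pairs kept ordered by key in one contiguous vector,
// so lookups are a short binary search over cache-friendly storage.
struct FlatMap {
    std::vector<std::pair<uint32_t, uint32_t>> items; // sorted by key

    void insert(uint32_t key, uint32_t value) {
        auto it = std::lower_bound(items.begin(), items.end(),
                                   std::make_pair(key, 0u));
        if (it != items.end() && it->first == key) it->second = value;
        else items.insert(it, {key, value});
    }

    const uint32_t* find(uint32_t key) const {
        auto it = std::lower_bound(items.begin(), items.end(),
                                   std::make_pair(key, 0u));
        if (it != items.end() && it->first == key) return &it->second;
        return nullptr;
    }
};
```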


i took my first programming class in 1977 as a sophomore in high school.  i eventually went on to take software engineering at OSU. and now, 39 years later, i still have yet to use a hash table in a real world app.

Seriously? I find this bizarre. Have you never used a dynamic language like Python, Ruby, or Lua? They use hash tables idiomatically all over the place.

I don't find it bizarre. I've been a professional developer since the early 1980s and I have never explicitly used a hash table outside of academic exercises. I've never used Lua or Ruby, and if Python uses hash tables to implement its dictionaries, it's completely invisible and that's OK.

I've used a *lot* of std::map<> in C++. It's usually faster than a hash table for most things I need and has the strict weak ordering property, which I also usually need.


I've used a *lot* of std::map<> in C++. It's usually faster than a hash table for most things I need and has the strict weak ordering property, which I also usually need.


std::unordered_map<> is a hash table. If you don't need to keep the items in your map sorted by key, it's very likely that you can use unordered_map instead of map, and in many cases the resulting code will be faster: hash tables have better asymptotic performance when there are lots of entries, and they also win when key comparisons are expensive.


std::unordered_map<> is a hash table
And it is very new in C++ terms (C++11, it seems).

So unless you have been learning C++ in the last five years, you may never have encountered it.
