Hash Function

Started by
12 comments, last by JohnBolton 18 years, 10 months ago
Hi everybody, I am using a Hash Table with Zobrist Keys in my KI for position Transposition Table (a hash table). The key is a 64-Bit ZobristKey. That means, it is a lot of 64 Bit numbers created with: U64 z=rand() ^ ((U64)rand() << 15) ^ ((U64)rand() << 30) ^ ((U64)rand() << 45) ^ ((U64)rand() << 60); xor-ed together (U64 being something like "unsinged long long"). So a key is something like U64 ZKey=0; for(i=0;i<n;i++) ZKey=ZKey xor z; With z being created as "z" above. (of course, they are created differently, but they could be created that way). Now, my hash-function looks as easy as it could be: GetPosition(U64 ZKey) { return ZKey % TABLE_SIZE; } (TABLE_SIZE is the size of the hash table). My Question: I have the feeling (which is based on some random information reaching me), that this is not the best hash function I could have choosen. Do you have any advice on how I could find a better hash function? Do you know a hash function which you would say is better? Thanks! Nathan
Advertisement
Standard way to generate a hash key from complex data-types is to xor it's simple members. Another thing is that you have to distinguish between:
H( {a,b} )
and
H( {b,a} )
so they would not be the same (if you don't require them to be, that is).

People often use shifting in addition to xor (just as in your first example).
This can however create patterns, as members may tend to change synchronously, that nullifying changes in the final key. As in:
H(a,b) = a ^ (b << 2);
a=__100010 ^
b=010001
If the change is:
a=__100110 ^
b=010000
_______*
You get same key.
That can be solved by multiplying the source element before xoring by some arbitrary number(best are primes). Each member by different number.

Simplified(but still powerful) solution to this is to calculate a polynomial of some arbitrary(again, prime) number, with coefficients from your input data. That way you can easily expand this to data of any size (like arrays, or strings).
So:
x = // some prime number.
U64 ZKey=0;
for(i=0;i<n;i++)
ZKey = ZKey * x + z;

Cheers.
/def
Thanks for your answer deffer!
But generating the key on a different way is not an option (I think). Each key represents a unique board situation (it is all about a AI(=KI in German) for a board game). The key changes whenever the board changes and is recalculated on the fly regarding the changes on the board.
Recalculating the key whenerver I want to look up in the Hash Table is not an option because it would take way to long!

So (I think and correct me if I am wrong) the only thing I could change is the Hash Function which gets ZKey(=z[0] xor z[2] xor ... z[n-1]) as input and can not work with z[0]-z[n-1]. The other thing that I could change is the way those z are generated.
Do you have any other advice?
The Hash function is called very often, so it needs to be fast!
Thanks!
Nathan
Quote:Original post by LonelyStar
Do you know a hash function which you would say is better?
[...]
But generating the key on a different way is not an option (I think).

I'm a bit confused now...
But I'll try to decipher it anyway. You would want to recalculate your old key to a new one, when only a small part of your input data changes. With as little cost as possible. That's why xor-ing seems so handy here.

I don't think anything would be faster. The only thing you can upgrade, is randomness of the output key.
You can still pre-multiply your data before xor-ing. I can't think of anything else...
I see from your confusion that I might have missused terms :).
The ZKey is my "Object" which I want to hash. My Hash Function is supposed to give me a position in the Table depending on the ZKey I give to it.
So, if I premultiply befor xoring, could I not just multiply when generating the keys?
That means my key generation part (done once at programm startup) would look like:

int PrimeNumbers[NUMBER_OF_ZS];

z=rand() ^ ((U64)rand() << 15) ^ ((U64)rand() << 30) ^ ((U64)rand() << 45) ^ ((U64)rand() << 60);
z=z*PrimeNumbers;

Do I understand correctly?
And again: Thanks!
Nathan
Quote:Original post by LonelyStar
U64 z=rand() ^ ((U64)rand() << 15) ^ ((U64)rand() << 30) ^ ((U64)rand() << 45) ^ ((U64)rand() << 60);
[...]
(of course, they are created differently, but they could be created that way).


So I'm assuming your z really means something, isn't just a random number. Eg. coded position of one figure on the chess board. If that's the case, then go ahead, but remember, that final ZKey woul dnot be carrying all the information needed to distinguish it from ZKey from another z[]. Chances are, they would still be the same.

But if you're generating z[] just randomly, then I'm missing the whole point of all the operations you perform, they seem simply worthless. So you'll have to explain the purpose behind your calculation needs.

So which is it?
/def
OK, the purpose is as follows:
I will explain it with 4-in-a-row (so you do not need to learn Zertz, the game I am doing it for).
For every "place" in the "4-in-a-row" there are 3 posibilities of States:
Empty, a piece of Player A and a piece of Player B. For every "place", for every state posibility of that place a "z" is generated at startup with the described function. We would have an array like:
U64 z[BOARD_X][BOARD_Y][2];
Then, for a given "board-state" the ZKey would be generated as follows:
U64 ZKey=0;for(i=0;i<BOARD_X;i++){ for(j=0;j<BOARD_Y;j++) {  switch(TheBoard[j])  {   case EMPTY:   break;  //No key needed for empty fields   case A: //Player A has a piece here   ZKey=ZKey xor z[j][0];   break;   case B: //Player B has a piece here   ZKey=ZKey xor z[j][1];   break;  } }}

But the ZKey is recalculated whenerver the state changes. In example if Player P (0 for A, 1 for B) places a piece at position a,b:
ZKey=ZKey xor z[a];


The purpose: Every 64-Bit Zkey represents a unique (unique enough) board-state. I can not store a whole board in the hash table, but I can store the ZKey. And all my "z" are completly random.
I hope you understood everything.
Nathan
Heh, now I got it.

So the prime numbers I proposed had to be doing the same job your z[] do.
So the primes are not needed at all.

What you have is:
const U64 z[N];
bool mstate[N];
mstate indicates micro-state of some atomic piece of your "board".

You're looking for a function
U64 F(const U64[N], bool[N])
with following properties:
- there exists (quite simple) function U64 G(U64 prevKey, int nPos, bool bNewVal) that if we had:
bool b0[N];
bool b1[N];
that b0 and b1 differ on one position(nPos), then
F(z,b1) == G( F(z,b0), nPos, b1[nPos] )

Am I right?
/def
My favourite hash function is to generate a CRC over the data and then 'and' it with the table size minus 1. (The table size is a power of two of course).
CRC was practically born for this. Never mind those messy prime number table sizes.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Now I do not completly understand your post.
You mean G is my:
ZKey=ZKey xor z[a]

;
???

What I am looking for:
I have a function which takes a ZKey and returns a position in the Hash Table. Currently that is:
Zkey % TABLE_SIZE;
But, because the number of possible ZKeys is much bigger than Table Size, collisions do occure! (different ZKeys are mapped to the same position in the table).
I am looking for a way of minimizing the collisions by:
a) Changing the "Zkey % TABLE_SIZE;" to something better.
or
b) Generating my "z" better.

I think, that "a" has more potential.

As said, I am nur sure I understood your post. But do you now know what I want?
Nathan

This topic is closed to new replies.

Advertisement