Archived

This topic is now archived and is closed to further replies.

Lord Hen

Hash tables...

Recommended Posts

Lord Hen    122
I got the book Data Structures for Game Programmers... its pretty good... One of the topics in it is Hash tables.. I dont see any good reason to use a hash table, although I have a few cases where I dont see another option. The reason I dont want to use one is because of data collisions, and I see absolutely no way to get around them. If two keys hash to the same thing, then according to the book, you put it in the next available spot in the table. If you do this, then there is no way to access that data, unless you use the wrong key (uhhh, wow thats smart) because if you use the original key, you will get the data originally stored there, effectively losing your new data for a different key. Can anyone recommend something to get around this? I hope that made sense, Im _really_ tired... Later, Lord Hen "I am free of all prejudices. I hate everyone equally." - W. C. Fields [edited by - Lord Hen on December 2, 2008 2:32:49 AM]

Share this post


Link to post
Share on other sites
ex4st    115
Hum, I''m not sure if the author of this book has said a such thing, but I think it''s a bit different. From what I know about hash tables it works like that :

You have 1 and only 1 key for every data but different data can have the same key (it''s called collisions). So when you have a such case, you link these data (with a linked list for example).

Imagine this : hashtable with strings

your function gives :

h("one") -> 123
h("two") -> 248
h("three") -> 123
h("four") -> 798
etc..

But we will NEVER have 2 different keys for the same string (or the h function would use random values !??)

In our example we could use a linked list to store several strings for the same hash entry (here hashTable[123]). The real advantage of a such method is a really really fast access to the data : direct access (except for collisions but it''s not terrible).

So you need a GOOD hash function to avoid as much as possible the collisions, that''s the most difficult thing.

Hope it helps

Share this post


Link to post
Share on other sites
Guest Anonymous Poster   
Guest Anonymous Poster
Okay, here''s how it goes, to allow for collisions you have to start where the hash lands, and keep searching forward until you reach a blank space. Then you know it wasn''t shifted by collisions.
If your hash table has to handle removals then that''s another matter all together. You''ll need to come up with a placeholder so you still search past it in case you bumped past that spot before you removed whatever was in it. Also to make sure it isn''t an infinite loop make sure that worst case scenario you stop where you started.

If you''d prefer to avoid the collisions issue altogether instead of each value refering to an object, it can refer to a list of objects, holding all objects that have that hash.

Share this post


Link to post
Share on other sites
Lord Hen    122
Your right, the author did talk about using linked lists for collisions, but I forgot that...

Still, that doesnt work for me...I cant have two data entries for one place in my array...and its possible to get the same hash from two different strings(because of the modulus, i think)...

maybe ill just use a big linked list with a string id, but then its slower...

Thanks,
Lord Hen

"I am free of all prejudices. I hate everyone equally." - W. C. Fields

[edited by - Lord Hen on December 2, 2008 2:32:49 AM]

Share this post


Link to post
Share on other sites
BitMaster    8651
One big linked list: O(n)
Hash table: O(1) (using a few assumptions that can usually be adequately fulfilled in reality and a hash table that dynamically manages its size)

I don''t see any problem with having a linked list for each bucket. It''s the usual implementation of hash tables as far as I know.

Share this post


Link to post
Share on other sites
ex4st    115
Why is it a problem to have collisions ?

Maybe you should have a look at the STL library. There are many classes for example std::map working like a HashTable.

Share this post


Link to post
Share on other sites
Narcist    126
You seem to be mixing up hash collisions with key collisions, hash collisions are allowed, key collisions are not

say for example i have 3 keys -> 1, 2, 11
i have a hash table with 10 entries, and my hash function simply takes the key value % (modulos) 10

so hash of 1, = hash value of 1, so i store it in hashtable[1]
i now add key 2, thats a has value of 2, so it store in it hashtable[2]

so now there is one entry in hashtable[1], one entry in hashtable[2], both are uniqe and easily found

now i add keyvalue 11, 11 % 10 = a hashvalue of 1, so i go to store it in hashtable[1], there is already an entry there, so i simply point my new entry to the old entry and instead link to the new entry, so now my hashtable looks like this

hashtable[1] -> key(11) -> key(1)
hashtable[2] -> key(2)

to retrieve a key its still easy, if i want to retrieve key 1, i simply hash 1, that gives me a key value of 1, i lok at that entry in the hashtable and compare it, 11 != 1, so i move on to the next entry, 1 == 1, so i have found the entry

what you cant have is key value clashes, because then you CANT differentiate between them..

Share this post


Link to post
Share on other sites
davepermen    1047
Lord Hen. the simple solution to your problem is one simple statement:

You store the key with your value in the table!


struct HashData {
KeyType key;
DataType data;
}

HashValue h(KeyType key) { /* what ever code..*/ }

HashData[HashValue.max] hashMap;


i hope that gives the idea (its written in D, but i guess you can read it)




If that''s not the help you''re after then you''re going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

Share this post


Link to post
Share on other sites
Lord Hen    122
Thanks for your replies..

I guess it never occurred to me that you store the key in the linked list, but thats what I need.

Thanks,
Lord Hen

"I am free of all prejudices. I hate everyone equally." - W. C. Fields

[edited by - Lord Hen on December 2, 2008 2:32:49 AM]

Share this post


Link to post
Share on other sites
VThornheart    246
Wait... a hash table has O(1)??? Wow, that''s incredible! I''ve got to do some reading on that one... I remember learning about lists and Binary Trees in Data Structures, but no one ever mentioned hash tables.

Share this post


Link to post
Share on other sites
Extrarius    1412
Hash tables aren''t really O(1), because O is for worst-case. Hash Tables are O(n), but act like constant-time access as long as you properly implement it. If you''re storing 50000 entries, but only have 2 buckets, you''re not going to get constant time access to each element. Generally, with a good hash function, you can get constant-time access if you have about 25% more buckets than you expect to have entries. Less than that, and you''ll generally start getting a few collisions, but not too many.

Also, each bucket could be represented as something other than a linked list if you expect to have many collisions. You could have each bucket be a hash table itself (using a different hashing function of course), or a binary tree, or any other data structure. In the normal case, where you only have a few collisions, a link list will do fine and make insertion fast.

Share this post


Link to post
Share on other sites