Best Dictionary data structure...

Started by
7 comments, last by Sphet 17 years ago
Hello ! I'm looking for the best dictionnary data structure ;) Actually, i'm using a very general case dictionary based upon AVL tree. This structure works very well, but i have noticed that in my system, most of the time, i used this dictionnary with a key of type UInt32 ( which is generaly the adress of the object currently stored .... ) The problem with my current dictionary, is that it is pretty ... "heavy" ... Each time i add an entry, i need to allocate a new node, edit link .... And the acces is pretty slow too ... I was just wondering if there was not a better data structure more adapted to this application: Store object in a dictionnary where the key is an UInt32. Maybe Hash map ??? But i do not get how hash map work exactly .... Do you think hash map would solve my problems ... Thanks you ;) Clem
Advertisement
Hi,
the hash map would be a better solution for your problem.

Basically you calculate an index into an array out of your UINT pointer.
Then you write the object at the calculated index.

There are plenty of articles about how to write an efficient hash map.
Why do you want to write your own?
There are ready implemented hash maps avail like the STL map for C++


I hope this helped.
Greetings
Jan
Hello thx for the advice,
i know that STL already offer us a Map class, but from what i have read it's also based upon an AVL tree ...

but anyway...

i'm gonna look for google in order to find how to deal with hash map ...
0° std::map is not a hash map. It's an associative container that relies on sorting to reduce access time. There is, however, an std::hash_map extension to the SC++L provided by SGI.

1° All the observed inefficiencies are solved by typical std::map implementations, especially by customizing the allocator if necessary. If profiling still shows that, using std::map, access is too slow, you might want to rethink the design before resorting to another container.

2° There is no best dictionary data structure. A hashtable is usually an excellent general-purpose dictionary for small data sets, but its complexity becomes linear for large sets. A hashtable tree is an option. A hashtable with a carefully crafted hash function is also a very good idea when the distribution of the values (such as addresses) is clustered.
It's me again :)

So i have search google for some informations about HashMap...

I have found a very interesting Hash fonction for string, which generate a hash key between 0 and MAX_HASH_TABLE_SIZE , i will probably use it to implement an hash table which use string as input key...

BUT !

I have not find any hash function which take as input an UInt32 ( which will be generally a memory adress ) and which return a hash key comprise between 0 and MAX_HASH_TABLE_SIZE...

This, is why i do not see how to use Hash map with a key of type UInt32 ....

If you have any idea, please let me know :)

thanks
A naive approach would be to consider your UInt32 as a four-character string, and use the string hash function. The most naive would be to divide said integer by 4 , possibly multiply it with a number that is prime with the maximum integer, and take its modulous against the maximum integer.

For more complex solutions, refer to an algorithms and data structures book, or download an existing hash library.
the solution to use the adress has a 4 byte string is great !!!
But can you explain a little bit more the seonc one ??

Does the second method avoid having collisions ???

Thanks a lot !
Actually, the second one does not attempt to avoid collisions too much. Its elimination of the least significant bits by division can cause collisions if the pointed-at objects are smaller than four bytes. Its multiplication by a number that is prime with the maximum number makes sure that allocating objects with a size that divides the maximum number will not miss a lot of spots.

Generally, allocating a lot of contiguous same-sized objects should result in an uniform repartition and no collisions, if I'm right. As for other situations, I can't predict what will happen.
As others have said there is no 'best' data structure, but depending on what you need to do more of, there are good trade offs.

I've used Trie data structure [link]the http://en.wikipedia.org/wiki/Trie[/link].

It's used for things like word processors and intellisense for matching words. It works by building a tree where each node has multiple children, each of which is differentiated from its siblings by the letter it is.

Imagine a trie with the words APE and APPLE. The trie would have one root node, 'A', with one child, 'P'. That child would have two children of its own; 'E' and 'P'. The 'E' node would have no children, while the 'P' node would have one child, an 'L' which in turn has its own child, the 'E'.

The benfit of this data structure is that you need only recurse down the trie until find that the letter you are looking for in your letter is not present at that level of the trie. From Wikipedia: "Looking up a key of length m takes worst case O(m) time."

I'm not clear on what you are looking up, but if you are doing prefix matching or searching for similar words this data structure works well.

/S

This topic is closed to new replies.

Advertisement