fast hash map

Started by
33 comments, last by polygon7 16 years, 3 months ago
Well, I've been looking into trie maps, and they seem good enough, but then I thought that testing whether each character is higher, lower, or equal isn't really necessary...

So I did it like this... I don't think it can get any faster:

	// Requires <memory> and <string>; shared_ptr here is std::shared_ptr (or boost::shared_ptr).
	template <typename T>
	class trie_map
	{
	public:
		class Node
		{
		public:
			Node() {}
			explicit Node(const T& d) : D(new T(d)) {}
			shared_ptr<T> D;
			shared_ptr<Node> N[256];
		};

		shared_ptr<Node> Root;

		trie_map() : Root(new Node()) {}

		T* find(const std::string& s)
		{
			shared_ptr<Node> p = Root;
			int n = s.size();
			while (p && n)
				p = p->N[(unsigned char)s[--n]]; // cast: plain char may be signed
			return (p && !n) ? p->D.get() : 0;
		}

		T* insert(const std::string& s, const T& value)
		{
			shared_ptr<Node> p = Root;
			int n = s.size();
			while (n)
			{
				// Take a reference to the child slot so the new node is
				// attached to the tree; resetting a local copy loses it
				// (that was the cause of the memory error mentioned below).
				shared_ptr<Node>& next = p->N[(unsigned char)s[--n]];
				if (!next)
					next.reset(new Node());
				p = next;
			}
			p->D.reset(new T(value));
			return p->D.get();
		}
	};


Unfortunately it doesn't work right now... I seem to get some memory error in the insertion function...

And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

[Edited by - Dragon_Strike on January 7, 2008 5:35:00 PM]
It's easy to trade off speed against memory usage with a multiway trie. At one end you have the ternary trie, with the least memory usage and the most links to follow during searching. At the other end you have a single array that every string can be pigeon-holed into (not feasible in this case).
Somewhere in between there are various possibilities (assuming barely more than A-Z is required), including:
4-way trie with four levels per character
8-way trie with two levels per character
32-way trie with one level per character (like what you have)
1024-way trie with two characters per level

I made a voxel-space map type container not long ago which had the trie width defined as a template parameter, allowing the above-mentioned trade-off adjustments by changing one template parameter. I used template meta-programming for calculating the depths too [cool].

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote: And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

It's a lot of space, but doing it the way you do is bad for another reason as well.
Say that you hash each char to something like childIndex = currentChar & 7 (8 child nodes).

What if your strings have a very uneven distribution in this hash value?
For instance, what if childIndex is 0 or 1 90% of the time?
Then most of your child indices will be null all the time; that's a waste of space (and performance will suffer from that waste too).
In other words, building a tree like this is input-sensitive, which IMO is very dangerous.
You could get very good results on random data sets when trying the algorithm out, and later very poor performance on real data (or when it's reused later with other input data).

The search tree that I mentioned from the Dr. Dobb's Journal isn't that sensitive to the input strings, since it chooses the child index based on a comparison with a per-node value, i.e.:
childIndex = 0, if currentChar < nodeValue
childIndex = 1, if currentChar == nodeValue
childIndex = 2, if currentChar > nodeValue
The node's value is chosen when the node is created.

Read the article!

Edit: Actually found the article online here (dunno if it's complete or not).
Quote:Original post by eq
Quote: And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

It's a lot of space, but doing it the way you do is bad for another reason as well.
Say that you hash each char to something like childIndex = currentChar & 7 (8 child nodes).

What if your strings have a very uneven distribution in this hash value?


They won't. The point is that you don't hash the values; you just restrict them to a limited alphabet. That is, if you allow only the characters a-z, you don't have to worry about character 3 appearing in your input set.

See also my post for an example of a trie implementation that uses exactly one word per character of the alphabet in every node (instead of the 256 above), and which also manages memory without smart pointers (though it does not allow removal).
Hi,
if you want a fast hash_map, then try this (dense hash map):
http://code.google.com/p/google-sparsehash/
http://google-sparsehash.googlecode.com/svn/trunk/doc/performance.html

This topic is closed to new replies.
