fast hash map

Started by
33 comments, last by polygon7 16 years, 3 months ago
Well, I've been looking into trie maps, and they seem good enough, but then I thought that testing whether each character is higher, lower, or equal isn't really necessary...

So I did it like this... I don't think it can get any faster:

	// Requires <memory> and <string>; shared_ptr here is std::shared_ptr (or boost::shared_ptr).
	template <typename T>
	class trie_map
	{
	public:
		class Node
		{
		public:
			Node() {}
			explicit Node(const T& d) : D(new T(d)) {}
			shared_ptr<T> D;
			shared_ptr<Node> N[256];
		};

		shared_ptr<Node> Root;

		trie_map() : Root(new Node()) {}

		T* find(const std::string& s)
		{
			shared_ptr<Node> p = Root;
			int n = s.size();
			while (p && n)
				p = p->N[(unsigned char)s[--n]]; // cast: plain char may be signed
			return (p && !n) ? p->D.get() : 0;
		}

		T* insert(const std::string& s, const T& value)
		{
			shared_ptr<Node> p = Root;
			int n = s.size();
			while (n)
			{
				// Take a reference to the child slot so the new node is
				// attached to the tree; resetting a local copy loses it
				// (that was the cause of the memory error mentioned below).
				shared_ptr<Node>& next = p->N[(unsigned char)s[--n]];
				if (!next)
					next.reset(new Node());
				p = next;
			}
			p->D.reset(new T(value));
			return p->D.get();
		}
	};


Unfortunately it doesn't work right now... I seem to get some memory error in the insertion function...

And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

[Edited by - Dragon_Strike on January 7, 2008 5:35:00 PM]
It's easy to trade off speed against memory usage with a multiway trie. At one end you have the ternary trie, with the least memory usage and the most links to follow during searching. At the other end you have a single array that every string can be pigeon-holed into (not feasible in this case).
Somewhere in between there are various possibilities (assuming barely more than A-Z is required), including:
4-way trie with four levels per character
8-way trie with two levels per character
32-way trie with one level per character (like what you have)
1024-way trie with two characters per level

I made a voxel-space map type container not long ago which had the trie width defined as a template parameter, allowing the above-mentioned trade-off adjustments by changing one template parameter. I used template meta-programming for calculating the depths too [cool].

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote: And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

It's a lot of space, but doing it the way you do is bad for another reason as well.
Say that you hash each char to something like childIndex = currentChar & 7 (8 child nodes).

What if your strings have a very uneven distribution in this hash value?
For instance, what if childIndex is 0 or 1 90% of the time?
Then most of your child indices will be null all the time; that's a waste of space (and performance will suffer from that waste too).
In other words, building a tree like this is input-sensitive, which IMO is very dangerous.
You could get very good results on random data sets when trying the algorithm out, and later very poor performance on real data (or when it's reused later with other input data).

The search tree that I mentioned from the Dr. Dobb's Journal isn't that sensitive to the input strings, since it chooses the child index based on a comparison with a per-node value, i.e.:
childIndex = 0, if currentChar < nodeValue
childIndex = 1, if currentChar == nodeValue
childIndex = 2, if currentChar > nodeValue
The node's value is chosen when the node is created.

Read the article!

Edit: Actually found the article online here (dunno if it's complete or not).
Quote:Original post by eq
Quote: And yeah, it eats a lot of memory, but I can lower it from 256 possible characters to just the alphabet, around 20 characters, which is acceptable.

It's a lot of space, but doing it the way you do is bad for another reason as well.
Say that you hash each char to something like childIndex = currentChar & 7 (8 child nodes).

What if your strings have a very uneven distribution in this hash value?


They won't. The point is that you don't hash the values; you just restrict them to a limited alphabet. That is, if you allow only the characters a-z, you don't have to worry about character 3 appearing in your input set.

See also my post for an example of a trie implementation that uses exactly one word per character of the alphabet in every node (instead of the 256 above), and which also manages memory without smart pointers (though it does not allow removal).
Hi,
if you want a fast hash_map, then try this (dense hash map):
http://code.google.com/p/google-sparsehash/
http://google-sparsehash.googlecode.com/svn/trunk/doc/performance.html

This topic is closed to new replies.
