#include <algorithm>
#include <string>
#include <utility>
#include <vector>

#define CMIN 65
#define CMAX 122
#define CRANGE (CMAX-CMIN)

namespace DX
{
    template <typename Data>
    class fast_hash_map
    {
    private:
        typedef std::pair<std::string, Data> Hash;
        std::vector<Hash> m_Hash_Map[CRANGE+2]; // one list per character code, with the first and last slots as wildcard holders
    public:
        Data insert(std::string name, Data value)
        {
            m_Hash_Map[GetIndex(name)].push_back(Hash(name, value));
            return value;
        }
        Data find(std::string key)
        {
            std::vector<Hash>* CurHashMap = &m_Hash_Map[GetIndex(key)];
            for (unsigned int n = 0; n < CurHashMap->size(); n++)
            {
                if (CurHashMap->at(n).first == key)
                    return CurHashMap->at(n).second;
            }
            return 0;
        }
        inline int GetIndex(std::string name)
        {
            // combine two shifted copies of the last character to spread the index
            int rangeindex = int((int(name.at(name.size()-1)>>1) +
                                  int(name.at(name.size()-1)>>2))>>1);
            // clamp the index into range; out-of-range keys land in the first or last bucket
            return std::max(std::min(rangeindex-CMIN+1, CRANGE), 0);
        }
    };
}
fast hash map
im experimenting with my own implementation of a hash map... im not worried about memory size and the key will always be a string... insertion speed is also not important... whats important for me is the "find" performance
so i thought that instead of creating one big list and doing a linear or binary search... i create several smaller lists and generate an index into them from the key... this way i can split for example a 200 item list into 10 lists with 20 items each...
then all i have to do is calculate the list index and do a linear search over 20 items instead of 200...
something like this
however when i compare the performance against the standard stdext::hash_map... i get results where my implementation is slower... why is that?
std::map's find performance is O(lg N); your class's performance is O(N) divided by 10. For small values of N your class is faster, but as N increases, your performance will be slower and slower. Read about big-O notation and complexity analysis.
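Where the crossover lies can be sketched with two small helpers (the function names and the bucket count of 10 are only illustrative, not from the thread):

```cpp
#include <cassert>
#include <cmath>

// Average comparisons for a linear scan of N items split across B
// buckets (N/B/2, since on average half a bucket is scanned) versus a
// balanced-tree lookup (log2 N).
double bucket_scan_avg(double n, double b) { return n / b / 2.0; }
double tree_avg(double n) { return std::log(n) / std::log(2.0); }
```

With 10 buckets the scan wins around N = 100 (5 vs. roughly 6.6 comparisons) but loses by N = 400 (20 vs. roughly 8.6); these are average-comparison estimates, not measurements.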
First of all, std::hash_map is usually implemented as a balanced tree (e.g. a red-black-tree) which has a worst-case complexity of something around O(log(n)) for the find operation. Your implementation however has a worst-case complexity of O(n).
Your code is basically an implementation of a chaining hash table. In the best case, these hash-tables can outperform balanced trees by quite a bit. However, this all depends on how well your hashing function works (i.e. how many collisions it will produce for a given set of keys).
Also, are you testing the performance in debug or release mode? Where do you lose performance? Is it because the algorithm itself is slow, or because a lot of copying is going on?
Also note that the at() method is bounds-checked and will therefore be slower than the subscript operator. Since the way you access the vector is safe to begin with, the extra bounds-checking is unnecessary.
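For illustration, the two access styles side by side (the function names here are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// at() checks the index and throws std::out_of_range on a bad access;
// operator[] performs no check. When the loop condition already
// guarantees the index is valid, the check in at() is pure overhead.
int sum_checked(const std::vector<int>& v)
{
    int s = 0;
    for (std::size_t n = 0; n < v.size(); n++)
        s += v.at(n); // bounds-checked on every access
    return s;
}

int sum_unchecked(const std::vector<int>& v)
{
    int s = 0;
    for (std::size_t n = 0; n < v.size(); n++)
        s += v[n]; // unchecked; the loop condition already guards n
    return s;
}
```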
Quote:Original post by Harry Hunt
First of all, std::hash_map is usually implemented as a balanced tree (e.g. a red-black-tree) which has a worst-case complexity of something around O(log(n)) for the find operation. Your implementation however has a worst-case complexity of O(n).
Say what? The hash part of the type name indicates it's a hash table; that's the entire point. If you want a balanced tree, you can go with std::map.
Assuming one map has 200 elements.
log2(n) gives about 7.6 tries on average to find the element.
Splitting the items into buckets of 20 will take 10 tries on average, plus selection of the bucket, but that's O(1).
Hash_map should perform considerably better than either of those.
Also, hash_map was written and designed by professional software engineers. If that performance is inadequate, the only problem is your hash function, not the structure.
Data find(std::string key)
You make a redundant copy of the string each time you call find. You also return a copy of Data on each call.
Data insert(std::string name, Data value)
First you make a copy of both the string and Data, then GetIndex makes a copy of the string, then Data is copied again on insertion, then Data is copied again when returned from the function. Inlining may remove some of those copies, but still, that's redundancy that's completely unacceptable for such structures. Pass everything by reference, strings by const reference.
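The copy cost is easy to demonstrate with a small probe type (the type and function names are made up for illustration):

```cpp
#include <cassert>

// A tiny type that counts its own copies, to make the cost of
// pass-by-value visible compared with pass-by-const-reference.
struct Probe
{
    static int copies;
    Probe() {}
    Probe(const Probe&) { ++copies; }
};
int Probe::copies = 0;

void by_value(Probe) {}       // copies the argument on every call
void by_cref(const Probe&) {} // binds a reference, no copy
```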
Your hash seems bad too.
Testing on a bunch of strings (only lower and uppercase a-z), it seems to continously return negatives for rangeindex-CMIN+1, which makes it always output 0, making your search a linear one with no "sorting" done.
why would it give me 0?
the char -> int conversion starts at 65 = 'A'
so i subtract 65 from every conversion...
how do you calculate how many tries it takes? i mean log(200) != 7.6
EDIT::
thx for all the suggestions!
its just a bit faster than stdext::hash_map in my application with 30 items
13.6ms vs 13.2ms
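(For reference, the 7.6 figure is a base-2 logarithm; a quick sketch, with a made-up helper name:)

```cpp
#include <cassert>
#include <cmath>

// The "tries" estimate for a balanced tree is the base-2 logarithm of
// the element count: each comparison halves the remaining range. Plain
// log() is the natural logarithm, which is why log(200) != 7.6;
// convert bases with log(n)/log(2).
double tree_tries(double n)
{
    return std::log(n) / std::log(2.0);
}
```

tree_tries(200.0) comes out at about 7.64, which is where the 7.6 figure comes from.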
Quote:Original post by Dragon_Strike
why would it give me 0?
the char -> int conversion starts at 65 = 'A'
so i subtract 65 from every conversion...
rangeindex is computed with a *far* more complex operation than just subtracting 65 from the source.
A sample hash is:
int GetIndex(const std::string& str) { return str.front() % CRANGE; }
Given your class (fixed so it would compile for me), with a print statement in the insert() function:
int main()
{
    DX::fast_hash_map<std::string> map;
    for (char i = 'a'; i <= 'z'; ++i)
    {
        std::string str;
        str += i;
        map.insert(str, str);
        str[0] = toupper(i);
        map.insert(str, str);
    }
}
It consistently outputs:
Quote:
inserting into bucket: 0
ok...
this is how it looks now then... performance varies depending on "Range" and "Spread"
// By Dragon_Strike
template <typename Data, int Range, int Spread>
class fast_hash_map
{
private:
    typedef std::pair<std::string, Data> Hash;
    std::vector<Hash> m_Hash_Map[Range]; // one list per index
public:
    inline Data insert(const std::string& name, const Data value)
    {
        m_Hash_Map[GetIndex(name)].push_back(Hash(name, value));
        return value;
    }
    inline Data* find(const std::string& key)
    {
        int index = GetIndex(key);
        for (unsigned int n = 0; n < m_Hash_Map[index].size(); n++)
        {
            if (m_Hash_Map[index][n].first == key)
                return &m_Hash_Map[index][n].second;
        }
        return 0;
    }
    inline int GetIndex(const std::string& name)
    {
        int index = 0;
        for (int n = 0; n < Spread; n++)
        {
            index += int(name[(name.size()-1)>>n]);
        }
        return index % Range;
    }
};
EDIT::
now i get 12ms vs 13.3ms... mine is faster... but my benchmarks aren't very accurate... anyone willing to do a better test?
what more can i do to make it faster?
EDIT2::
the way i see it... the larger i set the range and find a good spread the faster it gets... in best case every item gets its own index... which gives one index calculation and one search... that should be faster than O(log(n))??
[Edited by - Dragon_Strike on January 6, 2008 10:28:36 AM]
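One way to check that intuition is to measure how evenly GetIndex actually spreads a key set. Below is a standalone copy of the same index formula plus a counting helper; the free-function names and the Range/Spread values used are only illustrative:

```cpp
#include <cassert>
#include <map>
#include <string>

// Standalone copy of the GetIndex formula from the class above, with
// Range and Spread as runtime parameters so different values can be tried.
int get_index(const std::string& name, int range, int spread)
{
    int index = 0;
    for (int n = 0; n < spread; ++n)
        index += int(name[(name.size() - 1) >> n]);
    return index % range;
}

// Number of distinct buckets hit by a key set. The closer this is to
// the key count (capped at range), the shorter each linear search.
int buckets_used(const std::string* keys, int count, int range, int spread)
{
    std::map<int, int> hist;
    for (int i = 0; i < count; ++i)
        ++hist[get_index(keys[i], range, spread)];
    return int(hist.size());
}
```

If buckets_used comes back much smaller than the key count, the "every item gets its own index" best case is not being reached and the per-bucket scans dominate.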