Jump to content
  • Advertisement
Sign in to follow this  
Dragon_Strike

fast hash map

This topic is 3791 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

im experimenting with my own implementation of a hash map... im not worried about memory size and the key will always be a string... insertion speed is also not imoprtant... whats important for me is the "find" performance so i thought that instead of creating one big list and do linear or binary search... i create several smaller list and generate an index from the key for the list... this way i can split for example an 200 item list into 10 list with 20 items each... then all i have to do is to calculate the list index and to a linear search for 20 items instead of 200... something like this
#define CMIN 65
#define CMAX 122
#define CRANGE (CMAX-CMIN)

namespace DX
{
	template <typename Data>
	class fast_hash_map
	{
	private:
		typedef std::pair<std::string, Data> Hash;
		std::vector<Hash> m_Hash_Map[CRANGE+2]; // create a list for each character, set first and last as wildcard holders

	public:

		Data insert(std::string name, Data value)
		{
			m_Hash_Map[GetIndex(name)].push_back(Hash(name, value));		

			return value;
		}

		Data find(std::string key)
		{
			std::vector<Hash>* CurHashMap = &m_Hash_Map[GetIndex(key)];

			for (UINT n = 0; n < CurHashMap->size(); n++)
			{
				if (CurHashMap->at(n).first == key)
					return CurHashMap->at(n).second;
			}
			return 0;
		}

		inline int GetIndex(std::string name)
		{			
			int rangeindex = int((int(name.at(name.size()-1)>>1) + 
								  int(name.at(name.size()-1)>>2))>>1; // Take two characters when creating index to spread

			return max(min(rangeindex-CMIN+1, CRANGE),0); // make sure index is within range, if not place in first or last
		}
	};
}

however when i compare the performance against the standard extstd::hash_map... i get results where my implementaion is slower... why is that?

Share this post


Link to post
Share on other sites
Advertisement
map's find performance is O(lgN), your class performance is O(N) divided by 10. For small values of N, your class is faster, but as N increases, your performance will be slower and slower. Read about big O notation and complexity analysis.

Share this post


Link to post
Share on other sites
First of all, std::hash_map is usually implemented as a balanced tree (e.g. a red-black-tree) which has a worst-case complexity of something around O(log(n)) for the find operation. Your implementation however has a worst-case complexity of O(n).

Your code is basically an implementation of a chaining hash table. In the best case, these hash-tables can outperform balanced trees by quite a bit. However, this all depends on how well your hashing function works (i.e. how many collisions it will produce for a given set of keys).

Also, are you testing the performance in debug or release mode? Where do you lose performance? Is it because the algorithm itself is slow or is it because a lot of copying is going on.

Also note that the at() method is bounds-checked and will therefore be slower than the subscript operator. Since the way you access the vector is safe to begin with, the extra bounds-checking is unnecessary.

Share this post


Link to post
Share on other sites
Quote:
Original post by Harry Hunt
First of all, std::hash_map is usually implemented as a balanced tree (e.g. a red-black-tree) which has a worst-case complexity of something around O(log(n)) for the find operation. Your implementation however has a worst-case complexity of O(n).

Say what? The hash part of the type name indicates it's a hash table; that's the entire point. If you want a balanced tree, you can go with std::map.

Share this post


Link to post
Share on other sites
Assuming one map has 200 elements.

Log(n) will take 7.6 tries on average to find the element.
Splitting the item into buckets of 20 will take 10 tries on average + selection of bucket, but that's O(1).
Hash_map should perform considerably better than either of those.

Also, hash_map was written and designed by professional software engineers. If that performance is inadequate, the only problem is your hash function, not the structure.


Data find(std::string key)
You make a redundant copy of string each time you call find.
You also return a copy of Data on each call.

Data insert(std::string name, Data value)
First you make a copy of both string and Data, then GetIndex makes a copy of Data, then Data is copied again on insertion, then Data is copied again when returned from function.

Inlining may remove some of those copies, but still, that's redundancy that's completely unacceptable for such structures. Pass everything by reference, strings by const reference.

Share this post


Link to post
Share on other sites
Your hash seems bad too.

Testing on a bunch of strings (only lower and uppercase a-z), it seems to continously return negatives for rangeindex-CMIN+1, which makes it always output 0, making your search a linear one with no "sorting" done.

Share this post


Link to post
Share on other sites
why would it give me 0?

the char -> int conversion starts at 65 = A

soo i subtract 65 from every conversion...

how do u calculate how many tries it takes? i mean log(200) != 7.6

EDIT::

thx for all the suggestions!

its just a bit faster than extstd::hash_map in my application with 30 items

13.6ms vs 13.2ms

Share this post


Link to post
Share on other sites
Quote:
Original post by Dragon_Strike
why would it give me 0?

the char -> int conversion starts at 65 = A

soo i subtract 65 from every conversion...


rangeIndex has a *far* more complex operation than subtracting 65 from the source.

A sample hash is:

int GetIndex( const std::string &str )
{
return str.front() % CRANGE;
}



Given your class (fixed so it would compile for me), with a print statement in the insert() function:

int main()
{
DX::fast_hash_map<std::string> map;
for( char i = 'a' ; i <= 'z' ; ++i )
{
std::string str;
str += i;

map.insert(str,str);

str[0] = toupper(i);
map.insert(str,str);
}
}



It consistently outputs:
Quote:

inserting into bucket: 0

Share this post


Link to post
Share on other sites
ok...

this is how it looks now then... performance varies depending on "Range" and "Spread"



// By Dragon_Strike

template <typename Data, int Range, int Spread>
class fast_hash_map
{
private:
typedef std::pair<std::string, Data> Hash;
std::vector<Hash> m_Hash_Map[Range]; // create a list for each character, set first and last as wildcard holders

public:

inline Data insert(const std::string& name, const Data value)
{
m_Hash_Map[GetIndex(name)].push_back(Hash(name, value));

return value;
}

inline Data* find(const std::string& key)
{
int index = GetIndex(key);
for (UINT n = 0; n < m_Hash_Map[index].size(); n++)
{
if (m_Hash_Map[index][n].first == key)
return &m_Hash_Map[index][n].second;
}
return 0;
}

inline int GetIndex(const std::string& name)
{
int index = 0;
for (int n = 0; n < Spread; n++)
{
index += int(name[(name.size()-1)>>n]);
}

return index % Range;
}
};











EDIT::

now i get 12ms vs 13.3ms... mine is faster... but my benchmarks aren't very accurate... anyone willing to do a better test?

what more can i do to make it faster?

EDIT2::

the way i see it... the larger i set the range and find a good spread the faster it gets... in best case every item gets its own index... which gives one index calculation and one search... that should be faster than O(log(n))??

[Edited by - Dragon_Strike on January 6, 2008 10:28:36 AM]

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!