Implementing a hash table, the right way (C++ templates)

Started by
6 comments, last by CDProp 15 years, 8 months ago
What do you think would be the best way to do this, out of the following choices? Ok, so assume your class starts out somewhat like this:

template <typename T>
class HashEntry
{
public:
	bool		m_bEmpty;	// Is this node empty?
	char		m_cKey[8];	// String key, for comparison.
	T		*m_pData;	// Pointer to actual data.
};

template <typename T, int SIZE>
class CHash
{
private:
	HashEntry<T>	*m_pData;	// = new HashEntry<T>[SIZE]
	int		m_iTableSize,	// Size of the whole table.
			m_iCount;	// Current count.
public:
	CHash();
	~CHash();
};


Ok, so I'm thinking that the constructor would handle the allocation of the HashEntries, and then assign initial values to them all (m_bEmpty = true, etc.). I'm wondering what you guys would do concerning the adding of data. I figure that the Add() method can: 1. Accept a T*, and set the HashEntry<T>::m_pData pointer directly to that address. The user of this hash table will have to make sure that the object being referenced by the pointer passed into Add() persists until they explicitly remove the reference from the table using the Remove() method, otherwise the pointer in the table will become invalid. For this reason, I don't like this option. 2. Accept a T&. It's still passed by reference, which is nice. Add() will allocate a new T object on the heap, and use the address of that object for the HashEntry<T>::m_pData pointer. The Remove() method will delete the T object. I like this, because it ensures that the pointers in the hash table are always valid, but it means that the user of the class has to create a T object from the calling code, and pass it into Add() by reference, only to have Add() copy it into a new object. Overall, I think I like #2 a lot more, but is there a better way? Anything I overlooked?
Advertisement
Quote:Original post by CDProp
1. Accept a T*, and set the HashEntry<T>::m_pData pointer directly to that address. The user of this hash table will have to make sure that the object being referenced by the pointer passed into Add() persists until they explicitly remove the reference from the table using the Remove() method, otherwise the pointer in the table will become invalid. For this reason, I don't like this option.

2. Accept a T&. It's still passed by reference, which is nice. Add() will allocate a new T object on the heap, and use the address of that object for the HashEntry<T>::m_pData pointer. The Remove() method will delete the T object. I like this, because it ensures that the pointers in the hash table are always valid, but it means that the user of the class has to create a T object from the calling code, and pass it into Add() by reference, only to have Add() copy it into a new object.

Do you wonder why the designers of the standard library chose to implement copy semantics for their containers?

Here are some other strategies to consider.

o Implement copy semantics. That way (a) you don't have to worry about ownership, and (2) you are free to imeplement CHash<C*, 100> as well as CHash<C, 100> or CHash<smart_ptr<C>, 100> as appropriate.

o Implement smart pointer semantics. You won't have to worry about pointer ownership. You will be limited to storing smart pointers in your container, but right now you're limited to storing dumb pointers and paying for all the problems that entails.

Stephen M. Webb
Professional Free Software Developer

I'm not 100% sure what you mean by implementing copy semantics. Are you referring to a copy constructor?
Copy semantics mean that when you "put something into" the hash, you actually store a copy of the thing in the hash. The object (instance of T) is responsible for being able to provide a copy (i.e. defining a copy constructor if needed). But the Add()ed instance is not the same one as the one passed in; it's merely equal to it.

BTW, use std::string to represent strings (such as the key), and consider whether a tree-map - such as already provided by the standard library - is sufficient for your purposes. It could well be that you can replace your entire source box so far, and all the planned implementation work as well, by just using a std::map<std::string, T>.
Can't you just use boost::hash_map?
STL tr1's new unordered containers are a great choice for this. Reinventing the wheel serves no practical purpose, unless you're just trying to learn. Making your own hash map class is silly for actual production code.
This is an off the cuff idea on how I would layout a hash table.
template<typename T,typename hashFunction>class hash{ std::vector< std::vector<T> > Data;};
Quote:Original post by Rydinare
STL tr1's new unordered containers are a great choice for this. Reinventing the wheel serves no practical purpose, unless you're just trying to learn. Making your own hash map class is silly for actual production code.


Just trying to figure it out on my own. :)

This topic is closed to new replies.

Advertisement