Followers 0

[C++] Problems designing a proper hash table w/ chaining

25 posts in this topic

Hey Guys,

I'm trying to design a my own hash table data structure using chaining, but my main issue is designing flexible components that can serve my hash table class. Before anyone asks why I'm trying to make my own hash table, this is part of an assignment for one of my classes, and I figured that a custom hash table would help in future game projects (I heard hash tables were commonly used for managing game assets).

My hash table contains a sorted linked list class that I also created for a previous assignment, but I'm trying to figure out how I would deal with a look up in the hash table. My first idea was to implement a find function in my list class, but I can't figure out how to design it so that it is flexible enough for my hash table to use (i.e. if my hash table decided to do a look up with a string rather instead of a number, and vice versa). Please feel free to critique my class design. This is all a learning process so I would love to hear from more experienced programmers.

Here is my list class. My get function should be returning a ptr to the data object, right?:
[CODE]
/************************************************************************
* SortedList class template.
* Will manage and sort inherent C++ types.
* *NOTE* Overloaded operator<() must be available in custom
* class objects in order for sorting to work properly.
************************************************************************/
template <typename T, typename Node = ListNode<T>>
class SortedList
{
public:
/** Class constructor.
*/
SortedList();

/** Class destructor.
* Deallocates heap memory generated by the class in addition to
* destroying the class instance.
*/
~SortedList();

/** List manipulator.
* Inserts an item to the list in sorted order.
* @param item An item of the same type as the list.
*/
void Insert(const T &newItem);

/** Iterator
* Returns a ptr to the beginning of the list.
*/
Node* Front();

/** List manipulator.
* Empties the entire list i.e. pops off all items.
*/
void Clear();

/** Utility function.
* Returns true if the list is empty, else returns false.
*/
bool IsEmpty() const;

/** Utility function.
* Returns the number of elements in the list.
*/
int Size() const;

private:
unsigned int m_count; ///< Keeps track of how many items in the list.
Node<T> *m_pFront; ///< Points to the front of the list.

/** List manipulator.
* Takes a an item off the front of the list, and returns it.
* @throw (char*) Throws a char* exception if called against an empty list.
*/
T Pop();
};

/************************************************************************
* Method Definitions
************************************************************************/
template <typename T, typename Node>
SortedList<T, Node>::SortedList() : m_count(0), m_pFront(nullptr) {}

template <typename T, typename ListNode>
SortedList<T, Node>::~SortedList()
{
Clear();
}

template <typename T, typename Node>
void SortedList<T, Node>::Insert(const T &newItem)
{
// Make new node the front if it has the 'lowest' value, or list is empty
if (m_pFront == nullptr || newItem < *(m_pFront->Data()))
m_pFront = new Node(newItem, m_pFront);
else
{
Node *pLinkingNode = m_pFront; // newNode will be pointed to by this.
// Keep iterating until we find the 'in-order' pos, until we hit the end.
{
}
// Insert the node in its appropriate position
Node *pNewNode = new Node(newItem, pLinkingNode->Next());
}
m_count++;
}

template <typename T, typename Node>
inline Node* SortedList<T, Node>::Front() const
{
return m_pFront;
}

template <typename T, typename Node>
void SortedList<T, Node>::Clear()
{
while (m_count != 0)
Pop();
}

template <typename T, typename Node>
inline bool SortedList<T, Node>::IsEmpty() const
{
if (m_count == 0)
return true;
else
return false;
}

template <typename T, typename Node>
inline int SortedList<T, Node>::Size() const
{
return m_count;
}

template <typename T, typename Node>
void SortedList<T, Node>::Print() const
{
if (m_count == 0)
{
std::cout << "There are no items in the list.\n";
}
else
{
Node* pCurrentNode = m_pFront;
for (unsigned int i = 0; i < m_count; i++)
{
std::cout << *(pCurrentNode->Data()) << std::endl;
pCurrentNode = pCurrentNode->Next();
}
}
}

template <typename T, typename Node>
T SortedList<T, Node>::Pop()
{
// Proceed with pop unless the list is empty
if (m_count > 0)
{
T tempItem = *(m_pFront->Data()); // Hold item to be returned
// Preserve pointer to node after the front and delete the current front
Node *pTempNode = m_pFront;
m_pFront = m_pFront->Next();
delete pTempNode;
m_count--;
return tempItem;
}
else
{
throw std::runtime_error("Error: Cannot pop an item from an empty list!");
}
}
[/CODE]

Here is my default node structure for my list (Is it overkill to encapsulate the data members of the node?):
[CODE]
template <typename T>
class ListNode
{
public:
/** Class constructor.
* @param item Object to store inside the node.
* @param pNext Ptr to the following node.
*/
ListNode(const T &item, ListNode *pNext = nullptr)
: m_data(&item), m_pNext(pNext) {}

/** Class destructor
* Deallocates heap memory generated by the class in addition to
* destroying the class instance.
*/
~ListNode() {}

/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &m_data; }

/** Getter function.
* Returns a ptr to the next node.
*/
ListNode* Next() const { return m_pNext; }

/** Setter function.
* Changes the next node pointed to.
* @param pNext Ptr to new node.
*/
void SetNext(ListNode *pNext) { m_pNext = pNext; }

protected:
T m_data; ///< Stores node's value.
ListNode *m_pNext; ///< Points to next node, if any.
};
[/CODE]

Lastly, here is the current design of my hash table class. I haven't actually implemented anything here yet, since I'm still redesigning the list class:
[CODE]
template <typename Data, typename Key>
class ChainedHashTable
{
public:
ChainedHashTable( int size = 11 );
~ChainedHashTable();

/** Hashing func.
* Returns the hash value according to a hash key.
* @param key The seed value used to hash.
*/
int Hash( Key key ) const;

/** Insertion func.
* Use this to add items to the hash table.
* @param item Const reference to object being added to the table.
*/
void Insert( const Data &data, Key key );

/** Search func.
* Returns the index value of the object in question.
*/
Data* LookUp( Key key ) const;

private:
std::vector<SortedList> *m_pTable;

/** Collision resolution func.
* Inserts data into the hashed chain, in order.
* @param index Value of the index where the collision occurred.
*/
void Resolve( int index ) const;
};
[/CODE] Edited by Robot Ninja
0

Share on other sites
Grrrr. Yet again, I hit "Quote" rather than "Edit". Sorry for the noise. Edited by e?dd
1

Share on other sites
[quote name='e?dd' timestamp='1355023116' post='5008681']
Does it need to be sorted?
[/quote]
That was a requirement from my professor. I don't understand the necessity of the values being sorted either, since I will have to check every node until I find the item I'm looking for anyways. The list need to be ordered by their keys, I believe. Keys are only ever numbers and strings, right?

[quote name='e?dd' timestamp='1355023116' post='5008681']
Well, you need a generic function of some kind that returns a hash for a key. You might use a traits mechanism to allow custom hashing for arbitrary types. Sensible default traits could be provided for common key types (strings, integers, etc).

The other thing you need is an equality comparison for the keys. operator== should suffice for starters, as it can be overloaded for arbitrary types.
[/quote]
Could you clarify what you mean by a "traits mechanism"? I'm unfamiliar with the concept. Perhaps a simple example would clear things up. I do indeed need an overloaded operator==(), only to compare the key held inside the data object though.

[quote name='e?dd' timestamp='1355023116' post='5008681']
Which get function? I don't see one.
[/quote]
I meant in the ListNode class, line 21. Sorry, I wrote my question on the wrong block of code.

[quote name='e?dd' timestamp='1355023116' post='5008681']
Do you want to expose the nodes? Or just the values they contain? By returning Node objects, the user can "accidentally" modify any next/prev pointers.

The only reason I can think of to provide my own node type would be for "intrusive links", but your SortedList doesn't seem to allow that (which is fine). Is the Node template parameter necessary?
[/quote]
I can't seem to decide, because I don't have enough experience to know if I would ever need to expose the node or not. When would exposing the node's data object be an issue in game programming? I definitely don't want the 'next' pointer to be handled by anything but the data structure it is serving (i.e. list).

[quote name='e?dd' timestamp='1355023116' post='5008681']
Quote
?
1
int Size() const;
Is int the best type to use here?
[/quote]
Should I be using something less expensive than that? Short int? Or something with higher resolution? Since hash tables apparently can get pretty crazy in size, I thought int would an okay data type for size, but please explain why you think otherwise. My brain is a sponge right now.

[quote name='e?dd' timestamp='1355023116' post='5008681']
Quote
?
1
T Pop();
I'm just going to post this slightly insidious link. It's probably a bit mean at this stage, but if you can follow it and realise why I've posted it after your Pop() method, you're doing well

I note that you haven't provided a copy constructor or assignment operator. If you don't want to do that at this stage, you should explicitly make your class uncopyable. The way you do this in C++98 (private, unimplemented methods) and C++11 (putting "= delete" at the end of the method declarations) is different though the former way will still work.
[/quote]
I will get back to you regarding the Pop() function after I finish reading the article. [img]http://public.gamedev.net//public/style_emoticons/default/tongue.png[/img]
A copy constructor/assignment operator for my list and hash table would be a good idea. [img]http://public.gamedev.net//public/style_emoticons/default/ohmy.png[/img] I seemed to have forgotten some of the basics. Thanks.

..."return m_count == 0"... lol duh

[quote name='e?dd' timestamp='1355023116' post='5008681']
Useful for debugging I suspect, but slightly odd (though maybe it was a requirement for the assignment?).

What if I want to comma-separate my output? What If I want to print it to a string or a file? If you provided an iteration interface, I could do this.
[/quote]
Yes a print function was a requirement for a previous assignment, but I just made a poor design decision at the time and should have left the print implementation outside the class.

[quote name='e?dd' timestamp='1355023116' post='5008681']
Technically a destructor call doesn't release the memory associated with *this, so the comment isn't entirely accurate.
[/quote]
I see. TIL++.

[quote name='e?dd' timestamp='1355023116' post='5008681']
/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.
[/quote]
Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?

[quote name='e?dd' timestamp='1355023116' post='5008681']
Prefer 'private' to 'protected', for a couple of reasons.

First, it's easier to change things from private to protected than the other way around. If something starts out protected and later you decide you really did mean private, then you might have other code depending on that member which you'll now have to modify. However, if you start out with private, switching to protected won't break any other code. But do you ever envision that someone will actually inherit your ListNode class, which is afterall the only reason we'd ever want to make something protected? If the answer is 'no', then private is definitely the way to go. If that assumption is wrong (and I doubt it is), we can easily change to protected later. However...

'protected' is really much closer to 'public' than it is to 'private', as it still allows access to parts of your class. The code that's actually accessing the protected stuff is probably a different audience, but that's the only difference. So you shoud still try provide a well-thought-out interface, just like you do with public stuff, where implementation details are hidden to a practical degree.

In this particular case, you can already access the data through Data() and get/set the next node, so there's really no need for protected at all.

I think ListNode should probably be un-copyable.
[/quote]
Good point. That code was private initially, but I was being hopeful. Is making ListNode uncopyable just a 'correctness' thing like const, or is there some problem that can arise from it being copyable?

[quote name='e?dd' timestamp='1355023116' post='5008681']
public:
ChainedHashTable( int size = 11 );
11?
Is int the best type for the size?
Do you know about explicit constructors?
[/quote]
I still don't understand why I should use something other than an integer, and by explicit constructors you mean things like the copy and assignment constructors? What does that have to do with using a default parameter? 11 was some arbitrary prime number I chose to start the hash table with.

[quote name='e?dd' timestamp='1355023116' post='5008681']
Should key be passed by reference-to-const?
[/quote]
I suppose a ref to const wouldn't hurt, especially if the key is a string. Again, would the key be anything other than a number or string?

[quote name='e?dd' timestamp='1355023116' post='5008681']
Data* LookUp( Key key ) const;
Should key be passed by reference-to-const?
Should the return type be const-qualified?
[/quote]
I took the const after the function declaration to be a promise to the programmer that the the function will not modify the members of the class...So the fact that I'm returning a non-const pointer to the Data, makes it a non-const function?

[quote name='e?dd' timestamp='1355023116' post='5008681']
private:
std::vector *m_pTable;
Perhaps a typo, but do you really want a pointer to a vector, here?
[/quote]
I was unsure what is generally done, so I've been flip-flopping on whether or not to make the vector object reside inside the class, or pointed to by the class. I guess the general case is to just instantiate the object inside the class. Then again, I guess it's a moot point to point to the vector, since the vector is itself already pointing to a dynamic array.

[quote name='e?dd' timestamp='1355023116' post='5008681']
/** Collision resolution func.
* Inserts data into the hashed chain, in order.
* @param index Value of the index where the collision occurred.
*/
void Resolve( int index ) const;
I don't understand this function, or its description. Inserts what data into the hashed chain? Missing a parameter, perhaps?
Should the method be const?
[/quote]
It's supposed to be my collision resolution function. My plan was to have the function take in the index value where the collision occurred, and then insert the new object being allocated into the list existing at that index.

+50 to you sir for dissecting my code and critiquing the design. I understand that I have much to learn about proper and conventional coding, so I'm really glad that you devoted some of your time to help me out.
0

Share on other sites
[quote name='e?dd' timestamp='1355023398' post='5008682']
Grrrr. Yet again, I hit "Quote" rather than "Edit". Sorry for the noise.
[/quote]

Which part did you edit? You may have done it while I was writing a response.
*EDIT* Since you helped me out so much, I gave you a +1 on your random post. [img]http://public.gamedev.net//public/style_emoticons/default/happy.png[/img] Edited by Robot Ninja
0

Share on other sites
[quote name='e?dd' timestamp='1355033611' post='5008716']
It has no definition, but you can specialize it for types that need to work with the hash table:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
template<>
struct HashTraits
{
static std::size_t Hash(const std::string &s) { /* ... */ }
static bool Equal(const std::string &s1, const std::string &s2) { /* ... */ }
};

template<>
struct HashTraits
{
static std::size_t Hash(int i) { /* ... */ }
static bool Equal(int i1, int i2) { /* ... */ }
};

// and so on...
EDIT: changed use of uintptr_t to std::size_t.

Then, if the hash table needs to hash an object it does so by calling "HashTraits::Hash(key)". A similar thing is done for equality comparisons when searching buckets for matching keys. If there's no HashTraits specialization for the Key type available you get a compiler error, indicating that you need to write one. There are some other options in the design space, here. You might have two separate traits templates, one for the hash and one for equality. Or you might forgo the equality part altogether and just rely on the existence of an operator==.

Now, rather than using traits, you could also simply have the requirement that the user must ensure that there's a Hash() function that takes a Key and returns a hash value, rather like you have the operator< requirement for your SortedList. However, some care must be taken as e.g. Hash(const std::string &) might get called even if the Key type is "const char *", due to C++ overload resolution. This may not be the desired behaviour. Using the traits stuff avoids this problem because there has to be a template specialization for the exact type.

So there's a tradeoff between simplicity and correctness.

[/quote]
Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type? Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?

[quote name='e?dd' timestamp='1355033611' post='5008716']
Quote
e?dd, on 08 December 2012 - 07:18 PM, said:
/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.
Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?
I Think we're getting our wires crossed. All I'm really suggesting is that the function be changed to "const T &Data() const { return m_data; }" or "T &Data() { return m_data; }". Actually, now that I look at it again, I'm surprised that the original function even compiles (as there should be a 'const' before the 'T *'). Is this Data() function used anywhere, at present?
[/quote]
I see. I didn't realize that doing something like T& Data() { return m_data; } was legal. From what I recalled about returning by reference, I thought that you had to do something like:
T& Data( T &data )
{
data = m_data;
return data;
}
I must have missed something...
Also, I haven't actually started compiling yet (bad idea?). I've just been designing the data structures I need for my assignment, so no, the Dat() function is not being used anywhere yet.

[quote name='e?dd' timestamp='1355033611' post='5008716']
Quote
by explicit constructors you mean things like the copy and assignment constructors?
No, there's a C++ keyword, "explicit".

At present, I can do weird stuff like this:
?
1
2
3
4
5
void foo(const ChainedHashTable &tab); // suppose this function exists

// ...

foo(42); // compiler allows this, as ChainedHashTable's constructor is not explicit!

This is the same mechanism that allows you to pass a "const char *" to a function that takes an std::string. For your constructor, the automatic conversion is probably not wanted!
[/quote]
[img]http://public.gamedev.net//public/style_emoticons/default/ohmy.png[/img] The compiler allows that? So this is basically a concern for any custom type, huh? Does the explicit keyword only apply to the constructor, including the copy constructor and assignment operator?

[quote name='e?dd' timestamp='1355033611' post='5008716']
Presumably the number of buckets is going to change during the life of a hash table, either automatically, or at the user's request(?).
[/quote]
Honestly now that I think about it, resizing the hash table after data has been added is probably a bad idea. It screws up the distribution of the data that the hashing algorithm is trying to avoid. So maybe I should just not include that option.

[quote name='e?dd' timestamp='1355033611' post='5008716']
Well, let me put is this way. You hand me a hash table, which you say is const. Should I be able to change its content (without resorting to nasty hacks)?

The idiomatic answer for C++ is no, I should not be able to change its content. Sure, the contents of the hash table aren't direct members of the structure, but that sounds like you're just trying to weasel your way out of keeping things const-correct ;)

What often happens is that two overloads are provided for things like this:
?
1
2
const Data* LookUp( Key key ) const;
Data* LookUp( Key key );

[/quote]
Okay, so why would anyone include an overloaded function that grants access to the data if there is a similar function that is trying to protect the data? It seems counter-intuitive to me.

[quote name='e?dd' timestamp='1355033611' post='5008716']
Up until this point you've got something approaching a very general hash table. So it's a bit weird to have something this specific here, IMHO. Is there any way you can move this specific function outside of the hash table?
[/quote]
Like I mentioned before in this post, my first inclination is to use function pointers, but your traits concept seems like a good solution as too.

Thanks again for replying to my questions. These changes will be very helpful. Using std::size_t instead of unsigned int or regular int for some reason was mind-blowing. [img]http://public.gamedev.net//public/style_emoticons/default/laugh.png[/img] The same goes for traits. +50 more to you!

*EDIT* What's with the quotes in my post? Unlike yours, the code boxes never copy over right. I've been using the "selective quote" feature. Edited by Robot Ninja
0

Share on other sites
[quote name='Robot Ninja' timestamp='1355091472' post='5008898']
Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type?
[/quote]
Of the stuff I mentioned, I prefer traits to having the user ensure an overloaded function is in scope, as it avoids accidental use of overloaded functions and eliminates the possibility of ODR violations (a somewhat technical subject to which I won't subject you unless provoked ).

[quote]
Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?
[/quote]
As ripoff said, function pointers tend to be slower as in general they aren't very transparent to a compiler.

However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first provide define a Hash<T> template class with some useful specializations:

[code]
template<typename T>
struct Hash;

template<>
struct Hash<std::string>
{
std::size_t operator() (const std::string &key) const { /* ... */ }
};

template<>
struct Hash<int>
{
std::size_t operator() (int key) const { /* ... */ }
};

// and so on.
[/code]

This is rather similar to the traits stuff so far, except that we need to create an instance of Hash<Key> and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

[code]
template<typename Key, typename Value, typename HashFunc = Hash<Key> >
class HashTable
{
public:
HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }

//...

private:
std::vector< ... > buckets;
HashFunc hash;
};
[/code]

Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator()s via inlining etc.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out the box for common key types. However, the user could provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map<K,V> uses std::less<K> to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.
1

Share on other sites
[quote name='rip-off' timestamp='1355094479' post='5008911']
You can return a reference, provided the object being referenced outlives the function. For automatically allocated data (local variables), this is not the case. You can however, return references to global data, heap data or references that you have been passed. Member data also falls into this category, it becomes the caller's problem to ensure that the reference remains valid:
[/quote]
Just to make sure I understand what you're saying. Could I turn your bad() function to a good() one by doing this:
[CODE]
void good()
{
// Creating an Example instance so that the reference remains valid in this scope.
Example jim = Example("Jim");
const std::string &legal = jim.name();
// Works because the "Example" instance still exists here!

// And we're ALIVE...hooray!
std::cout << legal << std::endl;
}
[/CODE]

[quote name='rip-off' timestamp='1355094479' post='5008911']
If you do not resize, further insertions are going to cause more and more collisions, which could cause the hash table to degenerate towards it's worst case (a linked list). This is precisely the opposite behaviour of what client code is written to expect - it will be written to expect almost constant time access.
[/quote]
What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize? Rehashing the data while it was being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation. If there is a decent solution, is it best to leave the user/programmer using the class to manually resize the table, and/or implement an auto-resize mechanism like in std::vector? Lastly, game engines don't tend to use chaining do they, since it doesn't seem cache-friendly? 1-2-3-GO! [img]http://public.gamedev.net//public/style_emoticons/default/laugh.png[/img]

[quote name='rip-off' timestamp='1355094479' post='5008911']
Is this counter intutive:
[CODE]
void inspect(const std::vector &v) {
if(!v.empty()) {
std::cout << "The first element is: " << v[0] << std::endl;
}
}

void modify(std::vector &v) {
if(!v.empty()) {
v[0] = 42;
}
}
[/CODE]
std::vector::operator[] is overloaded for const and non const access, allowing both client functions to use it naturally.
[/quote]
I think I see what you're saying. At first it still seems like one non-const solution would work for both situations, but I'm assuming this is again a "correctness"/proper style thing. So the compiler will always for for the const function, unless the user wishes to change a value.

[quote name='e?dd' timestamp='1355096529' post='5008918']
However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first provide define a Hash template class with some useful specializations:

[CODE]
template
struct Hash;

template<>
struct Hash
{
std::size_t operator() (const std::string &key) const { /* ... */ }
};

template<>
struct Hash
{
std::size_t operator() (int key) const { /* ... */ }
};

// and so on.
[/CODE]

This is rather similar to the traits stuff so far, except that we need to create an instance of Hash and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

[CODE]
template >
class HashTable
{
public:
HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }

//...

private:
std::vector< ... > buckets;
HashFunc hash;
};
[/CODE]

Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator()s via inlining etc.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out the box for common key types. However, the user could provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map uses std::less to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.
[/quote]
Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!? [img]http://public.gamedev.net//public/style_emoticons/default/happy.png[/img]

Thank you both for extensively helping me with this, and allowing me to pick at your brains. It's been a great learning experience. I will go ahead and add traits for my hash table, and the additional template parameter that edd mentioned. [img]http://public.gamedev.net//public/style_emoticons/default/smile.png[/img]

*EDIT* Sorry, I referred to edd's last suggestion as traits, but with a second look I'm no longer sure. Is there a particular name for this kind of design? Edited by Robot Ninja
0

Share on other sites
[quote name='Robot Ninja' timestamp='1355109958' post='5008966']
Just to make sure I understand what you're saying. Could I turn your bad() function to a good() one by doing this:
[CODE]
void good()
{
// Creating an Example instance so that the reference remains valid in this scope.
Example jim = Example("Jim");
const std::string &legal = jim.name();
// Works because the "Example" instance still exists here!

// And we're ALIVE...hooray!
std::cout << legal << std::endl;
}
[/CODE]
[/quote]
Yes, though 'Example jim("Jim");' would be a more idiomatic way of writing the first line.

[quote]
What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize?
[/quote]
What makes you think it will be lop-sided? If it is, there's either a problem with your algorithm, or your hash function is poor.

[quote]
Rehashing the data while it was being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation.
[/quote]
It is the solution. And it is expensive. But it reduces the cost of each lookup. It's the difference between O(1) and O(n).

[quote]
If there is a decent solution, is it best to leave the user/programmer using the class to manually resize the table, and/or implement an auto-resize mechanism like in std::vector?
[/quote]
Up to you, but I tend to allow the specification of a load factor on construction, after which rehashing occurs. The default value might be 3, or something.

[quote]
Lastly, game engines don't tend to use chaining do they, since it doesn't seem cache-friendly? 1-2-3-GO! [img]http://public.gamedev.net//public/style_emoticons/default/laugh.png[/img]
[/quote]
Depends on how god your allocator is and how you make your chains. For example, in the past I've made a hash table where the buckets were linked lists, but each 'next' pointer was actually an index. In total I required two std::vector<>s as data members. One vector contained "kv_node's:

[code]
struct kv_node
{
static const std::size_t no_next_node = std::size_t(-1);

kv_node(const Key key, const Value value, std::size_t next) : key(key), value(value), next(next) { }

Key key;
Value value;
std::size_t next;
};
[/code]

And the other vector just had std::size_t's containing the index in to the first for the first node in each bucket.

I'm trying to remember how I did removal efficiently. I suspect I did something like swap the element being removed with the last element in the main vector, which would have required performing another hash. Or I might have kept a separate std::size_t in the data structure alongside the vectors indicating the head of a free-list.

[quote]
Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!? [img]http://public.gamedev.net//public/style_emoticons/default/happy.png[/img]
[/quote]

http://www.amazon.co.uk/Accelerated-Practical-Programming-Example-Series/dp/020170353X/ref=sr_1_1?ie=UTF8&qid=1355278925&sr=8-1
http://www.amazon.co.uk/Standard-Library-Tutorial-Reference/dp/0321623215/ref=sr_1_1?s=books&ie=UTF8&qid=1355279152&sr=1-1
http://www.amazon.co.uk/C-Programming-Language-Special/dp/0201700735/ref=sr_1_2?s=books&ie=UTF8&qid=1355278943&sr=1-2
http://www.amazon.co.uk/Exceptional-C-Herb-Sutter/dp/0201615622/ref=sr_1_1?s=books&ie=UTF8&qid=1355278967&sr=1-1
http://www.amazon.co.uk/More-Exceptional-Engineering-Programming-Solutions/dp/020170434X/ref=sr_1_3?s=books&ie=UTF8&qid=1355278967&sr=1-3
http://www.amazon.co.uk/Exceptional-Style-Engineering-Programming-Solutions/dp/0201760428/ref=sr_1_4?s=books&ie=UTF8&qid=1355278967&sr=1-4
http://www.amazon.co.uk/Effective-Specific-Programs-Professional-Computing/dp/0321334876/ref=pd_bxgy_b_img_y
http://www.amazon.co.uk/More-Effective-Programs-Professional-Computing/dp/020163371X/ref=pd_bxgy_b_img_y
http://www.amazon.co.uk/Effective-STL-Specific-Professional-Computing/dp/0201749629/ref=pd_bxgy_b_img_z
http://www.amazon.co.uk/Modern-Design-Applied-Generic-Patterns/dp/0201704315/ref=pd_sim_b_6

And lots of practice
1

Share on other sites
[quote name='e?dd' timestamp='1355279174' post='5009655']
[quote name='Robot Ninja']
What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize?
[/quote]
What makes you think it will be lop-sided? If it is, there's either a problem with your algorithm, or your hash function is poor.
[/quote]
It would depend on what the user would want to load factor to be before resize I suppose. I was saying that the distribution would be lopsided if the load factor was already high (e.g. .8) before a resize, once you do resize (assuming you do not rehash) then the distribution of data in the table would be lopsided. However since the solution IS to rehash, then that situation is no longer relevant.

[quote name='e?dd' timestamp='1355279174' post='5009655']
Up to you, but I tend to allow the specification of a load factor on construction, after which rehashing occurs. The default value might be 3, or something.
[/quote]
Sounds good.

[quote name='e?dd' timestamp='1355279174' post='5009655']
Depends on how god your allocator is and how you make your chains. For example, in the past I've made a hash table where the buckets were linked lists, but each 'next' pointer was actually an index. In total I required two std::vector<>s as data members. One vector contained "kv_node's:

[CODE]
struct kv_node
{
static const std::size_t no_next_node = std::size_t(-1);

kv_node(const Key key, const Value value, std::size_t next) : key(key), value(value), next(next) { }

Key key;
Value value;
std::size_t next;
};
[/CODE]

And the other vector just had std::size_t's containing the index in to the first for the first node in each bucket.

I'm trying to remember how I did removal efficiently. I suspect I did something like swap the element being removed with the last element in the main vector, which would have required performing another hash. Or I might have kept a separate std::size_t in the data structure alongside the vectors indicating the head of a free-list.
[/quote]
So this hash table acts like one with chaining in the way it looks up data and inserts data (partially), but instead of in a traditional linked list, the data is actually in one big bucket? So I'm curious - what's the benefit of this design over a regular, non-chained hash table then? I'm sorry if I'm asking too many questions, you might be tired of answering all of them. I won't hold it against you if you want to move on, lol.

[quote name='e?dd' timestamp='1355279174' post='5009655']
[url="http://www.amazon.co...55278925&sr=8-1"]http://www.amazon.co...55278925&sr=8-1[/url]
[url="http://www.amazon.co...55279152&sr=1-1"]http://www.amazon.co...55279152&sr=1-1[/url]
[url="http://www.amazon.co...55278943&sr=1-2"]http://www.amazon.co...55278943&sr=1-2[/url]
[url="http://www.amazon.co...55278967&sr=1-1"]http://www.amazon.co...55278967&sr=1-1[/url]
[url="http://www.amazon.co...55278967&sr=1-3"]http://www.amazon.co...55278967&sr=1-3[/url]
[url="http://www.amazon.co...55278967&sr=1-4"]http://www.amazon.co...55278967&sr=1-4[/url]
[url="http://www.amazon.co...pd_bxgy_b_img_y"]http://www.amazon.co...pd_bxgy_b_img_y[/url]
[url="http://www.amazon.co...pd_bxgy_b_img_y"]http://www.amazon.co...pd_bxgy_b_img_y[/url]
[url="http://www.amazon.co...pd_bxgy_b_img_z"]http://www.amazon.co...pd_bxgy_b_img_z[/url]
[url="http://www.amazon.co.../ref=pd_sim_b_6"]http://www.amazon.co.../ref=pd_sim_b_6[/url]

And lots of practice
[/quote]
Wow, excellent. These sound like the perfect kind of books. Thanks! [img]http://public.gamedev.net//public/style_emoticons/default/biggrin.png[/img]
0

Share on other sites
[quote name='Robot Ninja' timestamp='1355349998' post='5009983']
So this hash table acts like one with chaining in the way it looks up data and inserts data (partially), but instead of in a traditional linked list, the data is actually in one big bucket?
[/quote]
That's pretty much it, yeah.

[quote]
So I'm curious - what's the benefit of this design over a regular, non-chained hash table then?
[/quote]
Better locality, fewer allocations and simpler/faster iteration over all elements (the iterators are just wrappers around vector iterators). The memory footprint's a bit higher, though.
1

Share on other sites
One last thing I wanted to mention. The reason why my professor recommended using sorted linked lists over unsorted lists is because in sorted linked lists, look-ups for values that don't exist can be halted early when we have encountered a "greater" value. It still seems like it wouldn't make much of a difference though since we would want to resize the hash table before it got too big anyways...
0

Share on other sites
[quote name='Robot Ninja' timestamp='1355451237' post='5010438']
One last thing I wanted to mention. The reason why my professor recommended using sorted linked lists over unsorted lists is because in sorted linked lists, look-ups for values that don't exist can be halted early when we have encountered a "greater" value.
[/quote]
Makes sense, but when you insert, you have to iterate over half of the elements in a bucket (on average) too.

[quote]
It still seems like it wouldn't make much of a difference though since we would want to resize the hash table before it got too big anyways...
[/quote]
Agreed
0

Share on other sites
I had to revive this thread because I have some further questions about designing a (chained) hash table. In my program I will be keeping track of integer values inside the hash table, the keys being std::string objects. Since I have to do look ups using a key, what should I be inserting into the table? My first thought was to give my insert function a Key parameter, and a Data parameter, but then it seemed like I would have to keep track of the key and data separately because my linked list class used for chaining only keeps track of one object. So naturally my second thought was to give the table a struct to work with. This struct simply contained the key (std::string) used for hashing, and the data (int) that would be stored in the table. So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?

*EDIT* ...or maybe I should redesign my list class to sort by an external key, rather than the actual stored in the list? I could create another node struct that stored a key and the data. Edited by Robot Ninja
0

Share on other sites
So naturally my second thought was to give the table a struct to work with. This struct simply contained the key (std::string) used for hashing, and the data (int) that would be stored in the table.

This is the way to go.

[quote]

So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?
[/quote]

You can do the packaging in the insertion method, on their behalf.

0

Share on other sites
*EDIT* The website upgrade must have made this reply's format go crazy. So I reposted below, and it looks about right. Edited by Robot Ninja
0

Share on other sites

So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?

You can do the packaging in the insertion method, on their behalf.

Unless there's something I'm not seeing, I couldn't figure out how to do that without dynamic allocation of the object. Is that the only way? I don't know if I really want my sorted list to hold the addresses rather than the actual objects themselves. :/

Just for reference, here are the data members I have in my hash table class:
std::size_t m_tableSize;
std::size_t m_loadThreshold;			// Tracks number of items in table.
float m_loadFactor;				// Holds threshold value for auto table resize.
SortedList<HashContainer<Key, Data>> *m_pTable;	// Table of sorted buckets/chains.


Here is my constructor definition:
template <typename Key, typename Data, typename HashFunc>
ChainedHashTable<Key, Data, HashFunc>::ChainedHashTable( std::size_t size /*= 37 */,
const HashFunc &hashFunc = HashFunc() )
m_pTable(new SortedList<HashContainer<Key, Data>>[size])
{

}


Here is my hash container:
template <typename Key, typename Data>
struct HashContainer
{
public:
/** Constructor.
*	@param key Key object used by hashing function.
*	@param data Data object stored in the hash table.
*/
HashContainer( const Key &key, const Data &data )
: m_key(key), m_data(data) {}

/** Destructor.
*/
~HashContainer();

/** Getter func.
*	Returns a const reference to the key.
*/
const Key& GetKey() { return m_key; }

/** Getter func.
*	Returns a reference (const/non-const) to the data object stored inside.
*/
const Data& GetData() const { return m_data; }
Data& GetData() { return m_data; }

/** Operator ==
*	Returns true if this key is equal to the other key.
*/
bool operator==( const HashContainer &hc ) { return this->GetKey() == hc.GetKey(); }
bool operator==( const Key &key ) { return this->GetKey() == key; }

/** Operator !=
*	Returns true if this key is not equal to the other key.
*/
bool operator!=( const HashContainer &hc ) { return this->GetKey() != hc.GetKey(); }
bool operator!=( const Key &key ) { return this->GetKey() != key; }

/** Operator <
*	Returns true if this key is less than the other key.
*/
bool operator<( const HashContainer &hc ) { return this->GetKey() < hc.GetKey(); }
bool operator<( const Key &key ) { return this->GetKey() > key; }

/** Operator >
*	Returns true if this key is greater than the other key.
*/
bool operator>( const HashContainer &hc ) { return this->GetKey() > hc.GetKey(); }
bool operator>( const Key &key ) { return this->GetKey() > key; }

private:
Key m_key;
Data m_data;
};

0

Share on other sites

You don't need any dynamic allocation (beyond what's required for creating a ListNode for the SortedList). Could you explain why you think you need to do any more than that?

1

Share on other sites
You don't need any dynamic allocation (beyond what's required for creating a ListNode for the SortedList). Could you explain why you think you need to do any more than that?

This is what my table's insert code looks like right now:
template <typename Key, typename Data, typename HashFunc>
void ChainedHashTable<Key, Data, HashFunc>::Insert( const HashContainer<Key, Data> &container )
{
/* ...Table resize code can go here... */

// Hash an appropriate index.
std::size_t index = Normalize( hash(container.GetKey()) );

// Insert the key and data into the hash table.
// SortedList object will deal with sorting and inserting into the chain.
m_pTable[index].Insert(container);
}


This is my list's Insert code:
template <typename T>
void SortedList<T>::Insert( const T &item )
{
// Make new node the front if it has the 'lowest' value, or list is empty
if (m_pFront == nullptr || item < m_pFront->Data())
m_pFront = new ListNode(item, m_pFront);
else
{
ListNode *pLinkingNode = m_pFront; // newNode will be pointed to by this.
// Keep iterating until we find the 'in-order' pos, until we hit the end.
{
}

// Insert the node in its appropriate position
ListNode *pNewNode = new ListNode(item, pLinkingNode->Next());
}

m_count++;
}


My list only takes one parameter, so what I feed it has to be contained in one object. ...[moment of realization]...Wait, so would something like this work:
template <typename Key, typename Data, typename HashFunc>
void ChainedHashTable<Key, Data, HashFunc>::Insert( const Key &key, const Data &data )
{
/* ...Table resize code can go here... */

// Hash an appropriate index.
std::size_t index = Normalize( hash(key) );

// Insert the key and data into the hash table.
// SortedList object will deal with sorting and inserting into the chain.
m_pTable[index].Insert(HashContainer(key, data));
}

0

Share on other sites

Wait, so would something like this work:

template <typename Key, typename Data, typename HashFunc>
void ChainedHashTable<Key, Data, HashFunc>::Insert( const Key &key, const Data &data )
{
/* ...Table resize code can go here... */

// Hash an appropriate index.
std::size_t index = Normalize( hash(key) );

// Insert the key and data into the hash table.
// SortedList object will deal with sorting and inserting into the chain.
m_pTable[index].Insert(HashContainer(key, data));
}


Yes :)

1

Share on other sites
Thank you edd for helping me out so much. I feel like I should buy you a small gift for the holidays. lol

I think the design for my hash table is almost complete - I just have to clean up compiler errors here and there. I've been learning from some template programming mistakes as I go through my code. There is one error that I'm encountering right now that has be a little bewildered:

error C2678: binary '<' : no operator found which takes a left-hand operand of type 'const HashContainer<Key,Data>' (or there is no acceptable conversion)

It's pointing to one of the beginning lines of my list's Insert code:
if (m_pFront == nullptr || item < m_pFront->GetData())


However, I'm pretty sure I have the appropriate operator< overload inside my HashContainer class. Do you or anyone else have any idea what it could be?

*EDIT* I figured out what the problem was. My GetKey function was returning a const reference to the key, but I did not append the const keyword at the end of the function declaration. However, the last compiler error I have now is this hot mess:

error LNK2019: unresolved external symbol "public: __thiscall HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>::~HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>(void)" (??1?$HashContainer@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@H@@QAE@XZ) referenced in function "public: void __thiscall ChainedHashTable<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int,struct Hash<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >::Insert(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,int const &)" (?Insert@?$ChainedHashTable@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@HU?$Hash@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@@@@QAEXABV?$basic_string@DU?$char_traits@D@std@@V?\$allocator@D@2@@std@@ABH@Z)
1>MSVCRTD.lib(crtexe.obj) : error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup
1>C:\Users\Nova\Dropbox\ThachDev\~C++\Data_Structures\Debug\BORG_Language.exe : fatal error LNK1120: 2 unresolved externals Edited by Robot Ninja
0

Share on other sites
error LNK2019: unresolved external symbol "public: __thiscall HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>::~HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>(void)"

The trick with error messages like this is to resist intimidation and the urge to give up. If we replace "std::basic_string<char, struct std::char_traits<char>, class std::allocator<char> >" with "std::string", this message should become a bit clearer:

[quote]error LNK2019: unresolved external symbol "public: __thiscall HashContainer<std::string,int>::~HashContainer<std::string,int>(void)"[/quote]

In other words, the linker can't find a definition for your HashContainer's destructor. Have you provided one?

1

Share on other sites

error LNK2019: unresolved external symbol "public: __thiscall HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>::~HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>(void)"

The trick with error messages like this is to resist intimidation and the urge to give up. If we replace "std::basic_string<char, struct std::char_traits<char>, class std::allocator<char> >" with "std::string", this message should become a bit clearer:

error LNK2019: unresolved external symbol "public: __thiscall HashContainer<std::string,int>::~HashContainer<std::string,int>(void)"

In other words, the linker can't find a definition for your HashContainer's destructor. Have you provided one?

Ah I see, that makes sense. Okay, so I replaced the semicolon at the end of my destructor's declaration and replaced it empty brackets. I also realized that I named my entry point "Main" instead of just "main", which explains the "tmainCRTStartup" error. Thanks so much for your help. My program compiles successfully. Now I just have to test my code logic. Seriously, I need to buy you a beer or something.
0

Create an account

Register a new account