Jump to content

  • Log In with Google      Sign In   
  • Create Account

We're offering banner ads on our site from just $5!

1. Details HERE. 2. GDNet+ Subscriptions HERE. 3. Ad upload HERE.


[C++] Problems designing a proper hash table w/ chaining


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
25 replies to this topic

#1 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 08 December 2012 - 07:41 PM

Hey Guys,

I'm trying to design a my own hash table data structure using chaining, but my main issue is designing flexible components that can serve my hash table class. Before anyone asks why I'm trying to make my own hash table, this is part of an assignment for one of my classes, and I figured that a custom hash table would help in future game projects (I heard hash tables were commonly used for managing game assets).

My hash table contains a sorted linked list class that I also created for a previous assignment, but I'm trying to figure out how I would deal with a look up in the hash table. My first idea was to implement a find function in my list class, but I can't figure out how to design it so that it is flexible enough for my hash table to use (i.e. if my hash table decided to do a look up with a string rather instead of a number, and vice versa). Please feel free to critique my class design. This is all a learning process so I would love to hear from more experienced programmers.

*EDIT* Just added a "Front()" function to provide access to the first node in the list.

Here is my list class. My get function should be returning a ptr to the data object, right?:
/************************************************************************
* SortedList class template.
* Will manage and sort inherent C++ types.
* *NOTE* Overloaded operator<() must be available in custom
* class objects in order for sorting to work properly.
************************************************************************/
template <typename T, typename Node = ListNode<T>>
class SortedList
{
public:
/** Class constructor.
  */
SortedList();

/** Class destructor.
  * Deallocates heap memory generated by the class in addition to
  * destroying the class instance.
  */
~SortedList();

/** List manipulator.
  *  Inserts an item to the list in sorted order.
  *  @param item An item of the same type as the list.
  */
void Insert(const T &newItem);

/** Iterator
  * Returns a ptr to the beginning of the list.
  */
Node* Front();

/** List manipulator.
  * Empties the entire list i.e. pops off all items.
  */
void Clear();

/** Utility function.
  * Returns true if the list is empty, else returns false.
  */
bool IsEmpty() const;

/** Utility function.
  * Returns the number of elements in the list.
  */
int Size() const;

private:
unsigned int m_count; ///< Keeps track of how many items in the list.
Node<T> *m_pFront;   ///< Points to the front of the list.

/** List manipulator.
  * Takes a an item off the front of the list, and returns it.
  * @throw (char*) Throws a char* exception if called against an empty list.
  */
T Pop();
};

/************************************************************************
* Method Definitions
************************************************************************/
template <typename T, typename Node>
SortedList<T, Node>::SortedList()  : m_count(0), m_pFront(nullptr) {}

template <typename T, typename ListNode>
SortedList<T, Node>::~SortedList()
{
Clear();
}

template <typename T, typename Node>
void SortedList<T, Node>::Insert(const T &newItem)
{
// Make new node the front if it has the 'lowest' value, or list is empty
if (m_pFront == nullptr || newItem < *(m_pFront->Data()))
  m_pFront = new Node(newItem, m_pFront);
else
{
  Node *pLinkingNode = m_pFront; // newNode will be pointed to by this.
  // Keep iterating until we find the 'in-order' pos, until we hit the end.
  while (pLinkingNode->Next() != nullptr &&
   *(pLinkingNode->Next()->Data()) < newItem )
  {
   pLinkingNode = pLinkingNode->Next();
  }
  // Insert the node in its appropriate position
  Node *pNewNode = new Node(newItem, pLinkingNode->Next());
  pLinkingNode->SetNext(pNewNode);
}
m_count++;
}

template <typename T, typename Node>
inline Node* SortedList<T, Node>::Front() const
{
return m_pFront;
}

template <typename T, typename Node>
void SortedList<T, Node>::Clear()
{
while (m_count != 0)
  Pop();
}

template <typename T, typename Node>
inline bool SortedList<T, Node>::IsEmpty() const
{
if (m_count == 0)
  return true;
else
  return false;
}

template <typename T, typename Node>
inline int SortedList<T, Node>::Size() const
{
return m_count;
}

template <typename T, typename Node>
void SortedList<T, Node>::Print() const
{
if (m_count == 0)
{
  std::cout << "There are no items in the list.\n";
}
else
{
  Node* pCurrentNode = m_pFront;
  for (unsigned int i = 0; i < m_count; i++)
  {
   std::cout << *(pCurrentNode->Data()) << std::endl;
   pCurrentNode = pCurrentNode->Next();
  }
}
}

template <typename T, typename Node>
T SortedList<T, Node>::Pop()
{
// Proceed with pop unless the list is empty
if (m_count > 0)
{
  T tempItem = *(m_pFront->Data());  // Hold item to be returned
  // Preserve pointer to node after the front and delete the current front
  Node *pTempNode = m_pFront;
  m_pFront = m_pFront->Next();
  delete pTempNode;
  m_count--;
  return tempItem;
}
else
{
  throw std::runtime_error("Error: Cannot pop an item from an empty list!");
}
}

Here is my default node structure for my list (Is it overkill to encapsulate the data members of the node?):
template <typename T>
class ListNode
{
public:
/** Class constructor.
  * @param item  Object to store inside the node.
  * @param pNext Ptr to the following node.
  */
ListNode(const T &item, ListNode *pNext = nullptr)
  : m_data(&item), m_pNext(pNext) {}

/** Class destructor
  * Deallocates heap memory generated by the class in addition to
  * destroying the class instance.
  */
~ListNode() {}

/** Getter function.
  * Returns a ptr to the object stored in the node.
  */
T* Data() const { return &m_data; }

/** Getter function.
  * Returns a ptr to the next node.
  */
ListNode* Next() const { return m_pNext; }

/** Setter function.
  * Changes the next node pointed to.
  * @param pNext Ptr to new node.
  */
void SetNext(ListNode *pNext) { m_pNext = pNext; }

protected:
T m_data;   ///< Stores node's value.
ListNode *m_pNext; ///< Points to next node, if any.
};

Lastly, here is the current design of my hash table class. I haven't actually implemented anything here yet, since I'm still redesigning the list class:
template <typename Data, typename Key>
class ChainedHashTable
{
public:
ChainedHashTable( int size = 11 );
~ChainedHashTable();

/** Hashing func.
  * Returns the hash value according to a hash key.
  * @param key The seed value used to hash.
  */
int Hash( Key key ) const;

/** Insertion func.
  * Use this to add items to the hash table.
  * @param item Const reference to object being added to the table.
  */
void Insert( const Data &data, Key key );

/** Search func.
  * Returns the index value of the object in question.
  */
Data* LookUp( Key key ) const;

private:
std::vector<SortedList> *m_pTable;

/** Collision resolution func.
  * Inserts data into the hashed chain, in order.
  * @param index Value of the index where the collision occurred.
  */
void Resolve( int index ) const;
};

Edited by Robot Ninja, 08 December 2012 - 08:04 PM.


Sponsor:

#2 e‍dd   Members   -  Reputation: 2105

Like
5Likes
Like

Posted 08 December 2012 - 09:18 PM

My hash table contains a sorted linked list class that I also created for a previous assignment,

Does it need to be sorted? If the load factor is low enough (or adaptive/user-modifyable), this is quite possibly a performance pessimization. It shouldn't affect correctness though, but it does mean any key that is put in to the table needs to support operator<, std::less, or whatever your sorted list uses. This requirement is abnormal for a hash table, at least in my experience.

but I'm trying to figure out how I would deal with a look up in the hash table.

Well, you need a generic function of some kind that returns a hash for a key. You might use a traits mechanism to allow custom hashing for arbitrary types. Sensible default traits could be provided for common key types (strings, integers, etc). If you don't know what a "traits mechanism" is, feel free to ask.

The other thing you need is an equality comparison for the keys. operator== should suffice for starters, as it can be overloaded for arbitrary types.

My first idea was to implement a find function in my list class, but I can't figure out how to design it so that it is flexible enough for my hash table to use (i.e. if my hash table decided to do a look up with a string rather instead of a number, and vice versa).

Well you only need to find by comparing keys. Again, using operator== is probably a good starting point. Although, presumably you'll need to keep a sorted list of something like std::pair<Key, Value> and using operator== on a pair won't do the right thing.

If your list wasn't sorted, you could expose an iterator interface, like std::list, so that you can use std::find_if() with an appropriate predicate. But because it is sorted, I can see the benefit of providing a find_if() method. It could take a predicate, like the generic std::find_if() function.

Here is my list class. My get function should be returning a ptr to the data object, right?:

Which get function? I don't see one.

I skim-read the code. Here are some comments.

template <typename T, typename Node = ListNode<T>>
class SortedList
{
public:
[...]
Node* Front();


Do you want to expose the nodes? Or just the values they contain? By returning Node objects, the user can "accidentally" modify any next/prev pointers.

The only reason I can think of to provide my own node type would be for "intrusive links", but your SortedList doesn't seem to allow that (which is fine). Is the Node template parameter necessary?

int Size() const;

Is int the best type to use here?

T Pop();

I'm just going to post this slightly insidious link. It's probably a bit mean at this stage, but if you can follow it and realise why I've posted it after your Pop() method, you're doing well :)

I note that you haven't provided a copy constructor or assignment operator. If you don't want to do that at this stage, you should explicitly make your class uncopyable. The way you do this in C++98 (private, unimplemented methods) and C++11 (putting "= delete" at the end of the method declarations) is different though the former way will still work.

template <typename T, typename Node>
inline bool SortedList<T, Node>::IsEmpty() const
{
if (m_count == 0)
  return true;
else
  return false;
}


Personally, I find "return m_count == 0;" to be more idiomatic, but that's a minor thing.

template <typename T, typename Node>
void SortedList<T, Node>::Print() const
{
if (m_count == 0)
{
  std::cout << "There are no items in the list.\n";
}
else
{
  Node* pCurrentNode = m_pFront;
  for (unsigned int i = 0; i < m_count; i++)
  {
   std::cout << *(pCurrentNode->Data()) << std::endl;
   pCurrentNode = pCurrentNode->Next();
  }
}
}

Useful for debugging I suspect, but slightly odd (though maybe it was a requirement for the assignment?).

What if I want to comma-separate my output? What If I want to print it to a string or a file? If you provided an iteration interface, I could do this.

Here is my default node structure for my list (Is it overkill to encapsulate the data members of the node?):

template <typename T>
class ListNode
{
public:
/** Class constructor.
  * @param item  Object to store inside the node.
  * @param pNext Ptr to the following node.
  */
ListNode(const T &amp;item, ListNode *pNext = nullptr)
  : m_data(&amp;item), m_pNext(pNext) {}

/** Class destructor
  * Deallocates heap memory generated by the class in addition to
  * destroying the class instance.
  */
~ListNode() {}

Technically a destructor call doesn't release the memory associated with *this, so the comment isn't entirely accurate.

/** Getter function.
  * Returns a ptr to the object stored in the node.
  */
T* Data() const { return &amp;m_data; }

IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.

protected:
T m_data;   ///< Stores node's value.
ListNode *m_pNext; ///< Points to next node, if any.

Prefer 'private' to 'protected', for a couple of reasons.

First, it's easier to change things from private to protected than the other way around. If something starts out protected and later you decide you really did mean private, then you might have other code depending on that member which you'll now have to modify. However, if you start out with private, switching to protected won't break any other code. But do you ever envision that someone will actually inherit your ListNode class, which is afterall the only reason we'd ever want to make something protected? If the answer is 'no', then private is definitely the way to go. If that assumption is wrong (and I doubt it is), we can easily change to protected later. However...

'protected' is really much closer to 'public' than it is to 'private', as it still allows access to parts of your class. The code that's actually accessing the protected stuff is probably a different audience, but that's the only difference. So you shoud still try provide a well-thought-out interface, just like you do with public stuff, where implementation details are hidden to a practical degree.

In this particular case, you can already access the data through Data() and get/set the next node, so there's really no need for protected at all.

I think ListNode should probably be un-copyable.

template <typename Data, typename Key>
class ChainedHashTable

It's more common to have the key appear first, though no biggie.

{
public:
ChainedHashTable( int size = 11 );

11?
Is int the best type for the size?
Do you know about explicit constructors?

int Hash( Key key ) const;

Is int the best type for the return value?
Should key be passed by reference-to-const?

void Insert( const Data &amp;data, Key key );

Again, it's normal for the key to come first.
Should key be passed by reference-to-const?

Data* LookUp( Key key ) const;

Should key be passed by reference-to-const?
Should the return type be const-qualified?

private:
std::vector<SortedList> *m_pTable;

Perhaps a typo, but do you really want a pointer to a vector, here?

/** Collision resolution func.
  * Inserts data into the hashed chain, in order.
  * @param index Value of the index where the collision occurred.
  */
void Resolve( int index ) const;

I don't understand this function, or its description. Inserts what data into the hashed chain? Missing a parameter, perhaps?
Should the method be const?

Edited by e‍dd, 08 December 2012 - 09:26 PM.


#3 e‍dd   Members   -  Reputation: 2105

Like
1Likes
Like

Posted 08 December 2012 - 09:23 PM

Grrrr. Yet again, I hit "Quote" rather than "Edit". Sorry for the noise.

Edited by e‍dd, 08 December 2012 - 09:24 PM.


#4 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 08 December 2012 - 10:52 PM

Does it need to be sorted?

That was a requirement from my professor. I don't understand the necessity of the values being sorted either, since I will have to check every node until I find the item I'm looking for anyways. The list need to be ordered by their keys, I believe. Keys are only ever numbers and strings, right?

Well, you need a generic function of some kind that returns a hash for a key. You might use a traits mechanism to allow custom hashing for arbitrary types. Sensible default traits could be provided for common key types (strings, integers, etc).

The other thing you need is an equality comparison for the keys. operator== should suffice for starters, as it can be overloaded for arbitrary types.

Could you clarify what you mean by a "traits mechanism"? I'm unfamiliar with the concept. Perhaps a simple example would clear things up. I do indeed need an overloaded operator==(), only to compare the key held inside the data object though.

Which get function? I don't see one.

I meant in the ListNode class, line 21. Sorry, I wrote my question on the wrong block of code.

Do you want to expose the nodes? Or just the values they contain? By returning Node objects, the user can "accidentally" modify any next/prev pointers.

The only reason I can think of to provide my own node type would be for "intrusive links", but your SortedList doesn't seem to allow that (which is fine). Is the Node template parameter necessary?

I can't seem to decide, because I don't have enough experience to know if I would ever need to expose the node or not. When would exposing the node's data object be an issue in game programming? I definitely don't want the 'next' pointer to be handled by anything but the data structure it is serving (i.e. list).

Quote
?
1
int Size() const;
Is int the best type to use here?

Should I be using something less expensive than that? Short int? Or something with higher resolution? Since hash tables apparently can get pretty crazy in size, I thought int would an okay data type for size, but please explain why you think otherwise. My brain is a sponge right now.

Quote
?
1
T Pop();
I'm just going to post this slightly insidious link. It's probably a bit mean at this stage, but if you can follow it and realise why I've posted it after your Pop() method, you're doing well

I note that you haven't provided a copy constructor or assignment operator. If you don't want to do that at this stage, you should explicitly make your class uncopyable. The way you do this in C++98 (private, unimplemented methods) and C++11 (putting "= delete" at the end of the method declarations) is different though the former way will still work.

I will get back to you regarding the Pop() function after I finish reading the article. Posted Image
A copy constructor/assignment operator for my list and hash table would be a good idea. Posted Image I seemed to have forgotten some of the basics. Thanks.

..."return m_count == 0"... lol duh

Useful for debugging I suspect, but slightly odd (though maybe it was a requirement for the assignment?).

What if I want to comma-separate my output? What If I want to print it to a string or a file? If you provided an iteration interface, I could do this.

Yes a print function was a requirement for a previous assignment, but I just made a poor design decision at the time and should have left the print implementation outside the class.

Technically a destructor call doesn't release the memory associated with *this, so the comment isn't entirely accurate.

I see. TIL++.

/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.

Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?

Prefer 'private' to 'protected', for a couple of reasons.

First, it's easier to change things from private to protected than the other way around. If something starts out protected and later you decide you really did mean private, then you might have other code depending on that member which you'll now have to modify. However, if you start out with private, switching to protected won't break any other code. But do you ever envision that someone will actually inherit your ListNode class, which is afterall the only reason we'd ever want to make something protected? If the answer is 'no', then private is definitely the way to go. If that assumption is wrong (and I doubt it is), we can easily change to protected later. However...

'protected' is really much closer to 'public' than it is to 'private', as it still allows access to parts of your class. The code that's actually accessing the protected stuff is probably a different audience, but that's the only difference. So you shoud still try provide a well-thought-out interface, just like you do with public stuff, where implementation details are hidden to a practical degree.

In this particular case, you can already access the data through Data() and get/set the next node, so there's really no need for protected at all.

I think ListNode should probably be un-copyable.

Good point. That code was private initially, but I was being hopeful. Is making ListNode uncopyable just a 'correctness' thing like const, or is there some problem that can arise from it being copyable?

public:
ChainedHashTable( int size = 11 );
11?
Is int the best type for the size?
Do you know about explicit constructors?

I still don't understand why I should use something other than an integer, and by explicit constructors you mean things like the copy and assignment constructors? What does that have to do with using a default parameter? 11 was some arbitrary prime number I chose to start the hash table with.

Should key be passed by reference-to-const?

I suppose a ref to const wouldn't hurt, especially if the key is a string. Again, would the key be anything other than a number or string?

Data* LookUp( Key key ) const;
Should key be passed by reference-to-const?
Should the return type be const-qualified?

I took the const after the function declaration to be a promise to the programmer that the the function will not modify the members of the class...So the fact that I'm returning a non-const pointer to the Data, makes it a non-const function?

private:
std::vector *m_pTable;
Perhaps a typo, but do you really want a pointer to a vector, here?

I was unsure what is generally done, so I've been flip-flopping on whether or not to make the vector object reside inside the class, or pointed to by the class. I guess the general case is to just instantiate the object inside the class. Then again, I guess it's a moot point to point to the vector, since the vector is itself already pointing to a dynamic array.

/** Collision resolution func.
* Inserts data into the hashed chain, in order.
* @param index Value of the index where the collision occurred.
*/
void Resolve( int index ) const;
I don't understand this function, or its description. Inserts what data into the hashed chain? Missing a parameter, perhaps?
Should the method be const?

It's supposed to be my collision resolution function. My plan was to have the function take in the index value where the collision occurred, and then insert the new object being allocated into the list existing at that index.


+50 to you sir for dissecting my code and critiquing the design. I understand that I have much to learn about proper and conventional coding, so I'm really glad that you devoted some of your time to help me out. :D

#5 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 08 December 2012 - 10:54 PM

Grrrr. Yet again, I hit "Quote" rather than "Edit". Sorry for the noise.


Which part did you edit? You may have done it while I was writing a response.
*EDIT* Since you helped me out so much, I gave you a +1 on your random post. Posted Image

Edited by Robot Ninja, 08 December 2012 - 10:55 PM.


#6 e‍dd   Members   -  Reputation: 2105

Like
3Likes
Like

Posted 09 December 2012 - 12:13 AM

Keys are only ever numbers and strings, right?

Those are the most common types, but they're not the only ones. Could be anything, really. You might want to use a 2D coordinate as a key, for example.

Could you clarify what you mean by a "traits mechanism"? I'm unfamiliar with the concept. Perhaps a simple example would clear things up. I do indeed need an overloaded operator==(), only to compare the key held inside the data object though.

Sure. You have a template class like this:

template<typename Key>
struct HashTraits; // only declared, no body for the primary template.

It has no definition, but you can specialize it for types that need to work with the hash table:

template<>
struct HashTraits<std::string>
{
    static std::size_t Hash(const std::string &amp;s) { /* ... */ }
    static bool Equal(const std::string &amp;s1, const std::string &amp;s2) { /* ... */ }
};

template<>
struct HashTraits<int>
{
    static std::size_t Hash(int i) { /* ... */ }
    static bool Equal(int i1, int i2) { /* ... */ }
};

// and so on...
EDIT: changed use of uintptr_t to std::size_t.

Then, if the hash table needs to hash an object it does so by calling "HashTraits<Key>::Hash(key)". A similar thing is done for equality comparisons when searching buckets for matching keys. If there's no HashTraits specialization for the Key type available you get a compiler error, indicating that you need to write one. There are some other options in the design space, here. You might have two separate traits templates, one for the hash and one for equality. Or you might forgo the equality part altogether and just rely on the existence of an operator==.

Now, rather than using traits, you could also simply have the requirement that the user must ensure that there's a Hash() function that takes a Key and returns a hash value, rather like you have the operator< requirement for your SortedList. However, some care must be taken as e.g. Hash(const std::string &amp;) might get called even if the Key type is "const char *", due to C++ overload resolution. This may not be the desired behaviour. Using the traits stuff avoids this problem because there has to be a template specialization for the exact type.

So there's a tradeoff between simplicity and correctness.


Do you want to expose the nodes? Or just the values they contain? By returning Node objects, the user can "accidentally" modify any next/prev pointers.

The only reason I can think of to provide my own node type would be for "intrusive links", but your SortedList doesn't seem to allow that (which is fine). Is the Node template parameter necessary?

I can't seem to decide, because I don't have enough experience to know if I would ever need to expose the node or not. When would exposing the node's data object be an issue in game programming? I definitely don't want the 'next' pointer to be handled by anything but the data structure it is serving (i.e. list).

At times, it can be useful to expose the node type (for example, when you've got an intrusive list, as I mentioned).

If you don't have an explicit use-case in mind for the Node, I would hide it for the time being, if possible. As with the protected/private stuff, it's easier to expose stuff at a later date than it is to hide it, given that people may have started to depend on it.


int Size() const;
Is int the best type to use here?

Should I be using something less expensive than that? Short int? Or something with higher resolution?

No, int should be plenty fast. But why not unsigned int? Or std::size_t?

Since hash tables apparently can get pretty crazy in size, I thought int would an okay data type for size, but please explain why you think otherwise. My brain is a sponge right now.

Indeed, any container can get large, depending on what you put in it :) So you want to choose a type that can index the largest possible array you can fit in memory. In other words, you really want an integral type that's the same size as a pointer. On pretty much any modern system in the world, std::size_t will fit the bill.


/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &amp;m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.

Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?

I Think we're getting our wires crossed. All I'm really suggesting is that the function be changed to "const T &amp;Data() const { return m_data; }" or "T &amp;Data() { return m_data; }". Actually, now that I look at it again, I'm surprised that the original function even compiles (as there should be a 'const' before the 'T *'). Is this Data() function used anywhere, at present?

Is making ListNode uncopyable just a 'correctness' thing like const, or is there some problem that can arise from it being copyable?

Yes :)
What does it mean to copy a node? If the answer is "that's nonsense", then it would be best to statically disallow it. It's better to be notified of attempted nonsense at compile time than at run time.

by explicit constructors you mean things like the copy and assignment constructors?

No, there's a C++ keyword, "explicit".

At present, I can do weird stuff like this:
void foo(const ChainedHashTable<int, int> &amp;tab); // suppose this function exists

// ...

foo(42); // compiler allows this, as ChainedHashTable's constructor is not explicit!

This is the same mechanism that allows you to pass a "const char *" to a function that takes an std::string. For your constructor, the automatic conversion is probably not wanted!

11 was some arbitrary prime number I chose to start the hash table with.

I see. Presumably the number of buckets is going to change during the life of a hash table, either automatically, or at the user's request(?).


Data* LookUp( Key key ) const;
Should the return type be const-qualified?

I took the const after the function declaration to be a promise to the programmer that the the function will not modify the members of the class...So the fact that I'm returning a non-const pointer to the Data, makes it a non-const function?

Well, let me put is this way. You hand me a hash table, which you say is const. Should I be able to change its content (without resorting to nasty hacks)?

The idiomatic answer for C++ is no, I should not be able to change its content. Sure, the contents of the hash table aren't direct members of the structure, but that sounds like you're just trying to weasel your way out of keeping things const-correct ;)

What often happens is that two overloads are provided for things like this:
const Data* LookUp( Key key ) const;
Data* LookUp( Key key );


private:
std::vector *m_pTable;
Perhaps a typo, but do you really want a pointer to a vector, here?

I guess the general case is to just instantiate the object inside the class.

Yeah, using a pointer is just going to require more book-keeping. Note that if you implement a copy constructor and assignment operator for SortedList, you get them for free if the vector is held by value, rather than by pointer.

EDIT: actually, I now realise a custom assignment operator will still be required if you want to provide the strong exception safety guarantee.


void Resolve( int index ) const;

It's supposed to be my collision resolution function. My plan was to have the function take in the index value where the collision occurred, and then insert the new object being allocated into the list existing at that index.

Up until this point you've got something approaching a very general hash table. So it's a bit weird to have something this specific here, IMHO. Is there any way you can move this specific function outside of the hash table?

I understand that I have much to learn about proper and conventional coding,

That never ends :)

so I'm really glad that you devoted some of your time to help me out. Posted Image

No problem, keep plugging away!

Edited by e‍dd, 09 December 2012 - 08:45 AM.


#7 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 09 December 2012 - 04:17 PM

It has no definition, but you can specialize it for types that need to work with the hash table:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
template<>
struct HashTraits
{
static std::size_t Hash(const std::string &s) { /* ... */ }
static bool Equal(const std::string &s1, const std::string &s2) { /* ... */ }
};

template<>
struct HashTraits
{
static std::size_t Hash(int i) { /* ... */ }
static bool Equal(int i1, int i2) { /* ... */ }
};

// and so on...
EDIT: changed use of uintptr_t to std::size_t.

Then, if the hash table needs to hash an object it does so by calling "HashTraits::Hash(key)". A similar thing is done for equality comparisons when searching buckets for matching keys. If there's no HashTraits specialization for the Key type available you get a compiler error, indicating that you need to write one. There are some other options in the design space, here. You might have two separate traits templates, one for the hash and one for equality. Or you might forgo the equality part altogether and just rely on the existence of an operator==.

Now, rather than using traits, you could also simply have the requirement that the user must ensure that there's a Hash() function that takes a Key and returns a hash value, rather like you have the operator< requirement for your SortedList. However, some care must be taken as e.g. Hash(const std::string &) might get called even if the Key type is "const char *", due to C++ overload resolution. This may not be the desired behaviour. Using the traits stuff avoids this problem because there has to be a template specialization for the exact type.

So there's a tradeoff between simplicity and correctness.

Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type? Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?

Quote
e‍dd, on 08 December 2012 - 07:18 PM, said:
/** Getter function.
* Returns a ptr to the object stored in the node.
*/
T* Data() const { return &m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.
Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?
I Think we're getting our wires crossed. All I'm really suggesting is that the function be changed to "const T &Data() const { return m_data; }" or "T &Data() { return m_data; }". Actually, now that I look at it again, I'm surprised that the original function even compiles (as there should be a 'const' before the 'T *'). Is this Data() function used anywhere, at present?

I see. I didn't realize that doing something like T& Data() { return m_data; } was legal. From what I recalled about returning by reference, I thought that you had to do something like:
T& Data( T &data )
{
data = m_data;
return data;
}
I must have missed something...
Also, I haven't actually started compiling yet (bad idea?). I've just been designing the data structures I need for my assignment, so no, the Dat() function is not being used anywhere yet.

Quote
by explicit constructors you mean things like the copy and assignment constructors?
No, there's a C++ keyword, "explicit".

At present, I can do weird stuff like this:
?
1
2
3
4
5
void foo(const ChainedHashTable &tab); // suppose this function exists

// ...

foo(42); // compiler allows this, as ChainedHashTable's constructor is not explicit!

This is the same mechanism that allows you to pass a "const char *" to a function that takes an std::string. For your constructor, the automatic conversion is probably not wanted!

Posted Image The compiler allows that? So this is basically a concern for any custom type, huh? Does the explicit keyword only apply to the constructor, including the copy constructor and assignment operator?

Presumably the number of buckets is going to change during the life of a hash table, either automatically, or at the user's request(?).

Honestly now that I think about it, resizing the hash table after data has been added is probably a bad idea. It screws up the distribution of the data that the hashing algorithm is trying to avoid. So maybe I should just not include that option.

Well, let me put is this way. You hand me a hash table, which you say is const. Should I be able to change its content (without resorting to nasty hacks)?

The idiomatic answer for C++ is no, I should not be able to change its content. Sure, the contents of the hash table aren't direct members of the structure, but that sounds like you're just trying to weasel your way out of keeping things const-correct ;)

What often happens is that two overloads are provided for things like this:
?
1
2
const Data* LookUp( Key key ) const;
Data* LookUp( Key key );

Okay, so why would anyone include an overloaded function that grants access to the data if there is a similar function that is trying to protect the data? It seems counter-intuitive to me.

Up until this point you've got something approaching a very general hash table. So it's a bit weird to have something this specific here, IMHO. Is there any way you can move this specific function outside of the hash table?

Like I mentioned before in this post, my first inclination is to use function pointers, but your traits concept seems like a good solution as too.

Thanks again for replying to my questions. These changes will be very helpful. Using std::size_t instead of unsigned int or regular int for some reason was mind-blowing. Posted Image The same goes for traits. +50 more to you!

*EDIT* What's with the quotes in my post? Unlike yours, the code boxes never copy over right. I've been using the "selective quote" feature.

Edited by Robot Ninja, 09 December 2012 - 04:19 PM.


#8 rip-off   Moderators   -  Reputation: 8540

Like
4Likes
Like

Posted 09 December 2012 - 05:07 PM

Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type? Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?

Traits would be the idiomatic C++ solution to this. The reason is that function pointers incur additional runtime cost.

I see. I didn't realize that doing something like T& Data() { return m_data; } was legal. From what I recalled about returning by reference, I
thought that you had to do something like:

T& Data( T &data )
{
	 data = m_data;
	 return data;
}
I must have missed something...

You can return a reference, provided the object being referenced outlives the function. For automatically allocated data (local variables), this is not the case. You can however, return references to global data, heap data or references that you have been passed. Member data also falls into this category, it becomes the caller's problem to ensure that the reference remains valid:
struct Example
{
	Example(const std::string name) : m_name(name)
	{
	}
	const std::string &name() const
	{
		return m_name;
	}
private:
	std::string m_name;
};


void bad()
{
	// Uh no, we're taking a reference to the data member..
	const std::string &illegal = Example("Jim").name();
	// ... but the temporary "Example" instance has just expired!

	// And we're dead...
	std::cout << illegal << std::endl;
}

Also, I haven't actually started compiling yet (bad idea?). I've just been designing the data structures I need for my assignment, so no, the Dat() function is not being used anywhere yet.

I'd recommend compiling more often. The linked list code could be designed and tested in isolation, before trying to incorporate it into the more complex hash table. This way, you can be reasonably confident that any bugs in your hash table are in that code, because you've hopefully eliminated the errors in the linked list.

Posted Image The compiler allows that? So this is basically a concern for any custom type, huh?

Yes. As edd indicated, this is what allows you to write:
void foo(const std::string &s) { /* ... */ }
// ...
foo("Hello, world");
Arguably, we should have an "implicit" keyword instead, with the potentially surprising behaviour not being the default. That said, I've never personally bothered with explicit functions myself, but I've not used C++ as a primary language for some time.

Does the explicit keyword only apply to the constructor, including the copy constructor and assignment operator?

You only need to think about it for single argument constructors. And as std::string shows, it can be quite convenient sometimes. Copy constructors are fine - explicit copy constructors don't sound like a good idea to me, I have not come across them before. You cannot have an explicit assignment operator.

Honestly now that I think about it, resizing the hash table after data has been added is probably a bad idea. It screws up the distribution of the data that the hashing algorithm is trying to avoid. So maybe I should just not include that option.

If you do not resize, further insertions are going to cause more and more collisions, which could cause the hash table to degenerate towards it's worst case (a linked list). This is precisely the opposite behaviour of what client code is written to expect - it will be written to expect almost constant time access.

Okay, so why would anyone include an overloaded function that grants access to the data if there is a similar function that is trying to protect the data? It seems counter-intuitive to me.

Is this counter intutive:
void inspect(const std::vector<int> &v) {
	if(!v.empty()) {
		 std::cout << "The first element is: " << v[0] << std::endl;
	}
}

void modify(std::vector<int> &v) {
	if(!v.empty()) {
		v[0] = 42;
	}
}
std::vector::operator[] is overloaded for const and non const access, allowing both client functions to use it naturally.

#9 e‍dd   Members   -  Reputation: 2105

Like
1Likes
Like

Posted 09 December 2012 - 05:42 PM

Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type?

Of the stuff I mentioned, I prefer traits to having the user ensure an overloaded function is in scope, as it avoids accidental use of overloaded functions and eliminates the possibility of ODR violations (a somewhat technical subject to which I won't subject you unless provoked :)).

Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?

As ripoff said, function pointers tend to be slower as in general they aren't very transparent to a compiler.

However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first provide define a Hash<T> template class with some useful specializations:

template<typename T>
struct Hash;

template<>
struct Hash<std::string>
{
    std::size_t operator() (const std::string &key) const { /* ... */ }
};

template<>
struct Hash<int>
{
    std::size_t operator() (int key) const { /* ... */ }
};

// and so on.

This is rather similar to the traits stuff so far, except that we need to create an instance of Hash<Key> and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

template<typename Key, typename Value, typename HashFunc = Hash<Key> >
class HashTable
{
    public:
	    HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }

	    //...

    private:
	    std::vector< ... > buckets;
	    HashFunc hash;
};

Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator()s via inlining etc.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out the box for common key types. However, the user could provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map<K,V> uses std::less<K> to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.

#10 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 09 December 2012 - 09:25 PM

You can return a reference, provided the object being referenced outlives the function. For automatically allocated data (local variables), this is not the case. You can however, return references to global data, heap data or references that you have been passed. Member data also falls into this category, it becomes the caller's problem to ensure that the reference remains valid:

Just to make sure I understand what you're saying. Could I turn your bad() function to a good() one by doing this:
void good()
{
	 // Creating an Example instance so that the reference remains valid in this scope.
	 Example jim = Example("Jim");
	 const std::string &legal = jim.name();
	 // Works because the "Example" instance still exists here!

	// And we're ALIVE...hooray!
	std::cout << legal << std::endl;
}

If you do not resize, further insertions are going to cause more and more collisions, which could cause the hash table to degenerate towards it's worst case (a linked list). This is precisely the opposite behaviour of what client code is written to expect - it will be written to expect almost constant time access.

What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize? Rehashing the data while it was being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation. If there is a decent solution, is it best to leave the user/programmer using the class to manually resize the table, and/or implement an auto-resize mechanism like in std::vector? Lastly, game engines don't tend to use chaining do they, since it doesn't seem cache-friendly? 1-2-3-GO! Posted Image

Is this counter intutive:

void inspect(const std::vector &v) {
	if(!v.empty()) {
		 std::cout << "The first element is: " << v[0] << std::endl;
	}
}

void modify(std::vector &v) {
	if(!v.empty()) {
		v[0] = 42;
	}
}
std::vector::operator[] is overloaded for const and non const access, allowing both client functions to use it naturally.

I think I see what you're saying. At first it still seems like one non-const solution would work for both situations, but I'm assuming this is again a "correctness"/proper style thing. So the compiler will always for for the const function, unless the user wishes to change a value.

However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first provide define a Hash template class with some useful specializations:

template
struct Hash;

template<>
struct Hash
{
	std::size_t operator() (const std::string &key) const { /* ... */ }
};

template<>
struct Hash
{
	std::size_t operator() (int key) const { /* ... */ }
};

// and so on.

This is rather similar to the traits stuff so far, except that we need to create an instance of Hash and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

template >
class HashTable
{
	public:
		HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }

		//...

	private:
		std::vector< ... > buckets;
		HashFunc hash;
};

Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator()s via inlining etc.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out the box for common key types. However, the user could provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map uses std::less to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.

Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!? Posted Image

Thank you both for extensively helping me with this, and allowing me to pick at your brains. It's been a great learning experience. I will go ahead and add traits for my hash table, and the additional template parameter that edd mentioned. Posted Image

*EDIT* Sorry, I referred to edd's last suggestion as traits, but with a second look I'm no longer sure. Is there a particular name for this kind of design?

Edited by Robot Ninja, 10 December 2012 - 01:49 AM.


#11 e‍dd   Members   -  Reputation: 2105

Like
1Likes
Like

Posted 11 December 2012 - 08:26 PM

Just to make sure I understand what you're saying. Could I turn your bad() function to a good() one by doing this:

void good()
{
	 // Creating an Example instance so that the reference remains valid in this scope.
	 Example jim = Example("Jim");
	 const std::string &legal = jim.name();
	 // Works because the "Example" instance still exists here!

	// And we're ALIVE...hooray!
	std::cout << legal << std::endl;
}

Yes, though 'Example jim("Jim");' would be a more idiomatic way of writing the first line.

What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize?

What makes you think it will be lop-sided? If it is, there's either a problem with your algorithm, or your hash function is poor.

Rehashing the data while it was being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation.

It is the solution. And it is expensive. But it reduces the cost of each lookup. It's the difference between O(1) and O(n).

If there is a decent solution, is it best to leave the user/programmer using the class to manually resize the table, and/or implement an auto-resize mechanism like in std::vector?

Up to you, but I tend to allow the specification of a load factor on construction, after which rehashing occurs. The default value might be 3, or something.

Lastly, game engines don't tend to use chaining do they, since it doesn't seem cache-friendly? 1-2-3-GO! Posted Image

Depends on how god your allocator is and how you make your chains. For example, in the past I've made a hash table where the buckets were linked lists, but each 'next' pointer was actually an index. In total I required two std::vector<>s as data members. One vector contained "kv_node's:

struct kv_node
{
    static const std::size_t no_next_node = std::size_t(-1);

    kv_node(const Key key, const Value value, std::size_t next) : key(key), value(value), next(next) { }

    Key key;
    Value value;
    std::size_t next;
};

And the other vector just had std::size_t's containing the index in to the first for the first node in each bucket.

I'm trying to remember how I did removal efficiently. I suspect I did something like swap the element being removed with the last element in the main vector, which would have required performing another hash. Or I might have kept a separate std::size_t in the data structure alongside the vectors indicating the head of a free-list.

Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!? Posted Image


http://www.amazon.co.uk/Accelerated-Practical-Programming-Example-Series/dp/020170353X/ref=sr_1_1?ie=UTF8&qid=1355278925&sr=8-1
http://www.amazon.co.uk/Standard-Library-Tutorial-Reference/dp/0321623215/ref=sr_1_1?s=books&ie=UTF8&qid=1355279152&sr=1-1
http://www.amazon.co.uk/C-Programming-Language-Special/dp/0201700735/ref=sr_1_2?s=books&ie=UTF8&qid=1355278943&sr=1-2
http://www.amazon.co.uk/Exceptional-C-Herb-Sutter/dp/0201615622/ref=sr_1_1?s=books&ie=UTF8&qid=1355278967&sr=1-1
http://www.amazon.co.uk/More-Exceptional-Engineering-Programming-Solutions/dp/020170434X/ref=sr_1_3?s=books&ie=UTF8&qid=1355278967&sr=1-3
http://www.amazon.co.uk/Exceptional-Style-Engineering-Programming-Solutions/dp/0201760428/ref=sr_1_4?s=books&ie=UTF8&qid=1355278967&sr=1-4
http://www.amazon.co.uk/Effective-Specific-Programs-Professional-Computing/dp/0321334876/ref=pd_bxgy_b_img_y
http://www.amazon.co.uk/More-Effective-Programs-Professional-Computing/dp/020163371X/ref=pd_bxgy_b_img_y
http://www.amazon.co.uk/Effective-STL-Specific-Professional-Computing/dp/0201749629/ref=pd_bxgy_b_img_z
http://www.amazon.co.uk/Modern-Design-Applied-Generic-Patterns/dp/0201704315/ref=pd_sim_b_6

And lots of practice :)

#12 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 12 December 2012 - 04:06 PM


What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize?

What makes you think it will be lop-sided? If it is, there's either a problem with your algorithm, or your hash function is poor.

It would depend on what the user would want to load factor to be before resize I suppose. I was saying that the distribution would be lopsided if the load factor was already high (e.g. .8) before a resize, once you do resize (assuming you do not rehash) then the distribution of data in the table would be lopsided. However since the solution IS to rehash, then that situation is no longer relevant.

Up to you, but I tend to allow the specification of a load factor on construction, after which rehashing occurs. The default value might be 3, or something.

Sounds good.

Depends on how god your allocator is and how you make your chains. For example, in the past I've made a hash table where the buckets were linked lists, but each 'next' pointer was actually an index. In total I required two std::vector<>s as data members. One vector contained "kv_node's:

struct kv_node
{
	static const std::size_t no_next_node = std::size_t(-1);

	kv_node(const Key key, const Value value, std::size_t next) : key(key), value(value), next(next) { }

	Key key;
	Value value;
	std::size_t next;
};

And the other vector just had std::size_t's containing the index in to the first for the first node in each bucket.

I'm trying to remember how I did removal efficiently. I suspect I did something like swap the element being removed with the last element in the main vector, which would have required performing another hash. Or I might have kept a separate std::size_t in the data structure alongside the vectors indicating the head of a free-list.

So this hash table acts like one with chaining in the way it looks up data and inserts data (partially), but instead of in a traditional linked list, the data is actually in one big bucket? So I'm curious - what's the benefit of this design over a regular, non-chained hash table then? I'm sorry if I'm asking too many questions, you might be tired of answering all of them. I won't hold it against you if you want to move on, lol.

http://www.amazon.co...55278925&sr=8-1
http://www.amazon.co...55279152&sr=1-1
http://www.amazon.co...55278943&sr=1-2
http://www.amazon.co...55278967&sr=1-1
http://www.amazon.co...55278967&sr=1-3
http://www.amazon.co...55278967&sr=1-4
http://www.amazon.co...pd_bxgy_b_img_y
http://www.amazon.co...pd_bxgy_b_img_y
http://www.amazon.co...pd_bxgy_b_img_z
http://www.amazon.co.../ref=pd_sim_b_6

And lots of practice

Wow, excellent. These sound like the perfect kind of books. Thanks! Posted Image

#13 e‍dd   Members   -  Reputation: 2105

Like
1Likes
Like

Posted 13 December 2012 - 08:07 PM

So this hash table acts like one with chaining in the way it looks up data and inserts data (partially), but instead of in a traditional linked list, the data is actually in one big bucket?

That's pretty much it, yeah.

So I'm curious - what's the benefit of this design over a regular, non-chained hash table then?

Better locality, fewer allocations and simpler/faster iteration over all elements (the iterators are just wrappers around vector iterators). The memory footprint's a bit higher, though.

#14 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 13 December 2012 - 08:13 PM

One last thing I wanted to mention. The reason why my professor recommended using sorted linked lists over unsorted lists is because in sorted linked lists, look-ups for values that don't exist can be halted early when we have encountered a "greater" value. It still seems like it wouldn't make much of a difference though since we would want to resize the hash table before it got too big anyways...

#15 e‍dd   Members   -  Reputation: 2105

Like
0Likes
Like

Posted 13 December 2012 - 08:40 PM

One last thing I wanted to mention. The reason why my professor recommended using sorted linked lists over unsorted lists is because in sorted linked lists, look-ups for values that don't exist can be halted early when we have encountered a "greater" value.

Makes sense, but when you insert, you have to iterate over half of the elements in a bucket (on average) too.

It still seems like it wouldn't make much of a difference though since we would want to resize the hash table before it got too big anyways...

Agreed :)

#16 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 20 December 2012 - 08:13 PM

I had to revive this thread because I have some further questions about designing a (chained) hash table. In my program I will be keeping track of integer values inside the hash table, the keys being std::string objects. Since I have to do look ups using a key, what should I be inserting into the table? My first thought was to give my insert function a Key parameter, and a Data parameter, but then it seemed like I would have to keep track of the key and data separately because my linked list class used for chaining only keeps track of one object. So naturally my second thought was to give the table a struct to work with. This struct simply contained the key (std::string) used for hashing, and the data (int) that would be stored in the table. So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?

*EDIT* ...or maybe I should redesign my list class to sort by an external key, rather than the actual stored in the list? I could create another node struct that stored a key and the data.

Edited by Robot Ninja, 20 December 2012 - 08:21 PM.


#17 e‍dd   Members   -  Reputation: 2105

Like
0Likes
Like

Posted 21 December 2012 - 01:34 PM

So naturally my second thought was to give the table a struct to work with. This struct simply contained the key (std::string) used for hashing, and the data (int) that would be stored in the table.

 

This is the way to go.

 

So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?

You can do the packaging in the insertion method, on their behalf.



#18 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 21 December 2012 - 02:30 PM

*EDIT* The website upgrade must have made this reply's format go crazy. So I reposted below, and it looks about right. smile.png

Edited by Robot Ninja, 21 December 2012 - 02:40 PM.


#19 Robot Ninja   Members   -  Reputation: 569

Like
0Likes
Like

Posted 21 December 2012 - 02:38 PM


So that makes the user responsible for packaging their key and data before calling the table's Insert function. Is this how it is usually done?

 
You can do the packaging in the insertion method, on their behalf.
 


Unless there's something I'm not seeing, I couldn't figure out how to do that without dynamic allocation of the object. Is that the only way? I don't know if I really want my sorted list to hold the addresses rather than the actual objects themselves. :/

Just for reference, here are the data members I have in my hash table class:
std::size_t m_tableSize;
std::size_t m_loadThreshold;			// Tracks number of items in table.
float m_loadFactor;				// Holds threshold value for auto table resize.
SortedList<HashContainer<Key, Data>> *m_pTable;	// Table of sorted buckets/chains.

Here is my constructor definition:
template <typename Key, typename Data, typename HashFunc>
ChainedHashTable<Key, Data, HashFunc>::ChainedHashTable( std::size_t size /*= 37 */, 
							float loadFactor  /*= 1 */,
							const HashFunc &hashFunc = HashFunc() )
	: m_tableSize(size), m_loadFactor(loadFactor), hash(hashFunc), 
	m_pTable(new SortedList<HashContainer<Key, Data>>[size])
{
	
}

Here is my hash container:
template <typename Key, typename Data>
struct HashContainer
{
public:
	/** Constructor.
	 *	@param key Key object used by hashing function.
	 *	@param data Data object stored in the hash table.
	 */
	HashContainer( const Key &key, const Data &data )
		: m_key(key), m_data(data) {}

	/** Destructor.
	 */
	~HashContainer();

	/** Getter func.
	 *	Returns a const reference to the key.
	 */
	const Key& GetKey() { return m_key; }

	/** Getter func.
	 *	Returns a reference (const/non-const) to the data object stored inside.
	 */
	const Data& GetData() const { return m_data; }
	Data& GetData() { return m_data; }

	/** Operator ==
	 *	Returns true if this key is equal to the other key.
	 */
	bool operator==( const HashContainer &hc ) { return this->GetKey() == hc.GetKey(); }
	bool operator==( const Key &key ) { return this->GetKey() == key; }

	/** Operator !=
	 *	Returns true if this key is not equal to the other key.
	 */
	bool operator!=( const HashContainer &hc ) { return this->GetKey() != hc.GetKey(); }
	bool operator!=( const Key &key ) { return this->GetKey() != key; }

	/** Operator <
	 *	Returns true if this key is less than the other key.
	 */
	bool operator<( const HashContainer &hc ) { return this->GetKey() < hc.GetKey(); }
	bool operator<( const Key &key ) { return this->GetKey() > key; }

	/** Operator >
	 *	Returns true if this key is greater than the other key.
	 */
	bool operator>( const HashContainer &hc ) { return this->GetKey() > hc.GetKey(); }
	bool operator>( const Key &key ) { return this->GetKey() > key; }

private:
	Key m_key;
	Data m_data;
};


#20 e‍dd   Members   -  Reputation: 2105

Like
1Likes
Like

Posted 21 December 2012 - 02:52 PM

You don't need any dynamic allocation (beyond what's required for creating a ListNode for the SortedList). Could you explain why you think you need to do any more than that?






Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS