
#3 Robot Ninja

Posted 10 December 2012 - 01:49 AM

You can return a reference, provided the object being referenced outlives the function. For automatically allocated data (local variables), this is not the case. You can, however, return references to global data, heap data, or references that you have been passed. Member data also falls into this category; it becomes the caller's problem to ensure that the reference remains valid:

Just to make sure I understand what you're saying: could I turn your bad() function into a good() one by doing this?
void good()
{
    // Creating an Example instance so that the reference remains valid in this scope.
    Example jim = Example("Jim");
    const std::string &legal = jim.name();
    // Works because the "Example" instance still exists here!

    // And we're ALIVE...hooray!
    std::cout << legal << std::endl;
}
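Filling in the surrounding pieces (Example here is a minimal stand-in for the class from the earlier post, which isn't shown in this thread, so its exact shape is an assumption), the valid case looks something like:

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for the Example class referenced in this thread (assumed shape).
class Example
{
public:
    explicit Example(const std::string &name) : name_(name) { }

    // Returns a reference to member data: valid only as long as this object lives.
    const std::string &name() const { return name_; }

private:
    std::string name_;
};

std::string good()
{
    Example jim("Jim");
    const std::string &legal = jim.name(); // fine: jim is still alive here
    return legal; // returned BY VALUE, so the caller receives a copy, not a dangling reference
}

// The broken counterpart would declare `const std::string &bad()` and return
// jim.name() directly: jim is destroyed when the function returns, leaving
// the returned reference dangling.
```

The key point is that the reference never escapes the scope in which `jim` lives; anything handed back to the caller is a copy.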


If you do not resize, further insertions are going to cause more and more collisions, which could cause the hash table to degenerate towards its worst case (a linked list). This is precisely the opposite of the behaviour client code is written to expect: almost constant-time access.

What, then, would be the best way to deal with a lopsided distribution of data in the hash table after a resize? Rehashing the data while it is being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation. If there is a decent solution, is it best to leave it to the user of the class to manually resize the table, or to implement an auto-resize mechanism like std::vector's? Lastly, game engines don't tend to use chaining, do they, since it doesn't seem cache-friendly? 1-2-3-GO!
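As a sketch of the auto-resize idea raised above (all names here are illustrative, not from the thread): track the element count, and when the load factor crosses a threshold, rehash every entry into a doubled bucket array, much as std::vector grows its storage:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Toy chained hash set of ints, showing load-factor-triggered rehashing.
class IntHashSet
{
public:
    IntHashSet() : buckets(8), count(0) { }

    void insert(int key)
    {
        // Grow and rehash when the average chain length would exceed 1.
        if (count + 1 > buckets.size())
            rehash(buckets.size() * 2);
        std::list<int> &chain = buckets[bucketFor(key)];
        for (int k : chain)
            if (k == key) return; // already present
        chain.push_back(key);
        ++count;
    }

    bool contains(int key) const
    {
        const std::list<int> &chain = buckets[bucketFor(key)];
        for (int k : chain)
            if (k == key) return true;
        return false;
    }

private:
    std::size_t bucketFor(int key) const
    {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    // The expensive step: every key is re-distributed into the new buckets.
    void rehash(std::size_t newBucketCount)
    {
        std::vector< std::list<int> > old;
        old.swap(buckets);
        buckets.resize(newBucketCount);
        for (std::size_t i = 0; i < old.size(); ++i)
            for (int k : old[i])
                buckets[bucketFor(k)].push_back(k);
    }

    std::vector< std::list<int> > buckets;
    std::size_t count;
};
```

Although each rehash touches every element, doubling the bucket count each time keeps the amortized cost per insertion constant, for the same reason std::vector's doubling growth does.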

Is this counter-intuitive:

void inspect(const std::vector<int> &v) {
    if (!v.empty()) {
        std::cout << "The first element is: " << v[0] << std::endl;
    }
}

void modify(std::vector<int> &v) {
    if (!v.empty()) {
        v[0] = 42;
    }
}

std::vector::operator[] is overloaded for const and non-const access, allowing both client functions to use it naturally.
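In a minimal container of our own, that overload pair would look something like this (a sketch, not std::vector's actual implementation):

```cpp
#include <cassert>
#include <cstddef>

// Minimal fixed-size container showing the const/non-const operator[] pair.
class IntArray4
{
public:
    IntArray4() : data() { } // value-initializes all four elements to 0

    // Chosen for non-const objects: returns a reference you can assign through.
    int &operator[](std::size_t i) { return data[i]; }

    // Chosen for const objects and references: read-only access.
    const int &operator[](std::size_t i) const { return data[i]; }

private:
    int data[4];
};
```

The compiler picks the overload based on the const-ness of the object the call is made through, so a `const IntArray4 &` can only ever read.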

I think I see what you're saying. At first it still seems like one non-const solution would work for both situations, but I'm assuming this is again a "correctness"/proper-style thing. So the compiler will always opt for the const overload, unless the user wishes to change a value.

However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first define a Hash template class with some useful specializations:

template<typename Key>
struct Hash;

template<>
struct Hash<std::string>
{
    std::size_t operator()(const std::string &key) const { /* ... */ }
};

template<>
struct Hash<int>
{
    std::size_t operator()(int key) const { /* ... */ }
};

// and so on.


This is rather similar to the traits stuff so far, except that we need to create an instance of Hash and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

template<typename Key, typename Value, typename HashFunc = Hash<Key> >
class HashTable
{
public:
    HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }

    //...

private:
    std::vector< ... > buckets;
    HashFunc hash;
};


Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator() via inlining and so on.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out of the box for common key types. However, the user can provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map uses std::less to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.
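For comparison, here is the std::map analogue: plugging a custom comparison functor into the slot where std::less<Key> is the default (ByLength is an illustrative name, not a standard one), exactly as a custom HashFunc would be plugged into the hash table:

```cpp
#include <cassert>
#include <map>
#include <string>

// Custom comparison functor: orders string keys by length instead of lexically.
struct ByLength
{
    bool operator()(const std::string &a, const std::string &b) const
    {
        return a.size() < b.size();
    }
};

// std::map's third template parameter defaults to std::less<Key>; here we
// substitute our own function object, just as HashTable's HashFunc parameter
// could be substituted by the user.
typedef std::map<std::string, int, ByLength> LengthMap;
```

Note that under ByLength, two keys of the same length compare equivalent, so the comparison functor changes not just iteration order but which keys the map considers distinct.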

Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!?

Thank you both for extensively helping me with this, and allowing me to pick at your brains. It's been a great learning experience. I will go ahead and add traits for my hash table, and the additional template parameter that edd mentioned.

*EDIT* Sorry, I referred to edd's last suggestion as traits, but with a second look I'm no longer sure. Is there a particular name for this kind of design?
