Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 17 May 2007
Offline Last Active Nov 04 2013 05:53 PM

#5035725 Flat hash containers

Posted by e‍dd on 23 February 2013 - 05:48 AM

In response to your first question, about submission to boost:


The work you will have to do to get your library to the point of submission (assuming there's sufficient interest -- ask on the boost mailing list) includes:

  • provide documentation in a manner consistent with boost's style and infrastructure
  • ditto for tests
  • ditto for build scripts
  • modifying your code to use boost's portability macros/conventions
  • getting it to build and perform well on platforms that you may have little or no experience with. I'm almost certain that you will be required to provide a version that builds cleanly on POSIX systems such as Linux and OS X.

Libraries undergoing submission are assigned a review manager. At some point, your library will be reviewed publicly on the boost mailing list. It can take quite a long time, depending on interest, to get your library through the review backlog.


Once the review starts, a lot of people will tear your code apart, often with little or no tact. So make sure you're comfortable with criticism. Revisions will probably have to be made to what you have now until people are satisfied. I haven't looked at your code, it's just what usually happens.


Assuming submission is successful, you will have to maintain your library for the foreseeable future. This will probably mean:

  • familiarizing yourself with boost's release process
  • responding to bug reports, even those where you don't have access to a similar system to that of the reporter. Of course, this is true wherever you 'publish' your code, but the exposure provided by boost will mean more bugs reported (even if they're invalid).
  • there's talk on the mailing list of boost moving to a new version control/modularization/distribution system soon, too. So you will almost certainly have more maintenance work to do in future.
  • More generally, you will have to stay up to date with boost's infrastructure changes, just to keep your library's head above water. 


Exposure is nice of course, but submission to boost isn't a one-shot thing.

#5028318 Turn a non reentrant C library into a library supporting several instances of...

Posted by e‍dd on 03 February 2013 - 06:54 AM

The best way is to fix the library so that each function takes some kind of 'context' structure, in to which all the global state is moved. Multiple context structures would allow multiple clients in the same process.


Loading a library twice (at distinct locations in virtual memory) may be possible, but it will be highly platform dependent. What platform(s) do you care about?


It's possible to write a non-intrusive wrapper, as long as you have a true static library (not a stub) and are willing to find all global state. In this case you will have to:

  1. Create a context structure with one member for each piece of global state. 
  2. Create init_context(context *ctx) that initializes a context in the same way that the global state is initialized.
  3. Create get_current_context(context *ctx) and set_current_context(const context *ctx) functions that get and set the current global state
  4. Add a global mutex if you want to use this in a multithreaded environment.
  5. Now wrap each function, f, as follows:

return_type wrapped_f(context *ctx, type1 arg1, type2 arg2, type3 arg3, etc)
    context old_ctx;
    return_type ret = f(arg1, arg2, arg3, etc);
    return ret;


An RAII-style class could automate the locking and context swapping. I admit it's ugly and you'll still have to take care to look for new global state to add to the context structure whenever the underlying library is changed, but it's probably all you can do without modifying the library.

#5023359 Best Method For Writing Unit Tests

Posted by e‍dd on 19 January 2013 - 07:30 PM

IMHO, tests should really only use the public members of a class. You shouldn't need to add any kind of "hooks" for testing.

If you feel the need to add extra functionality purely for testing, it could be a hint that you've violated what's often called the Single Responsibility Principle.


I'm not an advocate of TDD, but when using this methodology, private methods arise naturally as part of the 'refactoring' steps and so will be automatically exercised by the tests.


As for how to structure unit tests, look at the examples of existing libraries/frameworks. googletest and unittest++ are two popular ones, though there are countless others.

#5017666 Accessing a Devices Interface/Class GUID always causes an error

Posted by e‍dd on 05 January 2013 - 12:17 AM

Have you checked wparam? If it's something like DBT_DEVICEREMOVECOMPLETE, you're probably casting lParam to the wrong type in the first place.


In general, it's my understanding that you should always first cast lparam to a DEV_BROADCAST_HDR* and only cast to something further once you have confirmed the value of its dbch_devicetype.

#5016146 fread question

Posted by e‍dd on 31 December 2012 - 01:21 PM

Suppose I have a file like this:
Are you saying it contains the bytes 0x30, 0x0D, 0x0A, 0x31, 0x0D, 0x0A, 0x32, 0x0D, 0x0A, 0x33 (assuming little endian and typical windows text formatting)?
Or are you saying it contains something like 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00?

The distinction is important. But I'll assume the latter as that's what the rest of your question implies.

and I can use fread [...3 times...]

exactly like that,will mFile contain:
Reading a file does not change its content. Think of a FILE as containing a pointer or cursor in it somewhere that is moved along every time you read something from the file.

The first fread() call will read an int (on success) and advance the cursor sizeof(int) bytes, so that the cursor is now at 0x02,00,00,00 (again assuming 'binary' formatting).

The fseek() function allows you to change the position of this cursor. ftell() tells you its position.

Not sure if this answers your question directly, but I hope it helps.

EDIT: ninja'd :(

#5014203 C++ encapsulation

Posted by e‍dd on 25 December 2012 - 01:00 PM

It seems to me that when a class needs to contain data types that would preferably be forward declared in the header file, it would be most beneficial to just forward declare one structure to hold all of its member variables.




Does this seem like sound design? I'm curious to know why it is not a more common practice.

It's actually quite common. It even has a bunch of names: the "pimpl idiom", the "compiler firewall idiom", the "cheshire cat idiom" and the "bridge pattern".

#5013334 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 21 December 2012 - 10:35 PM

error LNK2019: unresolved external symbol "public: __thiscall HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>::~HashContainer<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int>(void)"

The trick with error messages like this is to resist intimidation and the urge to give up. If we replace "std::basic_string<char, struct std::char_traits<char>, class std::allocator<char> >" with "std::string", this message should become a bit clearer:


error LNK2019: unresolved external symbol "public: __thiscall HashContainer<std::string,int>::~HashContainer<std::string,int>(void)"


In other words, the linker can't find a definition for your HashContainer's destructor. Have you provided one?

#5013266 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 21 December 2012 - 04:44 PM

Wait, so would something like this work:

template <typename Key, typename Data, typename HashFunc>
void ChainedHashTable<Key, Data, HashFunc>::Insert( const Key &key, const Data &data )
	/* ...Table resize code can go here... */

	// Hash an appropriate index.
	std::size_t index = Normalize( hash(key) );

	// Insert the key and data into the hash table.
	// SortedList object will deal with sorting and inserting into the chain.
	m_pTable[index].Insert(HashContainer(key, data));

Yes :)

#5013240 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 21 December 2012 - 02:52 PM

You don't need any dynamic allocation (beyond what's required for creating a ListNode for the SortedList). Could you explain why you think you need to do any more than that?

#5013098 Verlet physics

Posted by e‍dd on 21 December 2012 - 06:17 AM

Once you have calculated the force due to friction, you divide by the mass to get the acceleration (a deceleration in this case) due to friction. This is then added to the acceleration before you perform an iteration of the verlet solver (or any other solver for the equations of motion). So your Physics::UpdateVerlet() method should not have to change. How you actually model friction is up to you. One simple model is to have the magnitude of friction be proportional to that of the perpendicular component of the contact force of the two bodies (so proportional to the weight -- the effect of mass due to gravity -- of a body if it is sitting on a flat surface). Any introductory book on Newtonian mechanics is likely to explain this model.

#5010434 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 13 December 2012 - 08:07 PM

So this hash table acts like one with chaining in the way it looks up data and inserts data (partially), but instead of in a traditional linked list, the data is actually in one big bucket?

That's pretty much it, yeah.

So I'm curious - what's the benefit of this design over a regular, non-chained hash table then?

Better locality, fewer allocations and simpler/faster iteration over all elements (the iterators are just wrappers around vector iterators). The memory footprint's a bit higher, though.

#5009655 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 11 December 2012 - 08:26 PM

Just to make sure I understand what you're saying. Could I turn your bad() function to a good() one by doing this:

void good()
	 // Creating an Example instance so that the reference remains valid in this scope.
	 Example jim = Example("Jim");
	 const std::string &legal = jim.name();
	 // Works because the "Example" instance still exists here!

	// And we're ALIVE...hooray!
	std::cout << legal << std::endl;

Yes, though 'Example jim("Jim");' would be a more idiomatic way of writing the first line.

What then would be the best way to deal with a lopsided distribution of data in the hash table, due to the resize?

What makes you think it will be lop-sided? If it is, there's either a problem with your algorithm, or your hash function is poor.

Rehashing the data while it was being copied over to a larger table COULD be a solution, but sounds like a painfully expensive operation.

It is the solution. And it is expensive. But it reduces the cost of each lookup. It's the difference between O(1) and O(n).

If there is a decent solution, is it best to leave the user/programmer using the class to manually resize the table, and/or implement an auto-resize mechanism like in std::vector?

Up to you, but I tend to allow the specification of a load factor on construction, after which rehashing occurs. The default value might be 3, or something.

Lastly, game engines don't tend to use chaining do they, since it doesn't seem cache-friendly? 1-2-3-GO! Posted Image

Depends on how god your allocator is and how you make your chains. For example, in the past I've made a hash table where the buckets were linked lists, but each 'next' pointer was actually an index. In total I required two std::vector<>s as data members. One vector contained "kv_node's:

struct kv_node
    static const std::size_t no_next_node = std::size_t(-1);

    kv_node(const Key key, const Value value, std::size_t next) : key(key), value(value), next(next) { }

    Key key;
    Value value;
    std::size_t next;

And the other vector just had std::size_t's containing the index in to the first for the first node in each bucket.

I'm trying to remember how I did removal efficiently. I suspect I did something like swap the element being removed with the last element in the main vector, which would have required performing another hash. Or I might have kept a separate std::size_t in the data structure alongside the vectors indicating the head of a free-list.

Very interesting. I might just have to implement this for my hash table then. Thanks! Geez...where can I learn more of this stuff!? Posted Image


And lots of practice :)

#5008918 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 09 December 2012 - 05:42 PM

Thanks for the explanation on traits, it seems like a very useful strategy. Also I'm assuming you're saying that the more correct way is to require that there is an existing hash function for their key type?

Of the stuff I mentioned, I prefer traits to having the user ensure an overloaded function is in scope, as it avoids accidental use of overloaded functions and eliminates the possibility of ODR violations (a somewhat technical subject to which I won't subject you unless provoked :)).

Would using a function pointer parameter with the hash function help solve issues with custom types, or is using traits the more common approach?

As ripoff said, function pointers tend to be slower as in general they aren't very transparent to a compiler.

However, there's another design option, which is to use function objects and add a third template parameter to your hash table. We first provide define a Hash<T> template class with some useful specializations:

template<typename T>
struct Hash;

struct Hash<std::string>
    std::size_t operator() (const std::string &key) const { /* ... */ }

struct Hash<int>
    std::size_t operator() (int key) const { /* ... */ }

// and so on.

This is rather similar to the traits stuff so far, except that we need to create an instance of Hash<Key> and 'call' it to get a hash value, rather than calling a static function. Such an instance will be kept inside the hash table:

template<typename Key, typename Value, typename HashFunc = Hash<Key> >
class HashTable
	    HashTable(const HashFunc &hash = HashFunc()) : buckets(/* ... */), hash(hash) { }


	    std::vector< ... > buckets;
	    HashFunc hash;

Internally, whenever the HashTable needs to get the hash of a particular key, it 'calls' its hash object. This is preferable to the use of function pointers, as a compiler can better optimize calls to Hash<>::operator()s via inlining etc.

Note that we've given the HashFunc template parameter a 'sensible default'. This means the HashTable should work out the box for common key types. However, the user could provide their own type/object for hashing.

This is the kind of approach that's used in the standard library for containers. For example, std::map<K,V> uses std::less<K> to perform comparisons on keys by default, but you can plug in your own comparison function object if required. It's quite complicated (which is why I didn't mention it until now), but it does offer both flexibility and performance.

#5008716 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 09 December 2012 - 12:13 AM

Keys are only ever numbers and strings, right?

Those are the most common types, but they're not the only ones. Could be anything, really. You might want to use a 2D coordinate as a key, for example.

Could you clarify what you mean by a "traits mechanism"? I'm unfamiliar with the concept. Perhaps a simple example would clear things up. I do indeed need an overloaded operator==(), only to compare the key held inside the data object though.

Sure. You have a template class like this:

template<typename Key>
struct HashTraits; // only declared, no body for the primary template.

It has no definition, but you can specialize it for types that need to work with the hash table:

struct HashTraits<std::string>
    static std::size_t Hash(const std::string &amp;s) { /* ... */ }
    static bool Equal(const std::string &amp;s1, const std::string &amp;s2) { /* ... */ }

struct HashTraits<int>
    static std::size_t Hash(int i) { /* ... */ }
    static bool Equal(int i1, int i2) { /* ... */ }

// and so on...
EDIT: changed use of uintptr_t to std::size_t.

Then, if the hash table needs to hash an object it does so by calling "HashTraits<Key>::Hash(key)". A similar thing is done for equality comparisons when searching buckets for matching keys. If there's no HashTraits specialization for the Key type available you get a compiler error, indicating that you need to write one. There are some other options in the design space, here. You might have two separate traits templates, one for the hash and one for equality. Or you might forgo the equality part altogether and just rely on the existence of an operator==.

Now, rather than using traits, you could also simply have the requirement that the user must ensure that there's a Hash() function that takes a Key and returns a hash value, rather like you have the operator< requirement for your SortedList. However, some care must be taken as e.g. Hash(const std::string &amp;) might get called even if the Key type is "const char *", due to C++ overload resolution. This may not be the desired behaviour. Using the traits stuff avoids this problem because there has to be a template specialization for the exact type.

So there's a tradeoff between simplicity and correctness.

Do you want to expose the nodes? Or just the values they contain? By returning Node objects, the user can "accidentally" modify any next/prev pointers.

The only reason I can think of to provide my own node type would be for "intrusive links", but your SortedList doesn't seem to allow that (which is fine). Is the Node template parameter necessary?

I can't seem to decide, because I don't have enough experience to know if I would ever need to expose the node or not. When would exposing the node's data object be an issue in game programming? I definitely don't want the 'next' pointer to be handled by anything but the data structure it is serving (i.e. list).

At times, it can be useful to expose the node type (for example, when you've got an intrusive list, as I mentioned).

If you don't have an explicit use-case in mind for the Node, I would hide it for the time being, if possible. As with the protected/private stuff, it's easier to expose stuff at a later date than it is to hide it, given that people may have started to depend on it.

int Size() const;
Is int the best type to use here?

Should I be using something less expensive than that? Short int? Or something with higher resolution?

No, int should be plenty fast. But why not unsigned int? Or std::size_t?

Since hash tables apparently can get pretty crazy in size, I thought int would an okay data type for size, but please explain why you think otherwise. My brain is a sponge right now.

Indeed, any container can get large, depending on what you put in it :) So you want to choose a type that can index the largest possible array you can fit in memory. In other words, you really want an integral type that's the same size as a pointer. On pretty much any modern system in the world, std::size_t will fit the bill.

/** Getter function.
* Returns a ptr to the object stored in the node.
T* Data() const { return &amp;m_data; }
IMHO, this should return a reference. In situations where it's possible for there not to be a value, I can understand the choice to return a pointer. But that's not the case here.

Wouldn't I have to feed the function an external variable then? I can't think of any other way to avoid making the function more complicated. Is the extra parameter just a necessary evil?

I Think we're getting our wires crossed. All I'm really suggesting is that the function be changed to "const T &amp;Data() const { return m_data; }" or "T &amp;Data() { return m_data; }". Actually, now that I look at it again, I'm surprised that the original function even compiles (as there should be a 'const' before the 'T *'). Is this Data() function used anywhere, at present?

Is making ListNode uncopyable just a 'correctness' thing like const, or is there some problem that can arise from it being copyable?

Yes :)
What does it mean to copy a node? If the answer is "that's nonsense", then it would be best to statically disallow it. It's better to be notified of attempted nonsense at compile time than at run time.

by explicit constructors you mean things like the copy and assignment constructors?

No, there's a C++ keyword, "explicit".

At present, I can do weird stuff like this:
void foo(const ChainedHashTable<int, int> &amp;tab); // suppose this function exists

// ...

foo(42); // compiler allows this, as ChainedHashTable's constructor is not explicit!

This is the same mechanism that allows you to pass a "const char *" to a function that takes an std::string. For your constructor, the automatic conversion is probably not wanted!

11 was some arbitrary prime number I chose to start the hash table with.

I see. Presumably the number of buckets is going to change during the life of a hash table, either automatically, or at the user's request(?).

Data* LookUp( Key key ) const;
Should the return type be const-qualified?

I took the const after the function declaration to be a promise to the programmer that the the function will not modify the members of the class...So the fact that I'm returning a non-const pointer to the Data, makes it a non-const function?

Well, let me put is this way. You hand me a hash table, which you say is const. Should I be able to change its content (without resorting to nasty hacks)?

The idiomatic answer for C++ is no, I should not be able to change its content. Sure, the contents of the hash table aren't direct members of the structure, but that sounds like you're just trying to weasel your way out of keeping things const-correct ;)

What often happens is that two overloads are provided for things like this:
const Data* LookUp( Key key ) const;
Data* LookUp( Key key );

std::vector *m_pTable;
Perhaps a typo, but do you really want a pointer to a vector, here?

I guess the general case is to just instantiate the object inside the class.

Yeah, using a pointer is just going to require more book-keeping. Note that if you implement a copy constructor and assignment operator for SortedList, you get them for free if the vector is held by value, rather than by pointer.

EDIT: actually, I now realise a custom assignment operator will still be required if you want to provide the strong exception safety guarantee.

void Resolve( int index ) const;

It's supposed to be my collision resolution function. My plan was to have the function take in the index value where the collision occurred, and then insert the new object being allocated into the list existing at that index.

Up until this point you've got something approaching a very general hash table. So it's a bit weird to have something this specific here, IMHO. Is there any way you can move this specific function outside of the hash table?

I understand that I have much to learn about proper and conventional coding,

That never ends :)

so I'm really glad that you devoted some of your time to help me out. Posted Image

No problem, keep plugging away!

#5008682 [C++] Problems designing a proper hash table w/ chaining

Posted by e‍dd on 08 December 2012 - 09:23 PM

Grrrr. Yet again, I hit "Quote" rather than "Edit". Sorry for the noise.