Design question: iterators

4 comments, last by tomva 14 years, 6 months ago
Hello all- Random design question for you all. I'd be curious to see how others have solved this!

I've got objects that manage collections of things. For instance, an object that manages players, another that manages users, another that manages maps, another that manages physics objects, etc.

Each of these "Manager" objects has to be threadsafe. Multiple threads can be adding/removing objects all the time. At the same time, other threads are trying to iterate. One thread may be iterating over players while another is adding a player, etc.

To support this, each Manager object exposes an iterator (or multiple iterators if the Manager contains different kinds of objects--for instance the Physics manager lets you iterate physics objects and collisions). You can ask the Manager for an iterator, and then use the iterator and the Manager to walk through all objects available. This lets clients iterate in a threadsafe manner.

So far, so good. But I have two additional constraints (take these as a given, I'd rather not go into these right now):

First, the header files are silent on implementation. That is, they expose an interface, including the iterators and iteration methods, but know nothing about internal data structures etc. (This is all in C++)

Second, clients must be able to allocate iterators on the stack. Iteration is common and should be lightweight, not requiring memory allocation.

Because the header doesn't expose any implementation details, I don't know the real size of an iterator. For now, I do something like this:
// header file

struct object_t;   // the managed object type (elided here)

class Manager {
public:
    struct iterator_t {
        char opaque[4 * sizeof(long)];   // best guess at the implementation's real size!
    };

    virtual ~Manager() {}

    virtual void resetIterator(iterator_t& i) = 0;

    virtual bool getNextObject(iterator_t& i, object_t& o) = 0;

    // other methods...
};
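To make the intended usage concrete, client code ends up looking roughly like this (just a sketch of what the interface above implies; mgr here is a hypothetical pointer to a Manager):

// client code: the iterator lives on the stack, no heap allocation needed
Manager::iterator_t it;
mgr->resetIterator(it);

object_t obj;
while (mgr->getNextObject(it, obj)) {
    // ... do something with obj ...
}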
And then the corresponding cpp file for that header is:
// cpp source file
#include (header)

class ManagerImpl : public Manager {
    struct real_iterator_t {
        collection_t::iterator    iter;
        int                       rvn;
        void *                    otherData;
    };

    void resetIterator(iterator_t& i) {
        // Reinterpret the caller's opaque buffer as the real iterator.
        // Note: this must be a reference cast; casting to a value would
        // only fill in a local copy, not the caller's storage.
        real_iterator_t& ri = reinterpret_cast<real_iterator_t&>(i);
        ri.iter = m_collection.begin();
        ri.rvn = m_rvn;
        ri.otherData = (void *)"foobar";
    }

    // and so on...

    collection_t    m_collection;
    int             m_rvn;
};
As you can see, I have to cast from the opaque (hopefully big enough!) public iterator to the actual private iterator in the C++ source file. Is there a better way to do that? And in general is there a better way to expose public iterators on Managers? I'm guessing this is a pattern that's been hit before.
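One guard that would at least catch the "hopefully big enough" problem at compile time rather than as stack corruption at runtime (a sketch, assuming C++11's static_assert; pre-C++11 you'd reach for BOOST_STATIC_ASSERT or a sizeof-based trick):

// inside ManagerImpl, next to real_iterator_t:
static_assert(sizeof(real_iterator_t) <= sizeof(iterator_t),
              "Manager::iterator_t is too small for real_iterator_t");

// Alignment deserves a thought too: a plain char array only guarantees an
// alignment of 1, while real_iterator_t holds pointers. One option is to give
// the buffer a stricter alignment in the header, for example:
//     struct iterator_t { alignas(void*) char opaque[4 * sizeof(long)]; };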
Quote:Original post by tomva
Random design question for you all. I'd be curious to see how others have solved this!

I've got objects that manage collections of things. For instance, an object that manages players, another that manages users, another that manages maps, another that manages physics objects, etc.

Each of these "Manager" objects has to be threadsafe. Multiple threads can be adding/removing objects all the time.

The first thing I would do to solve this is to stop doing this. I wouldn't have an arbitrary number of threads adding and removing stuff from the same collections. That's both error-prone and slow.

Quote:So far, so good. But I have two additional constraints (take these as a given, I'd rather not go into these right now):

Ok, but they're a pretty unreasonable pair of constraints in C++!

Quote:As you can see, I have to cast from the opaque (hopefully big enough!) public iterator to the actual private iterator in the C++ source file.

Is there a better way to do that?

If the information isn't there, it isn't there. Should one object with a complex data structure require a 256-byte iterator in the future, you're screwed. I expect that what you've got is as good as it gets. If you're not willing or able to allocate objects on the heap, then the usual polymorphism via virtual functions isn't available to you.

The Manager class could take the iterator_t type as a template argument, allowing you to discard the generic 'base class', perhaps.
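For what it's worth, a minimal sketch of that idea (my names, nothing from the original post): each concrete manager supplies its iterator type up front, so there is no shared guess-the-size buffer.

// The iterator (and managed object) types become template parameters.
template <typename IteratorT, typename ObjectT>
class BasicManager {
public:
    typedef IteratorT iterator_t;
    typedef ObjectT   object_t;

    virtual ~BasicManager() {}
    virtual void resetIterator(iterator_t& i) = 0;
    virtual bool getNextObject(iterator_t& i, object_t& o) = 0;
};

The trade-off is that each manager's header now has to spell out the layout of its iterator, which is exactly where the "headers stay silent on implementation" constraint pushes back.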

Quote:And in general is there a better way to expose public iterators on Managers? I'm guessing this is a pattern that's been hit before.

I'm guessing it has been hit before, but most people would be able to avoid the pathologically poor restrictions you've got to work with. 90% of the time I'd return a begin/end pair that you could use standard library algorithms with. If I needed multithreaded access I'd probably not be handing out iterators at all - any iteration over members of the manager would be done within the manager.
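For the 90% case, that shape is just the following (a sketch with a made-up PlayerManager, nothing specific to your code):

#include <vector>
#include <algorithm>

struct Player { /* ... */ };

class PlayerManager {
public:
    typedef std::vector<Player>::const_iterator const_iterator;

    const_iterator begin() const { return m_players.begin(); }
    const_iterator end()   const { return m_players.end(); }

private:
    std::vector<Player> m_players;
};

// usage with a standard algorithm:
//     std::for_each(mgr.begin(), mgr.end(), doSomethingWithEachPlayer);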
Quote:The first thing I would do to solve this is to stop doing this.


Fair point! And with software, that is often the best solution.

Originally I did not use this iteration pattern, and instead I had the Managers iterate internally. But then you have two choices:

a) Have the Managers iterate, and provide a callback so clients can perform per-object functions, or

b) Have the Managers iterate with no callbacks, so the manager has to add a new function every time a client needs a new iteration behavior.

As a concrete example, suppose my server code needs to iterate over hosts to set up and send UDP packets. With option b), I'd have to add that UDP construction code to the Host Manager. And any other code that needed to iterate over hosts for a different reason would then mean I (or someone else) would have to add yet another method to the Host Manager class to support that new behavior.

Clearly, option a) (having internal iteration with per-object callbacks) is more elegant and more common. However, I found that it opened the door to deadlocks: either I had to hope that the callback wouldn't do anything re-entrant (not really maintainable), or else I had to release and re-take the mutex before and after each callback.
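To make the deadlock concern concrete, option a) in its naive form looks something like this (a sketch with made-up names, using C++11's std::mutex for brevity; the comment marks the trap):

#include <list>
#include <mutex>

struct Host { /* ... */ };

class HostManager {
public:
    void forEachHost(void (*callback)(Host&, void*), void* userData) {
        std::lock_guard<std::mutex> lock(m_mutex);   // held for the whole walk
        for (std::list<Host>::iterator it = m_hosts.begin(); it != m_hosts.end(); ++it) {
            // Trap: if the callback re-enters HostManager (adds/removes a host,
            // or starts another iteration) it tries to take m_mutex again,
            // which deadlocks (re-locking a non-recursive mutex from the same
            // thread is undefined behaviour). Unlocking around the call instead
            // lets the list change underneath this iterator.
            callback(*it, userData);
        }
    }

private:
    std::mutex m_mutex;
    std::list<Host> m_hosts;
};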

Once I started having to mess with the mutex on a per-callback basis, I ended up liking external iterators more. They are also an easier paradigm for the caller, much like the begin/end functions you mention.

So I may stick with basically what I've got, but move to the more common begin/end model.
Quote:Original post by tomva

As a concrete example, suppose my server code needs to iterate over hosts to set up and send UDP packets.


There is very little reason for this to be multi-threaded.

Some math.
Let's say we have 500 clients.
A 100Mbit connection carries some 20k realistic packets per second (in and out combined).
That comes out to an average packet size of 500 bytes or so.
The time to send 500 bytes over a 100Mbit connection is around 0.05 ms.

But since we only have 500 clients, each client will receive, on average, 40 packets per second. 40 x 0.05 ms = 2 ms of wire time per client, per second.

Conclusion: you have 2 whole milliseconds per second in which to service each client.

If you dedicate a single core to handling these packets, it should run at some 10-20% utilization.

The numbers above are fairly representative, and if anything they are on the high side.
It sounds like you're using parallelism in all the wrong ways. Using additional threads is not necessarily going to improve performance; from what you've described, your parallel performance may actually be worse than single-threaded performance.

On to your question: there is no need for thread safety on your manager objects, because you shouldn't have multiple threads trying to add to and remove from them simultaneously. Instead, fix your design so that things which can be done simultaneously are done in parallel, and things which cannot be done simultaneously are minimized, but not parallelized.
Great answers! I will first go back and do some deep thinking on my design. The general principle (avoid fine-grained locking altogether through a combination of parallelism and a small amount of forced serialization) is pretty compelling. It would mean some significant refactoring, but I see the benefits, and I think it would address one of my core concerns: making this multithreaded code base maintainable in an open-source setting.

Here's another specific use case, how should this work?

  • I have a MapManager that manages all the loaded maps.
  • Other components can request that the MapManager load a new map, or release an unused map.
  • The request to load is queued and is actually handled on a background thread for obvious reasons (a sketch of that queue follows below).
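For concreteness, the queued-load part has roughly the following shape (a simplified sketch with made-up names, written with C++11 primitives for brevity; the real thing also has to hand the finished map back somewhere, which is exactly the coordination question below):

#include <deque>
#include <string>
#include <mutex>
#include <condition_variable>

// Callers enqueue a map name; a single background loader thread pops and loads.
class MapLoadQueue {
public:
    void requestLoad(const std::string& mapName) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending.push_back(mapName);
        }
        m_cv.notify_one();
    }

    // Called only by the background loader thread.
    std::string waitForRequest() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_pending.empty(); });
        std::string name = m_pending.front();
        m_pending.pop_front();
        return name;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::string> m_pending;
};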

Clearly, many components (game logic, server-to-client state manager, etc.) will need to iterate maps. How do you safely coordinate that iteration with the background threads that will occasionally pop up and ask that a newly loaded map be made available? I know how to do that with threadsafe iterators, but how would you do this in the parallelized/minimally-serialized approach?

By the way, I'm not really invoking multithreading for performance reasons. Antheus: my expected (targeted use case) client count is closer to 20 than 500! So there should be ample CPU headroom. Most of my CPU time goes to physics for the (expected) large-ish maps, which is serial (or parallelized as the physics engine sees fit; I'm using Bullet). The only multithreading is for the few threads that will be performing disk I/O or background admin tasks. I expect that to be around 2-3 extra threads, not dozens. But obviously, once you have more than one thread you have to be absolutely bulletproof. Hence my paranoia and designing for pathological cases.

This topic is closed to new replies.
