Jump to content
  • Advertisement
Sign in to follow this  
the_edd

Unity [C++, RFC] data structure for storage of polymorphic objects

This topic is 2612 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

<brain_fart>

class base
{
public:
virtual ~base();
// more polymorphic jazz ...
};

// and a bunch of derived classes


We have to store these as pointers in containers such as std::vector<base *>. Potential negatives:


  • allocation pressure (lots of little allocations compared to std::vector<something_not_polymorphic>)
  • memory fragmentation
  • poor locality e.g. for_each(b, e, bind(&base::do_stuff, _1)) can jump all over the place if do_stuff() operates on member variables.
  • also a bit fiddly (though mitigated by wrappers such as boost::ptr_vector).


    Couldn't we create a container that stores polymorphic objects contiguously in a single buffer to overcome these problems?
    </brain_fart>

    So I've started playing with an experimental data structure that can be used for the same kind of storage, but keeps all the elements in a single buffer. Given the non-uniform sizes and alignments of the elements that might be added, random access can't be achieved without layering some kind of indexing system on top. The most comprehensive container interface that can be implemented with these constraints (as far as I can tell) is that of something approaching std::deque<>, minus the random-access stuff.

    The working title is poly_deque<>. push_back() and push_front() are template methods to ensure that no slicing occurs when the object is copied in to the buffer. I also plan to add emplace_back() and emplace_front() so that elements can be constructed in the poly_deque's buffer without the need for copying. I've got the basics implemented and in one benchmark (out of only one so far) that uses push_back() and pop_front(), I've got a 33% speedup over std::deque<base *> (msvc 2010, cl /EHsc /Ox /DNDEBUG=1 /D_SECURE_SCL=0).

    Before I go much further, I'd like to get a little bit of feedback on the current design. Please take this opportunity to question my sanity if you think it appropriate, as I'd like to avoid wasting too much time on this if it's a daft idea (though a large component of why I'm doing this is just for 'fun', anyway. You should see me at parties).

    So here's what the buffer looks like after having push_back()ed a few objects:

    gallery_121204_183_16057.png

    The pink parts are pointers to statically allocated "vtable" structures and the green parts are the data for the actual objects. Each object has a vtable pointer on either side of it. Here's what's in a vtable:


    struct vtable
    {
    std::size_t size; // size of object
    std::size_t alignment; // boost::alignment_of<> object
    void (* const place_copy)(void *dst, const void *src); // use placement new to copy object at src to dst
    };


    Given that push_back() and push_front() are template functions, we can get a suitable vtable at that point using the following trick:


    template<typename T>
    struct vtable_for_type
    {
    static void place_copy(void *dest, const void *src)
    {
    new (dest) T(*static_cast<const T*>(src));
    }

    static const vtable table;
    };

    template<typename T>
    const vtable vtable_for_type<T>::table = {
    sizeof(T),
    boost::alignment_of<T>::value,
    &vtable_for_type<T>::place_copy
    };

    // ...

    template<typename Base>
    class poly_deque
    {
    public:
    // ...

    template<typename Derived>
    void push_back(const Derived &d)
    {
    const vtable *vt = &vtable_for_type<Derived>::table;
    // ...
    }

    // ...
    };


    The vtables are used to help with iteration (given a 'left' vtable, you know how to get to the next one) and copying elements should a buffer reallocation be needed. Two vtables per object are needed to support bidirectional iteration; if there was only a vtable to the left of each object, only forward iteration would be possible (I think).

    Storage is cyclic, so for example, push_front() might put an element at right end of the buffer if there's no space at the left end:

    gallery_121204_183_2742.png

    When this 'wrapping' behaviour occurs, it's possible that a little bit of space is wasted at the front. In this case, it's important to make a note of where the left-most element starts, to aid reverse iteration:

    gallery_121204_183_2781.png

    I don't much care for this extra complication and one of the things I'd like to ask is if anyone can think of a simpler scheme for storage of polymorphic in a single buffer? Iteration is more expensive than I'd like it to be. For example, to advance to the next element, given its left vtable pointer:


    char *left_vtable = [somewhere in the buffer]
    const vtable *vt = *reinterpret_cast<const vtable * const *>(left_vtable);
    char *element = advance_to_alignment(left_vtable + sizeof (vtable*), vt->alignment);
    char *right_vtable = advance_to_alignment(element + vt->size, alignment_of<vtable *>::value);
    char *next_left_vtable = right_vtable + sizeof (vtable *);

    return (next_left_vtable == right_bound) ? left_bound : next_left_vtable;


    I'll readily admit that there's a strong element of "solution looking for a problem" here, but I'm curious if if anybody could see themselves using something like this in the real world?

    If anybody's interested in the prototype code, I'll happily put it up somewhere.

Share this post


Link to post
Share on other sites
Advertisement
I will not touch code that pretends to know about vtable existing.

There is also another rule: If you can't explain an algorithm in one sentence it's too complicated.


It's probably a fun technical challenge...

I've got a 33% speedup over std::deque<base *> (msvc 2010, cl /EHsc /Ox /DNDEBUG=1 /D_SECURE_SCL=0).[/quote]

If performance is an issue:struct Container {
vector<A *> a;
vector<B *> b;
...
vector<Z *> z;
}

void for_each(...) {
std::for_each(a.begin(), a.end(), f);
std::for_each(b.begin(), b.end(), f);

std::for_each(z.begin(), z.end(), f);
}
No magic and the gains will be much larger since no polymorphism is involved even if classes are polymorphic, meaning f can be inlined. Or one could probably use that template-with-template-parameter to hard-code the type passed to f.

There is alternate implementation using a switch and indexed/cursor linked list which can preserve order of elements, but in that case it's not worth it, since call to f is inherently polymorphic.


But it's just a general problem of mapping OO hierarchy to tabular model (SQL or such). You either preserve the hierarchy and have many distinct tables (my approach) or you have one single table with all possible columns (your approach). Neither is perfect, I just prefer one that isn't based around inherently undefined or compiler-specific behavior.

Share this post


Link to post
Share on other sites
For a less generic (but still pretty silly) solution, our engine includes some container classes (written by another studio, inherited by us). Those include versions of things like vector that work on hierarchies, where each slot is sized to the largest class. All accessor functions return pointers to the base type (so you still have some interface type-safety), but you need to initialize the thing with sizeof(LargestDerivedObject). And of course, you're wasting a bunch of memory if there's one particular type that's much larger than the others.

This is not a recommendation or endorsement. cool.gif

Share this post


Link to post
Share on other sites

I will not touch code that pretends to know about vtable existing.

I understand your repulsion, but give it a closer read and you'll see it's not doing that at all. The "vtable" structure is my own creation, named because it mimmicks the role of a real vtable.


There is also another rule: If you can't explain an algorithm in one sentence it's too complicated.
[/quote]
Which algorithm?



If performance is an issue:struct Container {
vector<A *> a;
vector<B *> b;
...
vector<Z *> z;
}

void for_each(...) {
std::for_each(a.begin(), a.end(), f);
std::for_each(b.begin(), b.end(), f);

std::for_each(z.begin(), z.end(), f);
}

[/quote]
This still has poor locality (potentially), one of the problems I was trying to overcome. Perhaps you meant containers of non-pointers? In which case, that's indeed a potential solution assuming you know all the types upfront. You've also lost the ordering and ability to use the data structure as a queue. EDIT: but just got your comment about indexing.

I just prefer one that isn't based around inherently undefined or compiler-specific behavior.[/quote]
Again, no undefined or compiler-specific behaviour here. Pretend if you will that I called my "vtable" structure "metadata" instead. It started out with more function pointers, so the name is less applicable than it once was.

Share this post


Link to post
Share on other sites

Perhaps you meant containers of non-pointers?
Yep.

but just got your comment about indexing.[/quote]
Since you know all the types upfront, you maintain an in-place index-based linked list, then select the container on which to invoke. Linked list is then just a tuple of (next, container index). It then becomes something like linked list over a 2D array (type vs. index). Not sure if it's much better.

I understand your repulsion, but give it a closer read and you'll see it's not doing that at all. The "vtable" structure is my own creation, named because it mimmicks the role of a real vtable.[/quote]
Possibly.

My main concern here is that despite performance benefits, the main cost of polymorphism comes from not knowing what follows which has more of an impact on choice of algorithms than anything else.

Share this post


Link to post
Share on other sites
There's something else that's been bugging me about the whole "needs to know all types".

If I give you a Base *, Base & or const Base &, how do you know what type it is, how large it is, whether it's multiply or virtually inherited? After all, there is no way to pass polymorphic types by value, so template doesn't work.

I simply don't see any way to make a copy without having a non-standard virtual clone() function.


This leaves only one option, where such list is only usable if instances are added at creation point, respecting the contract of passing in properly typed instance which is somewhat limiting. Or better yet, can be adequately solved using custom pool allocator and perhaps intrusive or other smart pointers. The problems of reallocating instances remains the same - if one isn't allowed to move them after creation, holes will be made in pool (or container). If one can (perhaps in C with PODs), then the problem becomes somewhat trivial block storage reallocation problem.

Share this post


Link to post
Share on other sites

I simply don't see any way to make a copy without having a non-standard virtual clone() function.

This leaves only one option, where such list is only usable if instances are added at creation point, respecting the contract of passing in properly typed instance which is somewhat limiting.


Yeah, this is what's currently in place and I also happen to find it irksome.

There's another alternative, which would be to have poly_deque detect if the user has defined a "get_vtable()" function for their hierarchy:


const vtable *get_vtable(MyBaseClass *) { /* ... */ }


Once the container has such a vtable, it's "safe", but again not very user friendly. Getting the objects out of the container in a usable form would probably involve a clone() method or similar, too.


Or better yet, can be adequately solved using custom pool allocator and perhaps intrusive or other smart pointers.
[/quote]
A pool allocator would likely reduce locality problems, but there's still overhead of numerous little allocations (which is what I believe is the source of improvement seen in some of my early benchmarks).

The problems of reallocating instances remains the same - if one isn't allowed to move them after creation, holes will be made in pool (or container). If one can (perhaps in C with PODs), then the problem becomes somewhat trivial block storage reallocation problem.
[/quote]

Yes, it's probably unrealistic to expect your average polymorphic object to sit in a container for its entire life. The only workaround for this is to have an entire ecosystem that supports copying/moving polymorphic objects between containers of various kinds (which I'd rather not get in to :)).

So I think it's probably safe to say that even if poly_deque solves a few problems, there are bunch of awkward new ones to contend with and ultimately this probably isn't worth exploring.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!