Allocations, Revisited

Some of you may remember my entry on the custom allocator we wrote to try to take advantage of stack allocation for small temporary arrays. Yes, it was Premature Optimization, and as no good deed goes unpunished, we botched the job and caused all sorts of problems. We ended up scrapping the whole thing and just using std::vector for our temporary array needs.

A few weeks ago, we got an issue filed on our tracker claiming that a particular shader method in D3D10, which is called many times per frame, was spending 80% of its time diddling with std::vector. I was skeptical, to be sure, but a few quick tests proved that a good portion of the method's time was in fact being eaten up by it. I figured it was high time I resurrected our old stack allocation code, but this time I had the benefit of hindsight to guide me.

Our old attempt basically involved a custom allocator for std::vector, which seemed good on the surface until we realized that memory allocated from the stack inside an allocator wasn't going to be of much use outside of it: stack memory belongs to the frame of the function that allocates it, and that frame is gone as soon as the allocator returns. So I began thinking of ways to reliably allocate memory on the stack of the calling method while still wrapping it all up nicely, so that the user didn't have to worry about cleanup or any other nonsense. I hit upon the idea of using a macro that would discreetly allocate a chunk from the stack and then forward the call on to the actual stack_array constructor. Since the macro is a simple text replacement, the actual stack allocation call happens in the calling function, which is exactly where it needs to happen.
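To make the caller-frame point concrete, here's a minimal sketch, assuming a glibc/Linux toolchain where `alloca` from `<alloca.h>` stands in for MSVC's `_malloca`. The macro `STACK_BUFFER` and the function `sum_of_squares` are hypothetical names for illustration only. Because the macro is pasted into `sum_of_squares` as text, the allocation lands in that function's frame and stays valid until it returns; a helper function that called `alloca` and returned the pointer would hand back memory from a frame that no longer exists.

```cpp
#include <alloca.h>  // alloca (glibc); MSVC's counterpart is _malloca in <malloc.h>
#include <cassert>
#include <cstddef>

// Hypothetical macro: expanded textually, so alloca runs in the *caller's* frame.
#define STACK_BUFFER(type, count) static_cast<type*>(alloca(sizeof(type) * (count)))

// By contrast, a helper such as
//     int* make_buffer(std::size_t n) { return static_cast<int*>(alloca(n * sizeof(int))); }
// returns a pointer into make_buffer's own frame, which is released (leaving the
// pointer dangling) the moment make_buffer returns. An allocator's allocate()
// member runs into exactly the same problem.

int sum_of_squares(std::size_t n)
{
    // alloca has no overflow check, so keep requests small.
    assert(n <= 1024 && "keep alloca requests small");

    int* values = STACK_BUFFER(int, n);  // lives until sum_of_squares returns
    for (std::size_t i = 0; i < n; ++i)
        values[i] = static_cast<int>(i * i);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}
```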

You can see the entire contents of the stack_array class here:

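#include <malloc.h>   // _malloca, _freea (MSVC CRT)
#include <cstddef>    // size_t, NULL

// Must be a macro: it expands in the caller, so _malloca reserves space in the
// caller's stack frame. The memory is not constructed, so T should be a
// plain-old-data type such as D3DPRESENT_PARAMETERS. length is evaluated twice.
#define stackalloc(type, length) stack_array<type>::from_stack_ptr(reinterpret_cast<type*>(_malloca(sizeof(type) * (length))), (length))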

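// Proxy that lets a temporary stack_array hand over ownership
// (the same trick std::auto_ptr uses with auto_ptr_ref).
template <typename T>
struct stack_array_ref
{
    explicit stack_array_ref(T* right, size_t length, bool on_stack)
        : ptr(right),
          len(length),
          on_stack(on_stack)
    {
    }

    T* ptr;
    size_t len;
    bool on_stack;
};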

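template <typename T>
class stack_array
{
private:
    T* ptr;
    size_t len;
    bool on_stack;

    // Only reachable through stackalloc/from_stack_ptr: wraps _malloca memory.
    explicit stack_array(T* memory, size_t length) throw()
        : ptr(memory),
          len(length),
          on_stack(true)
    {
    }

public:
    // Fallback when stackalloc isn't used: plain heap allocation.
    // (No throw() here, since new T[] can throw std::bad_alloc.)
    explicit stack_array(size_t length = 0)
        : ptr(new T[length]),
          len(length),
          on_stack(false)
    {
    }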

    // Ownership-transferring "copy" (like std::auto_ptr): right is left empty.
    stack_array(stack_array& right) throw()
        : ptr(right.ptr),
          len(right.len),
          on_stack(right.on_stack)
    {
        right.ptr = NULL;
        right.len = 0;
        right.on_stack = false;
    }

    // Takes ownership from a temporary via the stack_array_ref proxy.
    stack_array(stack_array_ref<T> right) throw()
        : ptr(right.ptr),
          len(right.len),
          on_stack(right.on_stack)
    {
    }

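    ~stack_array()
    {
        // _malloca silently falls back to the heap for large requests, so
        // memory from it must always be released with _freea.
        if (on_stack)
            _freea(ptr);
        else
            delete[] ptr;
    }

    static stack_array from_stack_ptr(T* memory, size_t length)
    {
        return stack_array(memory, length);
    }

    // Releases ownership into a proxy so temporaries can be "copied" from.
    operator stack_array_ref<T>() throw()
    {
        stack_array_ref<T> ans(ptr, len, on_stack);
        ptr = NULL;
        len = 0;
        on_stack = false;

        return ans;
    }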

    stack_array& operator = (stack_array& right) throw()
    {
        // Self-assignment would otherwise null out our own pointer below.
        if (&right == this)
            return *this;

        if (on_stack)
            _freea(ptr);
        else
            delete[] ptr;

        ptr = right.ptr;
        len = right.len;
        on_stack = right.on_stack;

        right.ptr = NULL;
        right.len = 0;
        right.on_stack = false;

        return *this;
    }

    stack_array& operator = (stack_array_ref<T> right) throw()
    {
        if (right.ptr != ptr)
        {
            if (on_stack)
                _freea(ptr);
            else
                delete[] ptr;
        }

        ptr = right.ptr;
        len = right.len;
        on_stack = right.on_stack;

        return *this;
    }

    const T* get() const
    {
        return ptr;
    }

    T* get() throw()
    {
        return ptr;
    }

    size_t size() const throw()
    {
        return len;
    }

    T& operator [] (size_t index)
    {
        return ptr[index];
    }

    const T& operator [] (size_t index) const
    {
        return ptr[index];
    }
};
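The stack_array_ref proxy is there only because C++03 has no move semantics; it's the same workaround std::auto_ptr relied on. For comparison, here's a hedged sketch of how the same idea might look with a C++11 move constructor instead. It assumes a glibc toolchain, with `alloca` standing in for `_malloca`; unlike `_malloca`, `alloca` never falls back to the heap, so the destructor has nothing to release for stack memory. The names `modern_stack_array` and `modern_stackalloc` are hypothetical and not part of the class above.

```cpp
#include <alloca.h>  // alloca (glibc); stand-in for MSVC's _malloca/_freea
#include <cassert>
#include <cstddef>
#include <utility>   // std::move

template <typename T>
class modern_stack_array  // hypothetical C++11 variant of stack_array
{
public:
    // Heap fallback, as in the original class.
    explicit modern_stack_array(std::size_t length = 0)
        : ptr_(new T[length]), len_(length), on_stack_(false) {}

    static modern_stack_array from_stack_ptr(T* memory, std::size_t length)
    {
        return modern_stack_array(memory, length);
    }

    // The move constructor replaces the stack_array_ref proxy.
    modern_stack_array(modern_stack_array&& other) noexcept
        : ptr_(other.ptr_), len_(other.len_), on_stack_(other.on_stack_)
    {
        other.ptr_ = nullptr;
        other.len_ = 0;
        other.on_stack_ = false;
    }

    modern_stack_array(const modern_stack_array&) = delete;
    modern_stack_array& operator=(const modern_stack_array&) = delete;

    ~modern_stack_array()
    {
        // alloca memory is reclaimed when the calling function returns;
        // only heap memory needs delete[]. With _malloca you would call _freea here.
        if (!on_stack_)
            delete[] ptr_;
    }

    bool on_stack() const noexcept { return on_stack_; }
    std::size_t size() const noexcept { return len_; }
    T& operator[](std::size_t i) { assert(i < len_); return ptr_[i]; }
    const T& operator[](std::size_t i) const { assert(i < len_); return ptr_[i]; }

private:
    modern_stack_array(T* memory, std::size_t length) noexcept
        : ptr_(memory), len_(length), on_stack_(true) {}

    T* ptr_;
    std::size_t len_;
    bool on_stack_;
};

// Still a macro, so alloca runs in the caller's frame. length is evaluated twice.
#define modern_stackalloc(type, length) \
    modern_stack_array<type>::from_stack_ptr(static_cast<type*>(alloca(sizeof(type) * (length))), (length))
```

Usage mirrors the original: `modern_stack_array<int> a = modern_stackalloc(int, 8);` puts eight ints in the caller's frame, while `modern_stack_array<int> b(8);` quietly uses the heap.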



It's a very lightweight template class that really only exists to hold temporary values while we marshal between .NET and DirectX. Notice the stackalloc macro, which is where the magic happens. If the user forgets to set up the array with this macro, the class falls back to standard new/delete, so a simple typo doesn't lead to unspeakable errors. Here's an example of using it:
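One thing to watch with a macro like stackalloc is that its arguments are pasted in as raw text, so an expression argument such as `n + 1` can bind to the surrounding multiplication unless every use of `length` is parenthesized. The sketch below shows the difference; `BYTES_UNSAFE`, `BYTES_SAFE` and the two helper functions are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Unparenthesized argument: BYTES_UNSAFE(int, n + 1) becomes sizeof(int) * n + 1.
#define BYTES_UNSAFE(type, length) (sizeof(type) * length)
// Parenthesized argument: BYTES_SAFE(int, n + 1) becomes sizeof(int) * (n + 1).
#define BYTES_SAFE(type, length)   (sizeof(type) * (length))

constexpr std::size_t unsafe_bytes_for(std::size_t n) { return BYTES_UNSAFE(int, n + 1); }
constexpr std::size_t safe_bytes_for(std::size_t n)   { return BYTES_SAFE(int, n + 1); }

// For n = 3 the safe form asks for room for 4 ints, while the unsafe form asks
// for 3 ints plus a single stray byte, i.e. a buffer that is too small.
static_assert(safe_bytes_for(3) == 4 * sizeof(int), "safe form multiplies the whole expression");
```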

stack_array<D3DPRESENT_PARAMETERS> d3dpp = stackalloc( D3DPRESENT_PARAMETERS, presentParameters->Length );



I'm pretty happy with the way it turned out. Benchmarks place stack_array at around 3x faster than std::vector, and even slightly faster than raw memory allocation, so we've definitely done some good work there. I'm not sure why std::vector is so slow in this case. I've turned off every security and debugging feature I can think of, so maybe there's some quirk when it comes to using it in C++/CLI.
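For anyone who wants to run a similar comparison, here's a hedged sketch of a harness; it doesn't reproduce the 3x figure and makes no claim about what it will measure. It assumes a glibc toolchain, with `alloca` standing in for `_malloca`, and every function name in it is hypothetical. One known difference worth keeping in mind is that `std::vector<int> v(n)` value-initializes (zero-fills) its elements while `new int[n]` leaves them uninitialized; nothing here establishes whether that explains the gap. Each variant returns a checksum so the three can be cross-checked and the work can't simply be discarded.

```cpp
#include <alloca.h>  // alloca (glibc); MSVC would use _malloca/_freea
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

long long fill_vector(std::size_t n)
{
    std::vector<int> v(n);  // value-initializes (zeroes) every element first
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) { v[i] = static_cast<int>(i); sum += v[i]; }
    return sum;
}

long long fill_new(std::size_t n)
{
    int* p = new int[n];    // default-initialized: ints are left unset
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) { p[i] = static_cast<int>(i); sum += p[i]; }
    delete[] p;
    return sum;
}

// Kept in its own function on purpose: alloca memory is released only when the
// enclosing function returns, so calling alloca directly inside the timing loop
// would keep growing the stack on every iteration.
long long fill_stack(std::size_t n)
{
    assert(n <= 4096 && "alloca has no overflow check; keep requests small");
    int* p = static_cast<int*>(alloca(sizeof(int) * n));
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) { p[i] = static_cast<int>(i); sum += p[i]; }
    return sum;
}

// Calls fn(n) `iterations` times, accumulates the results into checksum,
// and returns the elapsed wall-clock time in microseconds.
template <typename Fn>
long long time_us(Fn fn, std::size_t n, int iterations, long long& checksum)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        checksum += fn(n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Timings from a harness like this vary with compiler, optimization level and runtime settings, so any numbers should be compared only within a single build.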