DaMuzza

alignment of member variables

Code like my Matrix class wants to be aligned for SSE usage, so it may be declared something like this:

__declspec(align(16)) class CMatrix
{
    __m128 m1, m2, m3, m4;
};

In normal usage this is fine: allocating a matrix ensures that it is aligned to 16 bytes in memory, and you can use aligned SSE instructions on m1 etc. However, if I have a class that stores a matrix, like so:

class CTransform
{
    // ... various data ...
    CMatrix mXForm;
};

then mXForm may NOT be aligned, depending on the data that comes before it in the class. Of course, I can byte-count the preceding variables (sometimes including the vtable pointer!) and reposition mXForm so that it is aligned, although this becomes pretty painful if you have to do it a lot. Even that doesn't solve the following problem:

CTransform* data = new CTransform[ 100 ];

Unless sizeof(CTransform) is a multiple of 16 bytes, data[1] onward will have an unaligned mXForm. Again, this can be solved with structure padding, but it can get really messy.

I've been living with this kind of thing for years, and thought I should ask what better solutions people have. Currently, my approach is to use an unaligned form of CMatrix for MOST uses, and an aligned one only in very specific situations where I know I need the performance and can live with the time and mess the byte counting requires. Is this the best way? Does anyone have any genius solutions to this kind of thing?

Thanks
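A minimal sketch of the situation above, written with standard C++11 alignas(16) in place of __declspec(align(16)) and plain floats in place of __m128 so it builds on any compiler. The is_aligned and transforms_are_aligned helpers are hypothetical names, not from the post; they just test at run time whether mXForm lands on a 16-byte boundary, both for a single object and for array elements, with a vtable pointer and a float placed in front of the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for CMatrix: a 4x4 float matrix with a 16-byte alignment requirement.
struct alignas(16) CMatrix {
    float m[4][4];
};

// Stand-in for CTransform: a vtable pointer and a float precede the matrix.
class CTransform {
public:
    virtual ~CTransform() {}
    float scale = 1.0f;
    CMatrix mXForm;
};

// Checks one stack object and every element of a stack array.
inline bool transforms_are_aligned() {
    CTransform single;
    CTransform batch[3];
    bool ok = is_aligned(&single.mXForm, 16);
    for (const CTransform& t : batch)
        ok = ok && is_aligned(&t.mXForm, 16);
    return ok;
}
```

On a current compiler these checks should pass with no manual byte counting, because alignof(CTransform) becomes 16 and sizeof(CTransform) is rounded up to a multiple of 16. Heap allocation is a separate question.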

It seems you are making a lot of extra work for yourself. Have you read the documentation on __declspec(align(#))?

Of particular interest to you will be these passages:
Quote:

Note that sizeof(struct Str1) is equal to 32, such that, if an array of Str1 objects is created, and the base of the array is 32-byte aligned, then each member of the array will also be 32-byte aligned. To create an array whose base is properly aligned, use _aligned_malloc, or write your own allocator. Note that normal allocators, such as malloc, C++ operator new, and the Win32 allocators return memory that will most likely not be sufficiently aligned for __declspec(align(#)) structures or arrays of structures.

Quote:

In the following example, the S1 structure is defined with __declspec(align(32)). All uses of S1, whether for a variable definition or other type declarations, ensure that this structure data is 32-byte aligned. sizeof(struct S1) returns 32, and S1 has 16 padding bytes following the 16 bytes required to hold the four integers. Each int member requires 4-byte alignment, but the alignment of the structure itself is declared to be 32, so the overall alignment is 32.


In other words, the compiler generates the padding you've been struggling to add manually. You can verify this in your debugger's memory window. As the docs state, though, you may need to write your own allocator to align the base of heap-allocated arrays.
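For the "write your own allocator" route the documentation mentions, here is a minimal sketch of the usual over-allocate-and-round-up technique, a hand-rolled counterpart to _aligned_malloc/_aligned_free: allocate alignment - 1 extra bytes plus room for one pointer, round the address up, and stash malloc's original pointer just below the aligned block so it can be freed later. aligned_malloc_sketch, aligned_free_sketch, Vec4, and make_vec4_array are hypothetical names for illustration, and alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical hand-rolled counterpart to _aligned_malloc.
// `alignment` must be a power of two.
void* aligned_malloc_sketch(std::size_t size, std::size_t alignment) {
    // Keep the stashed pointer slot itself suitably aligned.
    if (alignment < alignof(void*))
        alignment = alignof(void*);
    // Room for the payload, worst-case padding, and the stashed raw pointer.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (!raw)
        return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + alignment - 1) & ~(std::uintptr_t(alignment) - 1);
    // Remember where malloc's block began, just below the aligned address.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical counterpart to _aligned_free.
void aligned_free_sketch(void* p) {
    if (p)
        std::free(static_cast<void**>(p)[-1]);
}

struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Allocates and constructs `count` Vec4s in 16-byte-aligned storage.
Vec4* make_vec4_array(std::size_t count) {
    void* mem = aligned_malloc_sketch(count * sizeof(Vec4), alignof(Vec4));
    if (!mem)
        throw std::bad_alloc();
    Vec4* arr = static_cast<Vec4*>(mem);
    for (std::size_t i = 0; i < count; ++i)
        ::new (arr + i) Vec4{0.0f, 0.0f, 0.0f, 1.0f};
    return arr;
}
```

Because the base is 16-byte aligned and sizeof(Vec4) is 16, every element of the array is aligned too, which is exactly the array case from the original question. Vec4 is trivially destructible, so aligned_free_sketch can release the block directly; a type with a real destructor would need an explicit destructor call per element first.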

You shouldn't have any issues.

Regarding the first issue, member alignment within a class: I would expect this to be a non-issue on any fairly modern compiler, if not every compiler. I know VC2003 handles it correctly, as I just checked. I created a class that required 128-byte alignment, then another class with two members: a char and an instance of the first class. sizeof reported that the second class was 256 bytes, which is the minimum size needed to lay it out properly.
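The experiment described above can be reproduced at compile time. This sketch assumes C++11 alignas(128) as a stand-in for __declspec(align(128)); the struct names are hypothetical. Each static_assert encodes one step of the layout reasoning.

```cpp
#include <cassert>
#include <cstddef>

// A class that requires 128-byte alignment.
struct alignas(128) Aligned128 {
    char payload;
};

// The second class from the experiment: a char followed by the aligned class.
struct CharThenAligned {
    char c;
    Aligned128 inner;
};

// The enclosing struct inherits the strictest member alignment.
static_assert(alignof(CharThenAligned) == 128, "inherits the member's alignment");
// 127 bytes of padding follow `c`, so `inner` starts on a 128-byte boundary.
static_assert(offsetof(CharThenAligned, inner) == 128, "inner starts at 128");
// 128 bytes of char-plus-padding, then 128 bytes of Aligned128.
static_assert(sizeof(CharThenAligned) == 256, "matches the 256 bytes observed");
```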

As far as newing goes, it *should* work, although I haven't tested it to be sure. The standard requires that new return a pointer to memory that meets all the alignment requirements of the newed type, so new should handle any alignment issues under the hood. Still, you might want to test that it works properly on your platform.

Quote:
Original post by Troll
As far as newing goes, it *should* work.


My experience has been that it actually *doesn't* work. Nothing I tried could stop MSVC 2003's 'new' from aligning only on 8-byte boundaries. This means that you might happen to get 16-byte-aligned addresses for a while... but then suddenly you won't. No combination of project settings changed the fact that dynamic memory was only ever allocated on 8-byte boundaries.

Finally I compiled the dlmalloc library, editing in 16-byte alignment, and I use that in my allocators and new calls. Works like a charm.

The fact is that when MSVC goes to actually do the memory allocation, there aren't arguments or separate overloads to handle various alignment cases. Hopefully MSVC 2005(6?) defaults to 16-byte alignment when allocating.
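Worth noting for anyone reading this on a newer toolchain: C++17 added alignment-aware allocation functions, operator new(std::size_t, std::align_val_t) and the matching operator delete, and a new-expression for a type whose alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__ calls them automatically. The MSVC 2003/2005 compilers discussed here predate this, so it does not change the experience described above. The sketch assumes a C++17 compiler and standard library; CacheLineBlock and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// 64-byte alignment exceeds the default new alignment on common platforms,
// so this type is "over-aligned" as far as operator new is concerned.
struct alignas(64) CacheLineBlock {
    float data[16];
};
static_assert(alignof(CacheLineBlock) > __STDCPP_DEFAULT_NEW_ALIGNMENT__,
              "CacheLineBlock should be over-aligned");

inline bool aligned_to(const void* p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}

// In C++17 this new[]/delete[] pair routes through
// operator new[](std::size_t, std::align_val_t) and its matching delete[].
inline bool array_new_is_aligned(std::size_t count) {
    CacheLineBlock* blocks = new CacheLineBlock[count];
    bool ok = true;
    for (std::size_t i = 0; i < count; ++i)
        ok = ok && aligned_to(&blocks[i], alignof(CacheLineBlock));
    delete[] blocks;
    return ok;
}

// The aligned allocation function can also be called directly.
inline bool raw_aligned_new_works() {
    const std::align_val_t al = static_cast<std::align_val_t>(64);
    void* p = ::operator new(256, al);
    bool ok = aligned_to(p, 64);
    ::operator delete(p, al);
    return ok;
}
```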

Quote:
Original post by ajas95
Finally I compiled the dlmalloc library, editing in 16-byte alignment, and I use that in my allocators and new calls. Works like a charm.


How the heck did you do that?
I am really interested, so could you elaborate on that? Where does the source come from? Does it need to replace the default DLL (system-wide), or can it be used locally within your program?

You can read about and download dlmalloc here. There's a #define called MALLOC_ALIGNMENT that I just set to 16 (and did some other tweaking, like pointing it to use AMD's optimized memcpy_amd routine).

Then I created an STL allocator to use this:

#ifndef DL_ALLOCATOR_H
#define DL_ALLOCATOR_H

#include <cstddef>  // std::size_t, std::ptrdiff_t
#include <new>      // std::bad_alloc, placement new

// Provided by dlmalloc's malloc.c when it is built with USE_DL_PREFIX.
extern "C" void* dlmalloc(std::size_t);
extern "C" void dlfree(void*);

template<typename T>
class dl_allocator
{
public:
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef T              value_type;

    template<typename T1>
    struct rebind
    {
        typedef dl_allocator<T1> other;
    };

    dl_allocator() throw() { }

    dl_allocator(const dl_allocator&) throw() { }

    template<typename T1>
    dl_allocator(const dl_allocator<T1>&) throw() { }

    ~dl_allocator() throw() { }

    pointer address(reference x) const
    {
        return &x;
    }

    const_pointer address(const_reference x) const
    {
        return &x;
    }

    // Memory comes from dlmalloc, rebuilt with MALLOC_ALIGNMENT set to 16.
    pointer allocate(size_type n, const void* = 0)
    {
        if (n > max_size())
            throw std::bad_alloc();
        pointer ret = static_cast<pointer>(dlmalloc(n * sizeof(T)));
        if (!ret)
            throw std::bad_alloc();
        return ret;
    }

    // p is not permitted to be a null pointer.
    void deallocate(pointer p, size_type)
    {
        dlfree(static_cast<void*>(p));
    }

    size_type max_size() const throw()
    {
        return size_type(-1) / sizeof(T);
    }

    void construct(pointer p, const T& val)
    {
        ::new(static_cast<void*>(p)) value_type(val);
    }

    void destroy(pointer p)
    {
        p->~T();
    }
};

// Stateless: any two dl_allocators can free each other's memory.
template<typename T1, typename T2>
inline bool
operator==(const dl_allocator<T1>&, const dl_allocator<T2>&)
{
    return true;
}

template<typename T1, typename T2>
inline bool
operator!=(const dl_allocator<T1>&, const dl_allocator<T2>&)
{
    return false;
}

#endif



And finally, I remember the compiler kept bitching about not being able to guarantee alignment on the stack, so I copy-pasted std::vector into aligned_vector and changed the function that took its argument by value to take it by reference instead.

So, then calls are like:

typedef std::aligned_vector<vec4, dl_allocator<vec4> > Vert_list;
typedef std::aligned_vector<AABB, dl_allocator<AABB> > AABB_list;
typedef std::aligned_vector<Vert_data, dl_allocator<Vert_data> > Vert_data_list;
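Two later language changes address the aligned_vector workaround. Since C++11, vector::resize takes its fill value by const reference (C++03 declared it as resize(size_type, T c = T()), a by-value parameter, which is likely the kind the compiler complained about), so a copy of std::vector should no longer be needed. And with C++17 aligned new, an allocator can sit on top of ::operator new(std::size_t, std::align_val_t) rather than a rebuilt dlmalloc. The aligned_allocator below is a hypothetical minimal sketch under those assumptions, not the dl_allocator above, and vec4 is a stand-in type. C++11 allocator_traits supplies the remaining members (pointer typedefs, rebind, construct, destroy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal C++17 allocator: every block is aligned to at least alignof(T).
template <typename T>
struct aligned_allocator {
    using value_type = T;

    // Never ask for less than the default new alignment.
    static constexpr std::size_t alignment =
        alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? alignof(T)
                                                      : __STDCPP_DEFAULT_NEW_ALIGNMENT__;

    aligned_allocator() noexcept = default;
    template <typename U>
    aligned_allocator(const aligned_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_array_new_length();
        return static_cast<T*>(::operator new(
            n * sizeof(T), static_cast<std::align_val_t>(alignment)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, static_cast<std::align_val_t>(alignment));
    }
};

// Stateless, so any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const aligned_allocator<T>&, const aligned_allocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const aligned_allocator<T>&, const aligned_allocator<U>&) noexcept {
    return false;
}

// Stand-in for the thread's vec4.
struct alignas(16) vec4 {
    float x, y, z, w;
};

// Plain std::vector, no copy-pasted aligned_vector.
using Vert_list = std::vector<vec4, aligned_allocator<vec4>>;
```

Usage is the same shape as the typedefs above: declare a Vert_list, and resize or push_back as usual.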


As for making "new" call this allocator too, there are a couple of ways. You can either do it for each class by overloading operator new, or you can make *everything* use this allocator by overloading global new. Both of these techniques are explained here. A third technique is to allocate the memory yourself and then use 'placement new' to construct the object at the location you specify.
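The third technique, placement new into memory you allocate yourself, might look like the following minimal sketch. It over-allocates with std::malloc, uses std::align (C++11, from <memory>) to find the first 16-byte boundary inside the buffer, and constructs the object there. PlacedMatrix and this CMatrix are hypothetical stand-ins; note the explicit destructor call, which placement new always requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Stand-in for a class with a 16-byte alignment requirement.
struct alignas(16) CMatrix {
    float m[16];
    explicit CMatrix(float diag) : m{} {
        for (int i = 0; i < 4; ++i)
            m[i * 5] = diag;  // row-major diagonal: indices 0, 5, 10, 15
    }
};

// Hypothetical holder: owns raw malloc'd storage and builds one CMatrix
// inside it with placement new at a 16-byte boundary.
class PlacedMatrix {
public:
    explicit PlacedMatrix(float diag) {
        std::size_t space = sizeof(CMatrix) + alignof(CMatrix) - 1;
        raw_ = std::malloc(space);
        if (!raw_)
            throw std::bad_alloc();
        void* p = raw_;
        // std::align moves p forward to the next 16-byte boundary.
        if (!std::align(alignof(CMatrix), sizeof(CMatrix), p, space)) {
            std::free(raw_);
            throw std::bad_alloc();
        }
        matrix_ = ::new (p) CMatrix(diag);  // construct in place
    }
    ~PlacedMatrix() {
        matrix_->~CMatrix();  // placement new requires an explicit destructor call
        std::free(raw_);
    }
    PlacedMatrix(const PlacedMatrix&) = delete;
    PlacedMatrix& operator=(const PlacedMatrix&) = delete;

    CMatrix& get() { return *matrix_; }

private:
    void* raw_ = nullptr;
    CMatrix* matrix_ = nullptr;
};
```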

Quote:
Original post by ajas95
You can read about and download dlmalloc here. There's a #define called MALLOC_ALIGNMENT that I just set to 16 (and did some other tweaking, like pointing it to use AMD's optimized memcpy_amd routine).


Oh, you mean using your own library and redirecting mallocs to it manually. I was hoping for another version of the CRT in a DLL. Since I'm mainly building components (DLLs), not whole applications, I cannot overload global new, because I'm not in control of all the modules of an app. My only option would be a whole separate heap just for 16-byte-aligned allocations, which is problematic and seems like too much of a hassle to me.

A standard way, like the whole new DLL you mentioned, would be perfect, but until then...

Quote:

Then I created an STL allocator to use this:

And finally, I remember the compiler kept bitching about not being able to guarantee alignment on the stack, so I copy-pasted std::vector into aligned_vector, and changed the function that took its arguments by value to reference.


Thanks for that info! I implemented a 16-byte-aligned allocator myself quite recently, but I couldn't get it to work with std::vector, for exactly the reason you mention. Now I may be able to get back to it and make it work, I hope.


Oh, and so I'm not just "hijacking the thread", here is a recent discussion on a very similar topic.

Quote:
Original post by deffer
Oh, you mean using your own library to redirect mallocs manually. I was hoping for an another version of CRT in a dll. Since I'm mainly building components (dlls), not whole applications, I cannot overload global new


It sounds like you need to overload operator new on a class-by-class basis to redirect to a 16-byte aligned allocator. It's a little bit of a pain but not so bad. Overloading global new has its own peculiarities anyway! (beware your linking order... ;)
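One way to make the class-by-class overload less of a pain is to write it once in a base class and inherit it, since class-specific operator new/delete are inherited by derived classes. This leaves the host application's global operator new alone, which matters for the DLL case. AlignedNew and CTransform here are hypothetical; the sketch assumes _aligned_malloc/_aligned_free on MSVC and C++17 std::aligned_alloc elsewhere. An empty base class normally adds no size thanks to the empty base optimization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc / _aligned_free
#endif

// Hypothetical mixin: classes deriving from AlignedNew<N> get class-specific
// operator new/delete (scalar and array) returning N-byte-aligned memory.
template <std::size_t Align>
struct AlignedNew {
    static void* operator new(std::size_t size) { return allocate(size); }
    static void* operator new[](std::size_t size) { return allocate(size); }
    static void operator delete(void* p) noexcept { release(p); }
    static void operator delete[](void* p) noexcept { release(p); }

private:
    static void* allocate(std::size_t size) {
#if defined(_MSC_VER)
        void* p = _aligned_malloc(size, Align);
#else
        // std::aligned_alloc expects size to be a multiple of the alignment.
        std::size_t rounded = (size + Align - 1) / Align * Align;
        void* p = std::aligned_alloc(Align, rounded == 0 ? Align : rounded);
#endif
        if (!p)
            throw std::bad_alloc();
        return p;
    }
    static void release(void* p) noexcept {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        std::free(p);
#endif
    }
};

// Stand-in for the thread's CTransform: one base class, no per-class boilerplate.
class CTransform : public AlignedNew<16> {
public:
    virtual ~CTransform() {}
    alignas(16) float xform[16] = {};
};
```

operator new[] hands back an aligned block, but for new T[n] on a type with a non-trivial destructor the compiler stores an array cookie in front of the elements, so whether each element stays aligned also depends on your compiler's ABI. That part is worth verifying on your platform.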

Quote:

Oh, and so I'm not just "hijacking the thread", here is a recent discussion on a very similar topic.


Well, it's unbelievable to me that this is actually such a pain. I remember being really pissed when I was putting all this together-- all I wanted was a stupid std::vector of 16-byte aligned types! Little did I know how crazy it would be just to allocate classes with alignment requirements. Shocking, truly.

DaMuzza, if you ever find a 'genius solution', please let me know!
