Mco

3D Vector class and memory alignment


Mco    130
Currently I have a vector class that consists of 3 floats for the 3 spatial components. This results in the class having a size of 12 bytes. However, would the class be more efficient if I added an extra four bytes to pad it out to 16 bytes? Would this make caching more efficient? Also, if this is the case, would any memory allocation return a first byte that is aligned correctly in memory? I guess that padding the objects in an array out to an efficient size won't matter if the array's starting address is not itself aligned. This seems to depend on architecture and OS, so answers in the context of running Windows XP on a Core 2 Duo would be appreciated :)

Martin    194
Unless you're using SSE or similar, it is more cache-efficient to use smaller structures, i.e. 12 bytes is better than 16.

If you are using SSE, then use 16-byte structures, as 16-byte alignment is the preferred method. SSE can also load data that is not 16-byte aligned (with unaligned load instructions such as movups), but I believe this is slower.

If you are critically concerned about cache performance, having the entire vector on the same cache line might be of some limited benefit, e.g. to use the minimum number of cache lines for bone matrices in a software skinning solution.

Antheus    2409
Quote:
Original post by Mco
Currently I have a vector class that consists of 3 floats for the 3 spatial components. This results in the class having a size of 12 bytes.

However, would the class be more efficient if I added an extra four bytes to pad it out to 16 bytes? Would this make caching more efficient?


Padding also wastes 25% of the cache space holding vectors (4 of every 16 bytes), and memory bandwidth is just as important.

Ultimately, it will depend on the algorithm. This is the realm of micro-optimization, which you perform once you have everything else completed, since results may vary with something as simple as the order of two instructions.

Overall, such optimizations are likely impossible to predict in advance for PCs, simply because chipsets vary so much. Fortunately, the change itself is trivial (a single #ifdef inside the struct), so it is easy to test both layouts.

Profile, profile, profile. When done, profile some more. This is not an academic topic but a purely architecture-dependent optimization. The answers for specific platforms such as consoles might be different, but likely only for specific algorithms, and even then, wasting 25% of memory is likely a big price to pay for such a trivial optimization.

haegarr    7372
Quote:
Original post by Mco
Also, if this is the case, would any memory allocation return a first byte that is aligned correctly in memory? I guess that padding the objects in an array out to an efficient size won't matter if the array's starting address is not itself aligned.

It's also a question of what "any memory allocation" actually means. An OS (and I think XP makes no difference here) doesn't manage memory bytewise but in blocks. This means that even if you request a single byte, an entire block is actually used (in general, a multiple of full blocks). You can try a simple test:
#include <cstdio>
struct Ab { float a, b; };

int main() {
    std::printf("%p\n", static_cast<void*>(new Ab));
    std::printf("%p\n", static_cast<void*>(new Ab));
}
Presumably the difference between the two values shown will be 16 (as I said, I don't know this for XP), although each struct instance is only 8 bytes. Hence distinct objects allocated this way will always be aligned to those block boundaries. But if you allocate an array
new Ab[2];
the two objects will be densely packed together.

SiCrane    11839
If you request memory directly from XP via the Windows API heap functions, there will be 24 to 31 bytes of bookkeeping information, heap consistency information and padding up to an 8-byte boundary. So if you ask for 1 to 8 bytes, it'll use 32 bytes to fulfill that request; if you ask for 9 to 16 bytes, it'll use 40 bytes. However, new doesn't necessarily call the Windows API memory management routines directly. In MSVC, it depends both on compiler flags and on the state of certain environment variables at program startup.

In any case, the memory management routines malloc() and ::operator new() are guaranteed to return memory that's properly aligned for any primitive type.
