3D Vector class and memory alignment

4 comments, last by SiCrane 15 years, 4 months ago
Currently I have a vector class that consists of 3 floats for the 3 spatial components. This results in the class having a size of 12 bytes. However, would the class be more efficient if I added an extra four bytes to pad it out to 16 bytes? Would this make caching more efficient? Also, if this is the case, would any memory allocation result in the first byte being aligned correctly in memory? I guess that having an array of objects padded out to an efficient size won't matter if the starting byte is not aligned in memory. This seems to depend on architecture and OS, so answers in the context of running Windows XP on a Core 2 Duo would be appreciated :)
In most cases, your compiler will do the padding for you unless you explicitly tell it not to.
Mike Popoloski | Journal | SlimDX
Unless you're using SSE or similar, it is more cache efficient to use smaller structures, i.e. 12 bytes is better than 16.

If you are using SSE, then 16-byte structures with 16-byte alignment are the preferred method. SSE can also load from addresses that are not 16-byte aligned, but I believe this is slower.
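As a rough illustration of the two layouts being compared, here is a minimal sketch. The names Vec3 and Vec3Aligned are made up for this example and are not the original poster's class. It assumes 4-byte floats and a C++11 compiler. Note that a plain struct of three floats is only padded to the alignment of float, so it stays at 12 bytes unless a stricter alignment is requested.

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
struct Vec3 {
    float x, y, z;
};

// Same three components, but over-aligned to 16 bytes (C++11 alignas).
// The compiler adds 4 bytes of tail padding so array elements stay aligned.
struct alignas(16) Vec3Aligned {
    float x, y, z;
};

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");
static_assert(sizeof(Vec3) == 12, "three floats, no padding");
static_assert(alignof(Vec3) == alignof(float), "natural alignment of float");
static_assert(sizeof(Vec3Aligned) == 16, "size padded up to the alignment");
static_assert(alignof(Vec3Aligned) == 16, "over-aligned to 16 bytes");
```

With alignas, objects with static or automatic storage get the requested alignment; whether heap allocations do depends on the compiler and language version, so that part should be checked separately.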

If you are critically concerned about cache performance, having each vector entirely on one cache line might be of some limited benefit, for example when minimising the number of cache lines used by bone matrices in a software skinning solution.
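To make the "same cache line" point concrete, the sketch below counts how many elements of a tightly packed array cross a cache line boundary. The 64-byte line size is an assumption (typical for x86, but check your CPU), the array is assumed to start on a line boundary, and the function names are invented for this example.

```cpp
#include <cstddef>

// Assumed cache line size in bytes; 64 is typical for x86 but is an assumption here.
constexpr std::size_t kCacheLine = 64;

// True if an object of `size` bytes at byte offset `offset` crosses a line boundary.
constexpr bool straddles_line(std::size_t offset, std::size_t size) {
    return offset / kCacheLine != (offset + size - 1) / kCacheLine;
}

// Counts array elements that span two cache lines, for an array that
// starts exactly on a cache line boundary.
inline std::size_t count_straddlers(std::size_t elem_size, std::size_t count) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (straddles_line(i * elem_size, elem_size)) ++n;
    }
    return n;
}
```

Under these assumptions, count_straddlers(12, 16) gives 2 (the elements at offsets 60 and 120), while count_straddlers(16, 16) gives 0. So padding removes the split accesses, at the cost of the extra 4 bytes per element.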
Cheers,
Martin
If I've helped you, a rating++ would be appreciated
Quote: Original post by Mco
Currently I have a vector class that consists of 3 floats for the 3 spatial components. This results in the class having a size of 12 bytes.

However, would the class be more efficient if I added an extra four bytes to pad it out to 16 bytes? Would this make caching more efficient?


It also reduces available cache by 25%, and bandwidth is just as important.
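The 25% figure is just the ratio of the two element sizes. A tiny sketch of the arithmetic, assuming a 32 KiB data cache purely as an example figure (the function name is made up):

```cpp
#include <cstddef>

// How many elements of elem_size bytes fit in a cache of cache_bytes bytes.
constexpr std::size_t elements_in_cache(std::size_t cache_bytes, std::size_t elem_size) {
    return cache_bytes / elem_size;
}

// Assumed example cache size: 32 KiB.
constexpr std::size_t kAssumedCacheBytes = 32 * 1024;

static_assert(elements_in_cache(kAssumedCacheBytes, 12) == 2730, "12-byte vectors");
static_assert(elements_in_cache(kAssumedCacheBytes, 16) == 2048, "16-byte vectors");
```

2048 is almost exactly three quarters of 2730, so the padded layout keeps about 25% fewer vectors in the same amount of cache.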

Ultimately, it will depend on the algorithm. This is the area of micro-optimization, which you perform once everything else is complete, since results may vary with something as simple as the order of two instructions.

Overall, the effect of such optimizations is likely impossible to predict in advance on PCs, simply because chipsets vary so much. Since such an optimization is trivial (a single #ifdef inside a struct), it is easy to try both ways and measure.
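The single #ifdef mentioned above could look something like this. VEC3_PAD_TO_16 is a hypothetical macro name chosen for this sketch; you would define it on the compiler command line to build the padded variant.

```cpp
// VEC3_PAD_TO_16 is a hypothetical macro name; define it
// (for example with -DVEC3_PAD_TO_16) to build the padded variant.
struct Vec3 {
    float x, y, z;
#ifdef VEC3_PAD_TO_16
    float pad;  // unused; brings the size up to 16 bytes
#endif
};

#ifdef VEC3_PAD_TO_16
static_assert(sizeof(Vec3) == 16, "padded variant");
#else
static_assert(sizeof(Vec3) == 12, "packed variant");
#endif
```

Note that an extra float only changes the size. It does not by itself request 16-byte alignment, which still has to come from the allocator or an alignment specifier.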

Profile, profile, profile. When done, profile some more. This is not an academic topic, but a completely architecture-dependent optimization. The answers for specific platforms, such as consoles, might be different, but likely only with regard to specific algorithms, and even then, wasting 25% of memory is likely a big deal for such a trivial optimization.
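For a first rough measurement, something along these lines can be used before reaching for a real profiler. This is only a sketch: the struct names, sum_x and time_sum_us are invented for the example, and summing x stands in for whatever your real algorithm does.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };             // 12 bytes
struct Vec3Padded { float x, y, z, pad; };  // 16 bytes

// Sums the x components; a stand-in for the real per-element work.
template <typename V>
float sum_x(const std::vector<V>& v) {
    float s = 0.0f;
    for (const V& e : v) s += e.x;
    return s;
}

// Returns elapsed microseconds for `repeats` passes over `count` elements.
// The accumulated sum is written to `result` so the work is not optimized away.
template <typename V>
long long time_sum_us(std::size_t count, int repeats, float& result) {
    std::vector<V> data(count);
    for (std::size_t i = 0; i < count; ++i) data[i].x = 1.0f;
    auto start = std::chrono::steady_clock::now();
    float total = 0.0f;
    for (int r = 0; r < repeats; ++r) total += sum_x(data);
    auto stop = std::chrono::steady_clock::now();
    result = total;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_sum_us<Vec3>(n, repeats, out) and time_sum_us<Vec3Padded>(n, repeats, out) with the same arguments gives two timings to compare. Use element counts that match your real data set, build with optimizations on, and repeat the runs, since a single timing is noisy.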

Quote:Original post by Mco
Also, if this is the case, would any memory allocation result in the first byte being aligned correctly in memory? I guess that having an array of objects padded out to an efficient size won't matter if the starting byte is not aligned in memory.

It's also a question of what "any memory allocation" actually means. An OS (and I think XP makes no difference here) doesn't manage memory bytewise but in blocks. This means that even if you request only a single byte, an entire block is in fact used (multiples of full blocks in general). You can try a simple test:
struct Ab { float a, b; };
::printf("%p\n", new Ab);
::printf("%p\n", new Ab);
Presumably the difference between the two values shown will be 16 (as said, I don't know this for XP), although the size of each struct instance is only 8 bytes. Hence distinct objects allocated from this source will always be aligned to those block boundaries. But if you allocate an array
new Ab[2];
both objects will be densely packed together.
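A slightly fuller version of that test, as a sketch: it prints the addresses of two separate allocations and of two array elements, and measures the array stride. The helper names are made up for this example. The gap between the separate allocations depends entirely on the allocator, so no particular value is claimed here.

```cpp
#include <cstddef>
#include <cstdio>

struct Ab { float a, b; };

// Byte distance between two consecutive elements of an array.
inline std::ptrdiff_t array_stride(const Ab* arr) {
    return reinterpret_cast<const char*>(&arr[1]) -
           reinterpret_cast<const char*>(&arr[0]);
}

// Prints the addresses of two separate allocations and of two array elements.
inline void show_addresses() {
    Ab* first = new Ab;
    Ab* second = new Ab;
    Ab* arr = new Ab[2];
    std::printf("separate: %p %p\n",
                static_cast<void*>(first), static_cast<void*>(second));
    std::printf("array:    %p %p\n",
                static_cast<void*>(&arr[0]), static_cast<void*>(&arr[1]));
    delete first;
    delete second;
    delete[] arr;
}
```

The array stride always equals sizeof(Ab), so only the first element of an array benefits from whatever alignment the allocator provides.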
If you request memory directly from XP via the Windows API heap functions, there will be 24 to 31 bytes of bookkeeping information/heap consistency information/padding up to a boundary of 8 bytes. So if you ask for 1 to 8 bytes it'll use 32 bytes to fulfill that request. If you ask for 9 to 16 bytes, it'll use 40 bytes to fulfill that request. However, new doesn't necessarily call the Windows API memory management routines directly. In MSVC, it depends both on compiler flags and the state of certain environment variables at program start up.
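The numbers quoted above are consistent with "round the request up to a multiple of 8, then add 24 bytes". The sketch below is only a model of those figures, not a call into the Windows heap API, and the function names are invented for this example.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of 8.
constexpr std::size_t round_up_8(std::size_t n) {
    return (n + 7) / 8 * 8;
}

// Model of the figures quoted in the post: the request rounded up to
// 8 bytes plus 24 bytes of bookkeeping. This does not call any Windows API.
constexpr std::size_t modeled_heap_usage(std::size_t requested) {
    return round_up_8(requested) + 24;
}

static_assert(modeled_heap_usage(1) == 32, "1 to 8 bytes use 32");
static_assert(modeled_heap_usage(8) == 32, "1 to 8 bytes use 32");
static_assert(modeled_heap_usage(9) == 40, "9 to 16 bytes use 40");
static_assert(modeled_heap_usage(16) == 40, "9 to 16 bytes use 40");
```

Under this model a single 12-byte vector allocated on its own would occupy 40 bytes, which is a good reason to allocate vectors in arrays instead of one at a time.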

In any case, the memory management routines malloc() and ::operator new() are guaranteed to return memory that's properly aligned for any primitive type.
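That guarantee covers types like float and double, but not necessarily the 16 bytes that SSE prefers. One way to check what you actually got, and to carve a 16-byte-aligned pointer out of a slightly larger buffer, is sketched below. The helper names are made up for this example; std::align is the standard C++11 function from <memory>.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// True if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Returns a 16-byte-aligned pointer inside `buffer` with room for `size`
// bytes, or nullptr if it does not fit. Uses std::align (C++11).
inline void* align_16(void* buffer, std::size_t buffer_size, std::size_t size) {
    void* p = buffer;
    std::size_t space = buffer_size;
    return std::align(16, size, p, space);
}
```

For example, allocating 12 + 15 bytes with std::malloc and passing the block to align_16 with a size of 12 always leaves room for one aligned vector. The original pointer returned by malloc, not the aligned one, is what must later be passed to std::free.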
