Matrix 16 byte alignment

12 comments, last by Ravyne 7 years, 5 months ago
These claims are both misleading.

Dropping support of x86 in favor of x64 might solve a lot of headaches

x64 aligns data by 16 bytes, x86 - by 8.


The default alignment is entirely implementation-defined. There is no guarantee that allocations are 16-byte aligned just because you're on the x64 architecture. There are absolutely platforms that use 8-byte new/malloc alignment even when compiled to x86_64.

OSX/iOS - guaranteed 16-byte alignment no matter which architecture is targeted.
Microsoft - guaranteed 16-byte alignment on x64, 8-byte alignment on 32-bit x86.
Linux/Android - guaranteed 8-byte alignment only.
Other platforms - not sure off the top of my head.
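
Since the default is implementation-defined, it is safer to ask the toolchain than to assume. Here is a minimal C++ sketch of how that could look. The names `is_aligned`, `kGuaranteedAlignment`, and `malloc_block_is_aligned` are hypothetical helpers for illustration, not part of any library:

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::max_align_t, std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // std::malloc, std::free

// True if `p` sits on an `alignment`-byte boundary.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// What this implementation promises for fundamental types
// (hypothetical constant name).
constexpr std::size_t kGuaranteedAlignment = alignof(std::max_align_t);

// Spot-checks one malloc'd block. This only shows what happened for this
// particular allocation; it does not prove a guarantee.
inline bool malloc_block_is_aligned(std::size_t size, std::size_t alignment) {
    void* p = std::malloc(size);
    if (p == nullptr) return false;
    const bool ok = is_aligned(p, alignment);
    std::free(p);
    return ok;
}
```

`kGuaranteedAlignment` is the number to build on for a given target. A spot check that happens to return a 16-byte-aligned pointer is not a promise that the next one will be.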

Sean Middleditch – Game Systems Engineer – Join my team!


Linux/Android - guaranteed 8-byte alignment only.

The GNU libc guarantees 16-byte alignment on x64 (http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html).

Linux/Android - guaranteed 8-byte alignment only.


The GNU libc guarantees 16-byte alignment on x64 (http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html).


I stand corrected. That hadn't been the case in the past.

I'd still be wary of relying on the behavior, though.

It's probably worth noting that just installing a custom allocator also solves the problem. Most serious game engines I've used drop in a custom allocator that makes alignment guarantees: either a flat 16-byte alignment guarantee, or 16-byte alignment for any block at least 16 bytes in size.
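
For illustration, a flat 16-byte guarantee can be layered on top of plain `malloc` by over-allocating and stashing the original pointer just below the aligned block. This is only a sketch of one common technique, not how any particular engine does it, and `aligned_alloc16` / `aligned_free16` are hypothetical names:

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // std::malloc, std::free

// Hypothetical helper: returns a 16-byte-aligned block of at least `size`
// bytes, or nullptr on failure. Must be released with aligned_free16().
inline void* aligned_alloc16(std::size_t size) {
    constexpr std::size_t kAlign = 16;
    // Extra room: worst-case padding plus one slot for the original pointer.
    void* raw = std::malloc(size + kAlign + sizeof(void*));
    if (raw == nullptr) return nullptr;

    // Leave space for the stashed pointer, then round up to a 16-byte boundary.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + kAlign - 1) & ~static_cast<std::uintptr_t>(kAlign - 1);

    // Remember what malloc actually returned, directly below the user block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: frees a block obtained from aligned_alloc16().
inline void aligned_free16(void* p) {
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}
```

The cost is up to 16 + sizeof(void*) extra bytes per allocation, which is why real allocators tend to bake the alignment into their block layout instead of wrapping `malloc` like this.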

Sean Middleditch – Game Systems Engineer – Join my team!

Another potential performance cost of unaligned vectors and matrices is that your data can straddle a cache-line boundary, increasing cache misses and potentially wasting precious memory bandwidth. A 4x4 single-precision matrix is 64 bytes, which fills a cache line exactly on most current architectures, so you might even consider aligning static/long-lived matrices on 64-byte boundaries. For 4-wide single-precision vectors, aligning on 16-byte addresses removes the possibility of crossing a cache-line boundary, which in the worst case can cause your program to read 128 bytes (two cache lines) to use only 16 of them (though you probably shouldn't be operating on single small vectors anyway); it could also evict other useful data already in the cache.

I imagine, also, that small arrays of small vectors could benefit from 64-byte alignment (of the array, not the individual vectors), but I'm not sure how quickly the prefetcher picks up on the array and kicks in. This potential optimization would only help quite small arrays of vectors (I'd guess < 8 vectors for certain, < 16 probably), though it'll never hurt, AFAICT.
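
To make the cache-line arithmetic concrete, here is a small C++ sketch. It assumes 64-byte cache lines (common, but not universal), and `Vec4`, `Mat4`, `kCacheLine`, and `cache_lines_touched` are hypothetical names chosen for the example:

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

// Assumption: 64-byte cache lines. Check your target before relying on this.
constexpr std::size_t kCacheLine = 64;

// 4-wide single-precision vector, aligned so it can never straddle a line.
struct alignas(16) Vec4 { float v[4]; };

// 4x4 single-precision matrix, aligned so it occupies exactly one line.
struct alignas(kCacheLine) Mat4 { float m[16]; };

static_assert(sizeof(Vec4) == 16, "Vec4 should be 16 bytes");
static_assert(sizeof(Mat4) == kCacheLine, "Mat4 should fill one cache line");
static_assert(alignof(Mat4) == kCacheLine, "Mat4 should be line-aligned");

// Number of cache lines touched by `size` bytes starting at address `addr`.
constexpr std::size_t cache_lines_touched(std::uintptr_t addr, std::size_t size) {
    if (size == 0) return 0;
    return static_cast<std::size_t>((addr + size - 1) / kCacheLine
                                    - addr / kCacheLine) + 1;
}
```

A 16-byte vector at offset 56 touches two lines (128 bytes fetched for 16 used), while the same vector at offset 48 touches one. Note that `alignas(64)` covers static and stack objects; for heap objects, plain `new` only honors over-aligned types from C++17 onward, so older toolchains need an aligned allocator.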

