Matrix 16 byte alignment


Started by Noxil October 21, 2016 11:01 PM

12 comments, last by Ravyne 7 years, 5 months ago

SeanMiddleditch


October 24, 2016 05:32 PM

These claims are both misleading.

Dropping support of x86 in favor of x64 might solve a lot of headaches

x64 aligns data by 16 bytes, x86 - by 8.

The default alignment is entirely implementation-defined. There is no guarantee that allocations are 16-byte aligned just because you're on the x64 architecture. There are absolutely platforms that use 8-byte new/malloc alignment even when compiled to x86_64.

OSX/iOS - guaranteed 16-byte alignment no matter which architecture is targeted.
Microsoft - guaranteed 16-byte alignment on x64, 8-byte alignment on 32-bit x86.
Linux/Android - guaranteed 8-byte alignment only.
Other platforms - not sure off the top of my head.
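Rather than inferring the guarantee from the architecture, you can ask the implementation what it promises. The following is a minimal sketch, assuming a C++17 toolchain. `kMallocAlignment`, `kNewAlignment`, `kDefaultAllocIs16Aligned` and `is_aligned` are hypothetical names for illustration. `alignof(std::max_align_t)` and `__STDCPP_DEFAULT_NEW_ALIGNMENT__` are the standard's own knobs.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::max_align_t, std::size_t
#include <cstdint>  // std::uintptr_t

// Alignment that malloc/operator new must satisfy for any fundamental type.
// Implementation-defined; treat it as a lower bound, since some runtimes
// hand out more alignment than they report here.
constexpr std::size_t kMallocAlignment = alignof(std::max_align_t);

// Alignment guaranteed by plain (non-aligned) operator new. C++17 compilers
// predefine this macro; fall back to max_align_t on older standards.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t kNewAlignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t kNewAlignment = kMallocAlignment;
#endif

// True only if this toolchain *promises* 16-byte blocks from the defaults.
constexpr bool kDefaultAllocIs16Aligned =
    kMallocAlignment >= 16 && kNewAlignment >= 16;

// Runtime check for one particular pointer; alignment must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

If your code quietly depends on 16-byte heap blocks, a `static_assert(kDefaultAllocIs16Aligned, "...")` turns that assumption into a build error on targets that don't promise it. This is stricter than a later crash in an aligned SSE load.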

Sean Middleditch – Game Systems Engineer

Mona2000


October 24, 2016 06:56 PM

Linux/Android - guaranteed 8-byte alignment only.

The GNU libc guarantees 16-byte alignment on x64 (http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html).

SeanMiddleditch


October 24, 2016 07:21 PM

Linux/Android - guaranteed 8-byte alignment only.

The GNU libc guarantees 16-byte alignment on x64 (http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html).

I stand corrected. That hadn't been the case in the past.

I'd still be wary of relying on the behavior, though.
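One way to avoid relying on the default is to make the requirement part of the type. The following is a minimal sketch, assuming C++17 (where `new` of an over-aligned type calls the alignment-aware `operator new`). `Matrix4`, `is_16_aligned` and `make_identity` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical matrix type. alignas(16) makes 16-byte alignment part of the
// type, so the compiler honors it for locals, members and arrays, and C++17
// aligned new honors it for heap allocations as well.
struct alignas(16) Matrix4 {
    float m[16];  // 16 floats * 4 bytes = 64 bytes
};

static_assert(alignof(Matrix4) == 16, "alignment is part of the type");
static_assert(sizeof(Matrix4) == 64, "no padding expected for 16 floats");

inline bool is_16_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Heap-allocates an identity matrix. In C++17, new Matrix4 uses the
// alignment-aware operator new whenever alignof(Matrix4) exceeds
// __STDCPP_DEFAULT_NEW_ALIGNMENT__, so the result is 16-byte aligned even
// where plain malloc only promises 8 bytes.
inline std::unique_ptr<Matrix4> make_identity() {
    auto mat = std::make_unique<Matrix4>();  // value-initialized: all zeros
    assert(is_16_aligned(mat.get()));
    for (int i = 0; i < 4; ++i) {
        mat->m[i * 4 + i] = 1.0f;  // set the diagonal
    }
    return mat;
}
```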

It's probably worth noting that just installing a custom allocator also solves the problem. Most serious game engines I've used drop in a custom allocator that makes explicit alignment guarantees: either a flat 16-byte alignment for every allocation, or 16-byte alignment for any block at least 16 bytes in size.
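As a rough illustration of how such a guarantee can be layered over plain `malloc`, here is a minimal sketch of the common over-allocate-and-stash technique. `aligned_malloc` and `aligned_free` are hypothetical helpers, not any particular engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns `size` bytes aligned to `alignment` (a power
// of two, at least alignof(void*)). It over-allocates from malloc and stores
// the original pointer in the slot just before the aligned block.
inline void* aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= alignof(void*) && (alignment & (alignment - 1)) == 0);
    // Worst case: (alignment - 1) bytes of padding plus one stored pointer.
    const std::size_t overhead = alignment - 1 + sizeof(void*);
    if (size > SIZE_MAX - overhead) return nullptr;  // avoid size overflow
    void* raw = std::malloc(size + overhead);
    if (raw == nullptr) return nullptr;
    // Leave room for the stored pointer, then round up to the alignment.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    // Remember where the real allocation began so aligned_free can release it.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Releases memory obtained from aligned_malloc (never pass it plain malloc memory).
inline void aligned_free(void* p) {
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}
```

A production engine allocator would more likely carve blocks from size-class pools than wrap `malloc`. This only shows the alignment arithmetic. C++17's `std::aligned_alloc` covers the same need where the standard library provides it, though MSVC offers `_aligned_malloc` instead.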


Ravyne


October 24, 2016 08:01 PM

Another potential performance impact of unaligned vectors and matrices is that your data can cross a cache-line boundary, so touching one object can pull in two cache lines, evicting other data and potentially wasting precious memory bandwidth. A 4x4 single-precision matrix (16 floats × 4 bytes = 64 bytes) fills a cache line exactly on most current architectures, so you might even consider aligning static/long-lived matrices on 64-byte boundaries.

For 4-wide single-precision vectors, aligning on 16-byte addresses removes the possibility of crossing a cache-line boundary. In the worst case, such a crossing can cause your program to read 128 bytes of data to use only 16 bytes of it (though you probably shouldn't be operating on single small vectors anyway), and the extra line can also evict other useful data already in the cache.

I imagine, also, that small arrays of small vectors could benefit from 64-byte alignment (of the array, not the individual vectors), but I'm not sure how quickly the prefetcher picks up on the array and kicks in. This potential optimization would only help quite small arrays of vectors (I'd guess < 8 vectors for certain, < 16 probably), though it'll never hurt, AFAICT.
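To make the cache-line arithmetic concrete, here is a small sketch assuming 64-byte cache lines. This is the figure behind "fills a cache line exactly". `kCacheLine`, `lines_touched`, `Vec4` and `Matrix4x4` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, typical of current x86-64 and many ARM cores.
constexpr std::size_t kCacheLine = 64;

// Number of cache lines touched by `size` bytes (size >= 1) starting at `addr`.
constexpr std::size_t lines_touched(std::uintptr_t addr, std::size_t size) {
    return (addr + size - 1) / kCacheLine - addr / kCacheLine + 1;
}

// Hypothetical types. Because 16 divides 64, a 16-byte-aligned Vec4 can never
// straddle a line; a 64-byte-aligned Matrix4x4 occupies exactly one line.
struct alignas(16) Vec4 { float v[4]; };
struct alignas(64) Matrix4x4 { float m[16]; };

static_assert(sizeof(Vec4) == 16, "four floats");
static_assert(sizeof(Matrix4x4) == kCacheLine, "16 floats * 4 bytes = one line");
```

With only 16-byte alignment, a `Matrix4x4`-sized block can start 16, 32 or 48 bytes into a line and then spans two lines. For example, `lines_touched(0x50, 64)` is 2, while `lines_touched(0x40, 64)` is 1.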



This topic is closed to new replies.
