GCC -malign-double

Started by
7 comments, last by Alundra 9 years ago

Hi,

I have a project that uses PhysX, and I have to pass -malign-double to GCC for it to compile.

Is that bad? If so, is there another solution?

Thanks


Hello,

I'm not sure why you need that setting for PhysX, but let me explain what it does:

When you allocate memory, it always lives at some virtual address. While you are working with it, that address is backed by a physical memory page (pages can be swapped in and out between physical memory and the hard drive). These virtual pages are typically 4 KiB in size (4 KiB is the base physical page size on x86), or a multiple of 4 KiB; that is just FYI.
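On a POSIX system you can ask the OS for its page size. The following is a minimal C++ sketch, assuming a POSIX platform (sysconf and _SC_PAGESIZE are standard POSIX; the helper names page_size_bytes and page_offset are just for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <unistd.h>  // sysconf, _SC_PAGESIZE (POSIX only)

// Returns the page size the OS reports (typically 4096 on x86 Linux),
// or 0 if the query fails.
std::size_t page_size_bytes() {
    long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? static_cast<std::size_t>(size) : 0;
}

// Offset of an address inside its page; `page` must be a power of two,
// so masking with (page - 1) keeps only the low bits.
std::size_t page_offset(std::uintptr_t address, std::size_t page) {
    return static_cast<std::size_t>(address & (page - 1));
}
```

For example, with 4 KiB pages the address 0x12345 sits at offset 0x345 inside its page.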

Now, every modern CPU has a so-called FPU and SIMD unit: the Floating Point Unit and Single-Instruction-Multiple-Data. With SIMD you hold multiple values in a single register and perform one operation on all of them (4 floats in an SSE register, i.e. a 4D vector, is the well-known case).

Let's continue (I will only describe the details for SIMD, as I don't remember the FPU specifics by heart). When you read data from memory into a register, there is one instruction that quickly loads data from memory into a SIMD register and one that stores it back: opcodes 0F 28 and 0F 29, both written in assembly as MOVAPS. These instructions load or store 16 bytes from or to a memory address (or another register) quickly, with just one condition: the memory address must be aligned on a 16-byte boundary, otherwise the instruction faults. (The unaligned variant, MOVUPS, accepts any address.)
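In C or C++ you rarely write MOVAPS by hand; the SSE intrinsics from <xmmintrin.h> do it for you. A minimal sketch, assuming an x86 or x86-64 target with SSE (add4_aligned and is_aligned16 are hypothetical helper names): _mm_load_ps and _mm_store_ps are the aligned forms and typically compile to MOVAPS, while _mm_loadu_ps and _mm_storeu_ps are the unaligned forms (MOVUPS).

```cpp
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics; x86/x86-64 only

// Adds two 4-float vectors using the aligned load/store intrinsics.
// All three pointers must be 16-byte aligned.
void add4_aligned(const float* a, const float* b, float* out) {
    __m128 va = _mm_load_ps(a);             // aligned 16-byte load
    __m128 vb = _mm_load_ps(b);             // aligned 16-byte load
    _mm_store_ps(out, _mm_add_ps(va, vb));  // aligned 16-byte store
}

// True if the address is a multiple of 16.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Passing an address such as a + 1 (only 4-byte aligned) to add4_aligned would typically fault at run time; switching to _mm_loadu_ps/_mm_storeu_ps removes the requirement, at a possible cost in speed on older CPUs.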

This might look problematic with virtual memory, but there is one nice property: every virtual page begins at a page-aligned address (a multiple of 4 KiB, and therefore also a multiple of 16), it is mapped onto a physical page that is aligned the same way, and it always maps that whole page. The offset inside the page is therefore identical in the virtual and the physical address. So when virtual address X modulo 16 is 0, the physical address used during the computation is also 0 modulo 16 (i.e. it is 16-byte aligned too).
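The page-offset argument above can be written out as a short derivation, assuming 4 KiB pages (p is the virtual page number, f the physical frame it maps to, o the offset inside the page, v the virtual and a the physical address):

```latex
\begin{aligned}
v &= 4096\,p + o, \qquad a = 4096\,f + o, \qquad 0 \le o < 4096,\\
4096 &= 16 \cdot 256 \;\Rightarrow\; v \bmod 16 = o \bmod 16 = a \bmod 16.
\end{aligned}
```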

Now, what you as a programmer need to know: any data you load or store with these aligned instructions (whether it lives on the stack or on the heap) must be allocated on a 16-byte boundary, i.e. each such address must give 0 after a 'mod 16' operation. There are OS-specific functions to handle heap allocation correctly (_aligned_malloc on Windows, posix_memalign on POSIX-based systems, etc.); for stack allocations the alignment must be specified to the compiler by hand (using __declspec(align(16)) in MSVC or __attribute__((aligned(16))) in GCC).
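A minimal sketch of both cases in C++, assuming Windows or a POSIX system (alloc_aligned, free_aligned and Vec4 are illustrative names). C++11 also offers alignas, which is the portable spelling of the MSVC and GCC attributes mentioned above:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Allocates `size` bytes at an address that is a multiple of `alignment`
// (a power of two, at least sizeof(void*)). Returns nullptr on failure.
void* alloc_aligned(std::size_t size, std::size_t alignment) {
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) return nullptr;
    return p;
#endif
}

// Releases memory obtained from alloc_aligned with the matching function.
void free_aligned(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Stack or static objects: alignas(16) makes every Vec4 start on a
// 16-byte boundary, wherever it is declared.
struct alignas(16) Vec4 {
    float x, y, z, w;
};
```

Memory from _aligned_malloc must be released with _aligned_free, never plain free; memory from posix_memalign is released with free.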

The same idea applies to doubles and long longs, except that their natural boundary is 8 bytes rather than 16 (FYI, there are also 32-byte registers, and on some CPU architectures even larger ones). On 32-bit x86, GCC by default aligns double and long long members of structures to only 4 bytes, as the i386 System V ABI specifies; -malign-double makes it align them to 8 bytes instead. That can be somewhat faster, because an 8-byte value then never straddles a cache line, but it also changes how structures are laid out in memory.
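You can check what your compiler actually does by measuring the padding in front of an 8-byte member. A small sketch (struct and function names are illustrative); the values in the comments are for the common x86 targets, and other platforms may differ:

```cpp
#include <cstddef>  // offsetof

// An 8-byte member placed right after a single char: its offset shows
// how strictly the compiler aligns that type inside structures.
struct CharThenDouble   { char c; double d; };
struct CharThenLongLong { char c; long long ll; };

// x86-64 (GCC/Clang) and 32-bit MSVC: both offsets are 8.
// 32-bit x86 GCC: 4 by default (i386 System V ABI), 8 with -malign-double.
std::size_t double_offset()    { return offsetof(CharThenDouble, d); }
std::size_t long_long_offset() { return offsetof(CharThenLongLong, ll); }
```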

Nothing is free, though: if a structure contains a 1-byte member followed by an 8-byte double aligned on an 8-byte boundary, 7 unused padding bytes have to be inserted between them, so in general your memory usage can increase.
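Ordering members from largest to smallest is a common way to reduce that padding. A sketch with illustrative names:

```cpp
#include <cstddef>

// Same members, different order. Padding is inserted so that the double
// starts on its alignment boundary, and the total size is rounded up to
// a multiple of the struct's alignment.
struct Interleaved { char a; double d; char b; };  // padding after a and after b
struct Grouped     { double d; char a; char b; };  // both chars share the tail padding

std::size_t interleaved_size() { return sizeof(Interleaved); }
std::size_t grouped_size()     { return sizeof(Grouped); }
```

With 8-byte alignment for double (x86-64, or 32-bit GCC with -malign-double) this typically gives 24 bytes versus 16; with 4-byte alignment (the 32-bit GCC default) 16 versus 12.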

My apologies if I went a bit too deep into hardware, but I wanted to explain the concepts behind why it works like this.

EDIT: So in general it isn't that bad (it can actually be good and yield better performance), but you can run into trouble (including crashes) if you use memory alignment without understanding the concepts behind it.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

As I understand it, this flag doesn't really do much on most x86_64 ABIs, since long longs and doubles are already aligned on a two-word boundary there. Where that is not the case, however, you must be very careful about what you apply the flag to. If any non-PhysX structure declarations (in library or system headers and so on) are affected by it, their layout will suddenly differ from the layout their implementation expects, because that implementation was not compiled with -malign-double. There's no telling what might happen then, but it will almost certainly end with stack corruption, an eventual segmentation fault of your program, and lots of tears.
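To make the layout mismatch concrete, here is a hypothetical single-file simulation. The two structs stand for the same declaration as seen by code compiled with and without 8-byte alignment for long long; #pragma pack(4) is used only to reproduce the 4-byte (i386 default) layout on any compiler, and all names are made up:

```cpp
#include <cstddef>
#include <cstring>

// Layout seen by code built with 8-byte alignment for long long
// (what -malign-double produces on 32-bit x86): value at offset 8.
struct WithAlignDouble { char tag; alignas(8) long long value; };

// Layout seen by code built without it: value at offset 4.
#pragma pack(push, 4)
struct WithoutAlignDouble { char tag; long long value; };
#pragma pack(pop)

// Writes `v` through one layout and reads it back through the other,
// which is what happens when two pieces of code disagree on a layout.
long long read_through_mismatched_layout(long long v) {
    WithAlignDouble writer;
    std::memset(&writer, 0, sizeof writer);  // zero the padding bytes too
    writer.tag = 1;
    writer.value = v;

    WithoutAlignDouble reader;
    static_assert(sizeof(WithoutAlignDouble) <= sizeof(WithAlignDouble),
                  "reader must fit inside writer");
    std::memcpy(&reader, &writer, sizeof(WithoutAlignDouble));
    return reader.value;  // reads bytes 4..11 instead of 8..15
}
```

The reader picks up padding plus half of the real value: exactly the kind of silent garbage that later turns into corruption.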

I would recommend restricting the use of this compiler flag as strictly as possible, in other words limiting it to the PhysX headers (or wherever it is needed) and nothing else.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

As Bacterius alluded to, this flag can turn out to be very viral. I had the same issue with PhysX too, and it only appeared once I started using 3.3.1. My guess is that either they broke the build or they changed the way the libraries are built.

Seconding -- or thirding, I guess -- the earlier comments.

Setting the flag globally is likely to cause issues unless you are compiling all libraries from scratch and all libraries are written to cope with the unexpected packing requirements.

Far better to just set alignment flags on the specific structures that need it.
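For structures you control, a minimal sketch of that approach with C++11 alignas (Sample is a made-up struct): only the tagged members get 8-byte alignment, so nothing else in the build changes layout.

```cpp
#include <cstddef>

// Only these members are forced to 8-byte alignment; every other
// structure in the program keeps the platform's ABI layout.
struct Sample {
    char flags;
    alignas(8) double timestamp;  // offset 8 on every target
    alignas(8) long long id;      // offset 16 on every target
};

// Fail the build early if the layout is ever not what we expect.
static_assert(offsetof(Sample, timestamp) == 8, "timestamp must sit at offset 8");
static_assert(offsetof(Sample, id) == 16, "id must sit at offset 16");
```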

The problem is that the error comes from a PhysX header, PxPreprocessor.h, so it shows up when compiling my own code, not only when compiling PhysX itself.

Is the only way to enable the flag before the include and disable it after the include?

If that is the only way, how do I do it correctly?

They make the source code available.

Add it to your source tree, add the source to your build chain. If you need to modify something that doesn't exactly meet your needs -- such as that header file -- then modify the file and comment the reason for the change.

As Bacterius alluded to..this flag can turn out to be very viral.

You could say that it is a malign flag.

Here is the part of the PhysX code that triggers the error:


// Ensure that the application hasn't tweaked the pack value to less than 8, which would break
// matching between the API headers and the binaries
// This assert works on win32/win64/360/ps3, but may need further specialization on other platforms.
// Some GCC compilers need the compiler flag -malign-double to be set.
// Apparently the apple-clang-llvm compiler doesn't support malign-double.

struct PxPackValidation { char _; long long a; };

#if !defined(PX_APPLE)
PX_COMPILE_TIME_ASSERT(PX_OFFSET_OF(PxPackValidation, a) == 8);
#endif
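Here is the same check restated with C++11 static_assert, as a sketch (the struct is renamed to MyPackValidation so it cannot clash with the real header). It makes explicit what the PhysX assert asks: does a long long start at offset 8 after a single char?

```cpp
#include <cstddef>  // offsetof

// Renamed copy of PxPackValidation: one char, then a long long.
struct MyPackValidation { char _; long long a; };

// Passes on x86-64 and 32-bit MSVC. On 32-bit x86 GCC it passes only
// with -malign-double, because the i386 ABI aligns long long to 4 bytes.
static_assert(offsetof(MyPackValidation, a) == 8,
              "long long is not 8-byte aligned in structs; the layout "
              "will not match the prebuilt PhysX binaries");
```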

PX_OFFSET_OF is defined earlier like this:


#if defined(PX_GNUC)
#define PX_OFFSET_OF(X, Y) __builtin_offsetof(X, Y)
#else
#define PX_OFFSET_OF(X, Y) offsetof(X, Y)
#endif
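The two branches should agree on GCC, whose <cstddef> defines offsetof in terms of __builtin_offsetof. A quick sketch to confirm that on your compiler (Probe is an illustrative struct):

```cpp
#include <cstddef>

// char at 0, int at 4, double at 8 (whether double is 4- or 8-aligned).
struct Probe { char c; int i; double d; };

#if defined(__GNUC__)
// Both spellings must yield the same compile-time constant.
static_assert(offsetof(Probe, d) == __builtin_offsetof(Probe, d),
              "offsetof and __builtin_offsetof disagree");
#endif

std::size_t probe_offset_of_d() { return offsetof(Probe, d); }
```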

And PX_COMPILE_TIME_ASSERT:


// static assert
#if defined(__GNUC__) && ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 7)))  
#define PX_COMPILE_TIME_ASSERT(exp) typedef char PxCompileTimeAssert_Dummy[(exp) ? 1 : -1] __attribute__((unused))
#else 
#define PX_COMPILE_TIME_ASSERT(exp) typedef char PxCompileTimeAssert_Dummy[(exp) ? 1 : -1]
#endif
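The macro is the classic pre-C++11 static assert: if exp is false, the typedef declares an array of size -1, which is ill-formed, so compilation stops with an error about a negative array size. The __attribute__((unused)) variant for GCC 4.7+ presumably just silences GCC's unused-local-typedef warning when the macro is used inside a function. A stand-alone sketch of the idiom (MY_COMPILE_TIME_ASSERT is a renamed copy, not the PhysX macro):

```cpp
// Pre-C++11 idiom: a true condition declares char[1], a false one
// declares char[-1], which the compiler rejects.
#define MY_COMPILE_TIME_ASSERT(exp) \
    typedef char MyCompileTimeAssert_Dummy[(exp) ? 1 : -1]

MY_COMPILE_TIME_ASSERT(sizeof(long long) == 8);     // compiles
// MY_COMPILE_TIME_ASSERT(sizeof(long long) == 4);  // error: negative array size

// C++11 and later state the same check directly, with a readable message.
static_assert(sizeof(long long) == 8, "long long must be 64 bits");
```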

