# Breaking the aliasing rule in C++

This topic is 2631 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hello everyone

I'm wrinting a bunch of small functions like max, min, NextPowerOfTwo, etc in C++ that are branchless. I took alot of stuff here. Most are made using integer arithmetic. Then i came to think of a sign function for floating point numbers. All i really want to do is to get the sign bit after all. So i first wrote this :

 static inline bool IsNegative(float f) { return *reinterpret_cast<int*>(&f) & 0x80000000; } 

However, this breaks the stric aliasing rule for C++ (and isn't even legal). I had another idea, using a union like the following function

 static inline bool IsNegative(float f) { union { float f; int i; } u; u.f = f; return u.i & 0x80000000; } 

Would that be better ?

##### Share on other sites
It is still non-standard. However GCC (and I believe, other compilers follow the same trend, GCC is the only one AFAIK that made it explicit in their documentation) will be aware of your aliasing break and will behave properly.

Strictly speaking, it's still non-standard. Furthermore watch out for x64 portability and endianess.

BTW I got hypnotized by your avatar.

##### Share on other sites
Is this some (misguided) attempt at performance? If so, you really need to understand the architecture you're targeting. On modern consoles (for example), the LHS (load-hit-store) penalty that you're creating by moving from float -> int is almost certainly many times more expensive than the branch that you're avoiding. fsel would be the "proper" micro-optimization (if such a thing existed) here.

##### Share on other sites

[quote name='Laval B' timestamp='1317481197' post='4867987']
All i really want to do is to get the sign bit after all.

Then do this:static bool IsNegative(float f) { return f < 0.0f; }
Since it's a common operation, compilers will optimize it.

Inline is also not recommended. Compilers will inline on their own and if they don't, then one must target specific functions that show up as bottleneck.
-----------

Now for the important part.

Bithacks are used to avoid branches. Many algorithms can be rewritten in branchless manner to target certain chips.

Having a method "IsNegative" means a branch. Look at how it's used:if (isNegative(f)) { } else { }Whether it's implemented using a comparison or mask doesn't change that.

A branchless algorithm would look differently:f &= SIGN_MASK; // bit 31 (or 32) now contains sign f >>= 31; // shift sign to bit 0, f will be either 0 of 1 f *= 0xffffffff; // fill in the mask, result will be either all 0 or all 1 bits, multiply 0xffff... by 0 or by 1 res = (x & f) | (y & ~f); 

A contrived example (not entirely valid due to types), but equivalent to code: if (isNegative(f)) { // x,y or vice versa, depending on how the sign evaluates, cannot remember right now f = y; } else { f = x; }

So in branchless code, isNegative will never appear. At most there will be a function called something like: signMask that returns some suitable type, such as unsigned int or perhaps even an array of chars or something that fits perfectly into register.

On desktop PCs, branchless code typically offers no benefits and may result in poorer performance. On GPUs, branches are effectively forbidden, so if() will appear rarely.
[/quote]

Thank you very much for the detailed answer.

##### Share on other sites

It is still non-standard. However GCC [...] will be aware of your aliasing break and will behave properly.

Careful, "behave properly" can mean "take the leeway offered by the standard, potentially leading to optimizations that break code with aliasing violations".

For example, MinGW g++ 4.5 with flag combinations such as "-O2 -march=pentium4 -msse2 -ftree-vectorize" will warn you whenever it can deduce an aliasing violations.

The aliasing rules are there to provide optimization opportunities to the compiler. Those opportunities can be taken regardless of whether or not our code happens to break the aliasing rules. So we had better not break those rules

##### Share on other sites
As far as I can remember, when it comes to the union alternative, GCC avoids the optimization on aliasing violations. It's non-standard though, so one shouldn't expect it to be optimized correctly in other compilers, or even in future iterations of gcc.

See here ("Casting does not work as expected when optimization is turned on")

BTW, Antheus is spot on. To add to what he said, due to it's simplicity it might be possible that the compiler converts it into a conditional move instruction (cmov). In theory was said to be faster than a branch, but practice says it's faster, slower, and the same. Also highly depends on the CPU architecture.

##### Share on other sites

Having a method "IsNegative" means a branch. Look at how it's used:...
A branchless algorithm would look differently:
As another example, here's how a real-world use-case of how I use a branchless isPositive, which returns [font="Courier New"]1.0f[/font] for positive values, and [font="Courier New"]0.0f[/font] for negative values. Representing logical results as 1.0/0.0 allows the logical decisions to be done branchlessly using arithmetic instead.
float4 uv01 = float4(uv0.xy, uv1.xy); float4 uv23 = float4(uv2.xy, uv3.xy); float4 valid01 = isPositive(uv01) * isPositive(1.0f-uv01); float4 valid23 = isPositive(uv23) * isPositive(1.0f-uv23); valid01 *= valid01.yxwz; valid23 *= valid23.yxwz; float4 results0123 = hit0123 / (1.0f + d0123*d0123*falloff.xxxx); float1 total = dot( float4(valid01.xz,valid23.xz), results0123 );...but seeing this kind of programming is extremely CPU/compiler dependent, it's really not a big deal if you step outside the bounds of the standard and rely on compiler-specific details.

##### Share on other sites
As another example, here's how a real-world use-case of how I use a branchless isPositive, which returns 1.0f for positive values, and 0.0f for negative values. Representing logical results as 1.0/0.0 allows the logical decisions to be done branchlessly using arithmetic instead.[/quote]

I don't recommend doing this at all. Floating point numbers are stored in special floating point registers which do not support bitwise operations. The compiler will be forced to copy the value out to memory, reload it into an integer register, perform the bit operation, then store the value in memory and reload into the floating point register. This will be a huge performance hit.

You should be doing this instead: (x > 0) * (y > 0)

If you're compiling with SSE instructions, the compiler might be able to eliminate branches for you. (Although VS2010 won't)
If you use the SSE intrinsics manually, you can vectorise your operation and get a performance boost. But using intrinsics properly is quite difficult and it's very easy to end up with a performance hit due to improper use.

http://msdn.microsof...y/11dy102s.aspx

Edit: Thought I was addressing the OP. Branchless operations using bit shifts perform very poorly on my machine, but I imagine it's a different story on platforms that don't incur a huge overhead when trying to juggle floating point and integer registers.

##### Share on other sites
I don't recommend doing this at all. Floating point numbers are stored in special floating point registers which do not support bitwise operations. The compiler will be forced to copy the value out to memory, reload it into an integer register, perform the bit operation, then store the value in memory and reload into the floating point register. This will be a huge performance hit.
That's right for 360/PS3 CPU's, but not every architecture requires bouncing values off of memory to move between different types of registers.
N.B. I didn't show how 'isPositive' is implemented ;)

Yeah, a compare against zero is probably the best default here (and hopefully your compiler can translate it into any faster options available). On different chips there's entirely floating-point instructions that the compiler could use for this, like fsel, step, sign_sat, etc...

But generally, if you're writing low-level 'performance' code like this, you should probably know exactly which CPU architecture and which compiler you're targeting.
On one platform I tested, the compiler actually performed the best with the following code:
[font="Courier New"](x > 0 ? 1.0f : 0.0f) * (y > 0 ? 1.0f : 0.0f)[/font]

1. 1
2. 2
Rutin
20
3. 3
khawk
17
4. 4
A4L
14
5. 5

• 12
• 16
• 26
• 10
• 44
• ### Forum Statistics

• Total Topics
633759
• Total Posts
3013715
×