
Member Since 13 Apr 2009
Last Active Oct 29 2012 04:01 AM

Posts I've Made

In Topic: C++ multiple files and low-level effects

27 July 2012 - 08:52 PM

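Splitting code across multiple OBJ files prevents the compiler from inlining a function defined in one OBJ into code in another, unless you use link-time code generation (compile with /GL, link with /LTCG).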

If you turn off function-level linking (/Gy), the whole OBJ gets linked in if ANY function or data in that OBJ is referenced by the rest of the program. This can be an issue if you care about space (as we definitely do on consoles).

Multiple OBJs can also be a problem if a header file defines static variables. `static` gives the variable internal linkage, so every OBJ whose source includes that header gets its own private copy that the linker never sees by name, and all of those copies end up in the final binary. That is another space issue, and the cause can be hard to track down.
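A minimal single-file sketch of the duplication, assuming a hypothetical header `tables.h` that defines `static float g_gainTable[256]`. Since one file cannot hold two translation units, the namespaces `a_cpp` and `b_cpp` are stand-ins for two .cpp files after the preprocessor has pasted the header into each of them; the names are illustrative only.

```cpp
#include <cstddef>

// Hypothetical header "tables.h" containing:
//     static float g_gainTable[256] = { 1.0f };
// 'static' gives internal linkage, so every .cpp that includes it gets its own copy.
//
// Single-file stand-in: the two namespaces play the role of a.cpp and b.cpp
// after tables.h has been pasted into each of them.
namespace a_cpp { static float g_gainTable[256] = { 1.0f }; }
namespace b_cpp { static float g_gainTable[256] = { 1.0f }; }

// True when the two "translation units" ended up with separate arrays.
bool copies_are_distinct()
{
    return static_cast<const void*>(a_cpp::g_gainTable) !=
           static_cast<const void*>(b_cpp::g_gainTable);
}

// Storage paid for the table when num_tus translation units include the header.
constexpr std::size_t table_bytes(std::size_t num_tus)
{
    return num_tus * sizeof(a_cpp::g_gainTable);
}

// One possible fix: declare it in the header as
//     extern float g_gainTable[256];
// and define it exactly once, in a single .cpp file.
```

With three .cpp files including the header, `table_bytes(3)` is three full copies of the array rather than one.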

In Topic: Can't have function prototype with 4 __m128 parameters?

27 July 2012 - 08:43 PM

When compiling for 32-bit, you need to pass SIMD data by reference or by pointer. The 64-bit ABI allows it to be passed by value at the language level, but if you compile for both targets you need to do it the 32-bit way.

The simple reason is that the 32-bit ABI does not guarantee a 16-byte-aligned stack. You may ask: what about local variables, then? Functions that have __m128 local variables cause the compiler to generate additional code to align the stack so those variables can be stored there.
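A small sketch of the guarantee the compiler has to provide for locals, assuming an x86 or x64 target with SSE headers available. `local_m128_is_aligned` is a hypothetical helper; it only observes the result, and the realignment code itself is whatever the compiler chooses to emit in the prologue.

```cpp
#include <xmmintrin.h>
#include <cstdint>

// __m128 demands 16-byte alignment, and the compiler must honor it for locals too.
static_assert(alignof(__m128) == 16, "__m128 requires 16-byte alignment");

// Takes the address of a local __m128 and checks that it sits on a 16-byte
// boundary. On a target whose ABI does not guarantee a 16-byte-aligned stack,
// this holds only because the compiler adds frame-realignment code here.
bool local_m128_is_aligned()
{
    __m128 v = _mm_setzero_ps();
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&v);
    return addr % 16 == 0;
}
```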

Note that even on x64 (the Windows x64 calling convention), __m128 arguments are not passed in XMM registers. They are written to the stack and passed by reference behind the scenes, even though your code compiles when you write it to pass by value. Scalar floats and doubles (i.e. data not using __m128 as its type) DO get passed in XMM registers, but the ABI does not do this for SIMD data types. Weird, I know, but that's how it is specified at the moment. The way to deal with this is to __forceinline all the code that passes by value, but that has some practical limits.
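A sketch of both approaches, assuming SSE intrinsics from `<xmmintrin.h>`. `mul_add_pairs` shows the portable by-const-reference signature with four __m128 parameters; `lerp_ps` shows by-value parameters under forced inlining for the 64-bit-only case. Both function names and the `SIMD_FORCEINLINE` macro are hypothetical, and the GCC/Clang branch of the macro is an assumption added for portability (MSVC spells it `__forceinline`).

```cpp
#include <xmmintrin.h>

// Portable way (compiles for 32-bit and 64-bit): pass __m128 by const reference.
inline __m128 mul_add_pairs(const __m128& a, const __m128& b,
                            const __m128& c, const __m128& d)
{
    // (a * b) + (c * d), lane by lane
    return _mm_add_ps(_mm_mul_ps(a, b), _mm_mul_ps(c, d));
}

// Hypothetical portability macro for forced inlining.
#if defined(_MSC_VER)
#define SIMD_FORCEINLINE __forceinline
#else
#define SIMD_FORCEINLINE inline __attribute__((always_inline))
#endif

// By-value parameters, for when only 64-bit targets matter. Forced inlining
// removes the call, so the hidden by-reference passing never happens.
SIMD_FORCEINLINE __m128 lerp_ps(__m128 a, __m128 b, __m128 t)
{
    // a + t * (b - a), lane by lane
    return _mm_add_ps(a, _mm_mul_ps(t, _mm_sub_ps(b, a)));
}
```

The practical limit mentioned above shows up here: forced inlining only works while the functions stay small enough to inline everywhere they are called.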

In Topic: Specular Power = 0

12 July 2012 - 06:51 PM

Zero bases break pow on GPUs as well, since pow is basically:

float pow(float base, float exponent)
{
    return exp2(log2(base) * exponent);
}

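log2(0) is -infinity, so with an exponent of 0, log2(base) * exponent is -infinity * 0, which is NaN. The fix is either to clamp the base with max(base, very_small_number) or to use something like: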

float mypow(float base, float exponent)
{
    return base > very_small_number ? exp2(log2(base) * exponent) : 0;
}

And you get to sit down and tune your own very_small_number for whatever you are working on.
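A CPU-side sketch that reproduces the failure, assuming the exp2/log2 formulation described above and default IEEE float behavior (no fast-math). `gpu_style_pow`, `guarded_pow` and the value of `kVerySmall` are hypothetical stand-ins; real GPU `pow` implementations may differ in detail.

```cpp
#include <cmath>

// CPU emulation of pow as exp2(log2(base) * exponent).
float gpu_style_pow(float base, float exponent)
{
    // log2(0) = -inf; -inf * 0 = NaN; exp2(NaN) = NaN
    return std::exp2(std::log2(base) * exponent);
}

// Placeholder threshold; tune it for your own content.
constexpr float kVerySmall = 1e-6f;

// Guarded version: bases at or below the threshold return 0 instead of NaN.
float guarded_pow(float base, float exponent)
{
    return base > kVerySmall ? std::exp2(std::log2(base) * exponent) : 0.0f;
}
```

For contrast, the C library's `std::pow(0.0f, 0.0f)` returns 1, so the NaN only appears with the exp2/log2 formulation.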

In Topic: Dynamic branching in shader not working. Keeps jumping out.

12 July 2012 - 06:47 PM

I'm dying to see how well modern hardware handles true dynamic branches once you can disregard old hardware, because right now we work extra hard to flatten our shaders and build custom permutations of all of them in order to minimize shader ALU processing.

In Topic: It seems that nobody here is smart enough, myself included...

12 July 2012 - 06:28 PM

The games I am working on this year are probably the last two big SM3 projects we will be doing. It's all D3D11 with SM5 and SM4 profiles going forward...