
Hodgman

Posted 21 April 2013 - 06:25 PM

If the Vec4 isn't in the cache before this function is called, then it doesn't matter what code is in there, as the memory accesses will be the bottleneck ;)

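As above, writing ASM is really bad for the optimizer these days: the compiler can't see inside an inline asm block, so it has to be very defensive around it. Intrinsics are much preferred, because the compiler understands them and can schedule, inline and register-allocate them like any other code.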
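For illustration, here's a minimal sketch of a 4-component dot product written with plain SSE1 intrinsics instead of inline asm. The `Vec4` layout (four floats in a 16-byte-aligned array) is an assumption for the example, not anyone's actual engine type, and it assumes an x86/x64 target with SSE.

```cpp
#include <xmmintrin.h> // SSE1 intrinsics

// Hypothetical vector type: four floats, 16-byte aligned so _mm_load_ps is legal.
struct alignas(16) Vec4 { float v[4]; };

// Dot product using intrinsics: the compiler sees every operation, so it can
// inline this, keep values in registers and schedule it with surrounding code.
inline float Dot(const Vec4& a, const Vec4& b)
{
    __m128 va = _mm_load_ps(a.v);
    __m128 vb = _mm_load_ps(b.v);
    __m128 m  = _mm_mul_ps(va, vb);                            // (m0, m1, m2, m3)
    __m128 sh = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)); // (m1, m0, m3, m2)
    __m128 s  = _mm_add_ps(m, sh);                             // (m0+m1, m0+m1, m2+m3, m2+m3)
    sh        = _mm_movehl_ps(sh, s);                          // lane 0 = m2+m3
    s         = _mm_add_ss(s, sh);                             // lane 0 = m0+m1+m2+m3
    return _mm_cvtss_f32(s);                                   // extract lane 0 as a float
}
```

Note that this version still converts back to a plain `float` at the end, which is exactly the conversion the next point is about.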

For really optimal SSE code, you'd have the function return a vec4 with the result in the 'x' component, to avoid the __m128<->float conversions everywhere and to allow all your different SSE'd math functions to be inlined together well. I've seen some engines use a special type for this case, like float_in_vec, etc...
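A rough sketch of that idea, assuming an SSE-capable x86 target. The `float_in_vec` struct and the `dot_v`, `length`, `normalize` and `to_float` helpers below are hypothetical names for illustration, not any particular engine's API. This version broadcasts the result to all four lanes (so it is also in 'x'), which lets it feed straight into further vector math without a shuffle.

```cpp
#include <xmmintrin.h> // SSE1 intrinsics

// Hypothetical "scalar kept in an SSE register" type.
struct float_in_vec { __m128 v; };

// Dot product that stays in the register; the sum ends up in every lane.
inline float_in_vec dot_v(__m128 a, __m128 b)
{
    __m128 m  = _mm_mul_ps(a, b);
    __m128 sh = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)); // (m1, m0, m3, m2)
    __m128 s  = _mm_add_ps(m, sh);                             // (m0+m1, m0+m1, m2+m3, m2+m3)
    sh        = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)); // (s2, s3, s0, s1)
    return { _mm_add_ps(s, sh) };                              // total in all four lanes
}

// These chain without ever leaving the SSE registers.
inline float_in_vec length(__m128 a)    { return { _mm_sqrt_ps(dot_v(a, a).v) }; }
inline __m128       normalize(__m128 a) { return _mm_div_ps(a, length(a).v); }

// Only convert to a plain float at the very end, when you actually need one.
inline float to_float(float_in_vec f)   { return _mm_cvtss_f32(f.v); }
```

For example, `normalize(v)` above never round-trips through a scalar float: the length is computed and used for the divide entirely in SSE registers.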
