turanszkij

Shader matrix mul - dp4 or mad?


Hello

I've been wondering about this for some time: when I compile an HLSL shader and inspect the assembly, a vector-matrix multiplication compiles to either four dp4 or four mad instructions, depending on which side I multiply from.

My question is: is there a difference in performance, and should I worry about it? At the moment, all the shaders in my engine use the dp4 instruction for this (because I started from a tutorial that did it that way...). I know the final hardware-specific code generated after CreateXSShader is not documented, but are there any guidelines to follow?
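For readers who want to see the two shapes side by side, here is a minimal CPU sketch in C++ (not HLSL, so it can be compiled and run). The names `mul_dp4`, `mul_mad`, `transpose` and `check_equivalence` are made up for illustration, and the mad form is written as one multiply followed by three multiply-adds, which is an assumption about how such a chain is typically laid out rather than actual compiler output.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // m[r][c]: row r, column c

// dp4 shape: one 4-component dot product per output component.
// Computes M * v, treating v as a column vector.
Vec4 mul_dp4(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3];
    return out;
}

// mad shape: scale whole rows by one component of v and accumulate.
// Computes v * M, treating v as a row vector.
Vec4 mul_mad(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        out[c] = v[0] * m[0][c];                 // float4 mul
    for (int r = 1; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c] = v[r] * m[r][c] + out[c];    // float4 mad
    return out;
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Small integer inputs keep every float operation exact, so the two
// shapes can be compared with ==.
void check_equivalence() {
    const Vec4 v{1.0f, 2.0f, 3.0f, 4.0f};
    const Mat4 m{{{1.0f, 2.0f, 3.0f, 4.0f},
                  {5.0f, 6.0f, 7.0f, 8.0f},
                  {9.0f, 10.0f, 11.0f, 12.0f},
                  {13.0f, 14.0f, 15.0f, 16.0f}}};
    assert(mul_mad(v, m) == mul_dp4(transpose(m), v));
}
```

The point of `check_equivalence` is that multiplying from the other side is the same as multiplying by the transposed matrix, so either instruction shape can produce the same transform.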

 

You can also influence it with your choice of "column_major float4x4" or "row_major float4x4" matrices in HLSL.

10 years ago it would've made a difference as GPUs worked on float4 types at the hardware level. These days GPUs operate on individual floats so it actually works out the same either way.
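To make "works out the same" concrete, here is a rough C++ sketch that counts scalar operations for both forms. It assumes a scalar machine expands a dp4 into one multiply plus three multiply-adds, and a float4 mul or mad into four scalar ones; that is a simplification, not a description of any particular GPU's ISA. `OpCount`, `scalar_mul`, `scalar_mad` and the other names are hypothetical helpers.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // m[r][c]: row r, column c

// Tally of scalar instructions issued (bookkeeping only, not a GPU API).
struct OpCount {
    int mul = 0;
    int mad = 0;
};

float scalar_mul(float a, float b, OpCount& ops) { ++ops.mul; return a * b; }
float scalar_mad(float a, float b, float c, OpCount& ops) { ++ops.mad; return a * b + c; }

// dp4 form, scalarized: each output component is 1 mul followed by 3 mads.
Vec4 mul_dp4_scalarized(const Mat4& m, const Vec4& v, OpCount& ops) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r) {
        float acc = scalar_mul(m[r][0], v[0], ops);
        for (int c = 1; c < 4; ++c)
            acc = scalar_mad(m[r][c], v[c], acc, ops);
        out[r] = acc;
    }
    return out;
}

// mad form, scalarized: one float4 mul and three float4 mads,
// each split into 4 scalar operations.
Vec4 mul_mad_scalarized(const Vec4& v, const Mat4& m, OpCount& ops) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        out[c] = scalar_mul(v[0], m[0][c], ops);
    for (int r = 1; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c] = scalar_mad(v[r], m[r][c], out[c], ops);
    return out;
}

// Both forms should issue the same number of scalar muls and mads.
void check_same_cost() {
    const Vec4 v{1.0f, 2.0f, 3.0f, 4.0f};
    const Mat4 identity{{{1.0f, 0.0f, 0.0f, 0.0f},
                         {0.0f, 1.0f, 0.0f, 0.0f},
                         {0.0f, 0.0f, 1.0f, 0.0f},
                         {0.0f, 0.0f, 0.0f, 1.0f}}};
    OpCount dp4_ops, mad_ops;
    mul_dp4_scalarized(identity, v, dp4_ops);
    mul_mad_scalarized(v, identity, mad_ops);
    assert(dp4_ops.mul == mad_ops.mul && dp4_ops.mad == mad_ops.mad);
}
```

Under those assumptions both forms come to 4 scalar muls and 12 scalar mads; the only difference is how the work is grouped.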


I remember that back in the old days the hardware vendors encouraged us to choose the dp4 version, because the order of the instructions doesn't change the outcome. Back then the compilers had a tendency to shuffle instructions around, and a different order of the mad instructions could produce numerically different results, which in turn could cause z-fighting in a multipass approach.

I don't know how good (or bad) the compiler is nowadays in maintaining the order, as I'm still using the dp4 approach ;-)
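The underlying reason a reordered mad chain can change the result is that float addition is not associative. A small C++ sketch, with deliberately extreme values chosen so the effect is visible (real vertex data would only differ in the last bits); `mad_chain` and `check_order_matters` are hypothetical names, and this says nothing about whether current compilers actually reorder:

```cpp
#include <array>
#include <cassert>

// Accumulates the four products v[k] * column[k] in the given order,
// mimicking a mul followed by three dependent mads.
float mad_chain(const std::array<float, 4>& v,
                const std::array<float, 4>& column,
                const std::array<int, 4>& order) {
    // volatile forces every partial sum to be rounded to float precision,
    // even on targets that would otherwise keep extra bits in registers.
    volatile float acc = v[order[0]] * column[order[0]];
    for (int k = 1; k < 4; ++k)
        acc = v[order[k]] * column[order[k]] + acc;
    return acc;
}

// Same four terms, two accumulation orders, two different results.
void check_order_matters() {
    const std::array<float, 4> v{1.0f, 1.0f, 1.0f, 1.0f};
    const std::array<float, 4> column{1e8f, 1.0f, -1e8f, 0.0f};
    const std::array<int, 4> in_order{0, 1, 2, 3};
    const std::array<int, 4> shuffled{0, 2, 1, 3};
    // In order: 1e8 + 1 rounds back to 1e8, so the 1 is lost.
    // Shuffled: 1e8 - 1e8 cancels first, so the 1 survives.
    assert(mad_chain(v, column, in_order) != mad_chain(v, column, shuffled));
}
```

If two passes transform the same vertex with differently ordered chains, their depth values can disagree by a rounding error, which is the kind of mismatch that shows up as z-fighting.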


I think the AMD 7xxx series (plus PS4 and Xbox One) was the first to switch to scalars for everything (scalar here means a float4 is processed not as one wide type but as four floats in sequence); I don't know about NV.

What is really important on AMD is that, where it makes sense, big things like matrices get loaded into scalar registers (scalar here means one register shared by all threads, as opposed to VGPRs, which are unique to each thread and therefore scarce). See: http://gpuopen.com/optimizing-gpu-occupancy-resource-usage-large-thread-groups/

I did not know that this only works for some buffer types with D3D, so if anyone knows how this applies to Vulkan let me know :)


NVidia's first SIMT architecture (as opposed to the older SIMD/vector style) was the Geforce 8 series (if I remember correctly).

By SIMT you basically mean unified shaders, right? Going by what you linked, that still appears to be a vector architecture. I looked at its successor, Fermi (linked from your link), and that appears to be scalar.



Nah, unified shaders means they no longer have separate hardware for vertex shading vs pixel shading. The GeForce 7 had this split, so you could have pixel-shading cores sitting idle while the vertex-shading cores were maxed out... :(
GeForce 7 pixel shaders used SIMD instructions (like SSE), so each pixel could operate on a float4 per clock. GeForce 8 (Tesla) ran 8 pixels per "core" using scalar (float) instructions, and Fermi bumped that up to 32 pixels per "core", still using scalar instructions per pixel.
