Hello
I've been wondering about this for some time: when I compile an HLSL shader and inspect the assembly, a vector-matrix multiplication results in either four dp4 or four mad instructions, depending on which side I multiply from.
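To make the two forms concrete, here is a rough sketch in Python (not HLSL) of what each instruction sequence computes. The helper names `dp4`, `mad`, `transform_dp4` and `transform_mad` are made up for illustration, and the matrix values are arbitrary. I am assuming the dp4 form reads the matrix as four row registers, and the mad form reads it as four column registers, where the first step is usually a plain mul followed by three mad.

```python
# Hypothetical emulation of the two instruction sequences; not real shader code.

def dp4(a, b):
    # 4-component dot product, like the dp4 assembly instruction
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def mad(a, s, acc):
    # component-wise multiply-add: a * s + acc
    return [a[i] * s + acc[i] for i in range(4)]

def transform_dp4(rows, v):
    # one dot product per output component, each using one matrix row
    return [dp4(rows[i], v) for i in range(4)]

def transform_mad(cols, v):
    # scale each matrix column by one vector component and accumulate
    acc = [c * v[0] for c in cols[0]]  # the initial mul
    acc = mad(cols[1], v[1], acc)
    acc = mad(cols[2], v[2], acc)
    acc = mad(cols[3], v[3], acc)
    return acc

# Arbitrary example matrix: scale by (1, 2, 3), translate by (5, 6, 7)
rows = [[1, 0, 0, 5],
        [0, 2, 0, 6],
        [0, 0, 3, 7],
        [0, 0, 0, 1]]
cols = [list(c) for c in zip(*rows)]  # same matrix, stored as columns
v = [1.0, 2.0, 3.0, 1.0]

print(transform_dp4(rows, v))
print(transform_mad(cols, v))
```

Both calls compute the same product; only the layout of the matrix in the registers and the instruction pattern differ.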
My question is: is there a difference in performance, and should I worry about it? At the moment, all of the shaders in my engine use the dp4 form (because I started from a tutorial that did it that way...). I know that the final hardware-specific code generated after CreateXSShader is not documented, but are there any guidelines to follow?