Code Optimization in Visual Studio

30 comments, last by Cornstalks 14 years, 10 months ago
Given the following code:

void Quat_To_Matrix1(Quaternion& quat,Matrix& matrix)
{
  const float x = quat[0];
  const float y = quat[1];
  const float z = quat[2];
  const float w = quat[3];

  matrix[0][0] = w*w + x*x - y*y - z*z;
  matrix[0][1] = 2*x*y + 2*w*z;
  matrix[0][2] = 2*x*z - 2*w*y;
  matrix[0][3] = 0.0f;

  matrix[1][0] = 2*x*y-2*w*z;
  matrix[1][1] = w*w - x*x + y*y - z*z;
  matrix[1][2] = 2*y*z + 2*w*x;
  matrix[1][3] = 0.0f;

  matrix[2][0] = 2*x*z + 2*w*y;
  matrix[2][1] = 2*y*z - 2*w*x;
  matrix[2][2] = w*w - x*x - y*y + z*z;
  matrix[2][3] = 0.0f;

  matrix[3][0] = 0.0f;
  matrix[3][1] = 0.0f;
  matrix[3][2] = 0.0f;
  matrix[3][3] = w*w + x*x + y*y + z*z;
}


void Quat_To_Matrix2(Quaternion& quat,Matrix& matrix)
{
  const float x = quat[0];
  const float y = quat[1];
  const float z = quat[2];
  const float w = quat[3];

  float _w = w*w;
  float _x = x*x;
  float _y = y*y;
  float _z = z*z;
  
  matrix[0][0] = _w + _x - _y - _z;
  matrix[0][1] = 2*x*y + 2*w*z;
  matrix[0][2] = 2*x*z - 2*w*y;
  matrix[0][3] = 0.0f;

  matrix[1][0] = 2*x*y-2*w*z;
  matrix[1][1] = _w - _x + _y - _z;
  matrix[1][2] = 2*y*z + 2*w*x;
  matrix[1][3] = 0.0f;

  matrix[2][0] = 2*x*z + 2*w*y;
  matrix[2][1] = 2*y*z - 2*w*x;
  matrix[2][2] = _w - _x - _y + _z;
  matrix[2][3] = 0.0f;

  matrix[3][0] = 0.0f;
  matrix[3][1] = 0.0f;
  matrix[3][2] = 0.0f;
  matrix[3][3] = _w + _x + _y + _z;
}
Suppose I am using Visual Studio 2008. Is there any need for the optimization in the second function? I mean, does Visual Studio generate optimized assembly for function one without the manual optimization done in the second function? [Edited by - ApochPiQ on June 3, 2009 8:56:02 AM]
Generally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.
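The OP's functions do not compile on their own because `Quaternion` and `Matrix` are not shown. The stubs below are hypothetical minimal stand-ins, assuming the x, y, z, w component order used in the question and `matrix[row][col]` indexing. With them in place, both functions can be pasted into one file and the listings compared side by side.

```cpp
#include <cassert>

// Hypothetical minimal stand-in for the OP's Quaternion type.
// Assumed component order, matching the OP's code: [0]=x, [1]=y, [2]=z, [3]=w.
struct Quaternion {
    float v[4];
    float& operator[](int i) { return v[i]; }
    float operator[](int i) const { return v[i]; }
};

// Hypothetical minimal stand-in for the OP's Matrix type.
// matrix[r] returns a pointer to row r, so matrix[r][c] reads or writes one entry.
struct Matrix {
    float m[4][4];
    float* operator[](int r) { return m[r]; }
    const float* operator[](int r) const { return m[r]; }
};

// Paste Quat_To_Matrix1 and Quat_To_Matrix2 below unchanged, then build with e.g.
//   cl /O2 /FAs /c quat.cpp
// /FAs writes quat.asm with the C++ source lines interleaved as comments.
```

In the resulting listing, each function's instructions appear under its own source lines, so any extra multiplications in function one are easy to spot.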
Measure it with a profiler. If your program does not spend the majority of its time in this function, the answer to "do I need to optimise it" is unequivocally no.
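A profiler answers the question that matters: where the whole program spends its time. For the narrower question of whether function two beats function one in isolation, a crude stopwatch like the sketch below can help. `average_ns_per_call` is a hypothetical helper, not a library function, and `<chrono>` needs a C++11 compiler, so it will not build in VS2008 itself.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls f() `iterations` times and returns the average
// wall-clock time per call in nanoseconds. std::chrono::steady_clock is
// monotonic, so system clock adjustments do not skew the measurement.
template <typename F>
double average_ns_per_call(F f, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Convert the elapsed clock ticks to fractional nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

For example, `average_ns_per_call([&] { Quat_To_Matrix1(q, m); }, 1000000)`. Use `m` afterwards, because an optimizing compiler may drop work whose result is never read, or hoist a computation whose inputs never change out of the loop. Treat the numbers as rough; they include loop overhead and timer resolution.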
Quote:Original post by SiCrane
Generally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.



I am asking this question to learn how much optimization we have to do manually and which optimizations modern C++ compilers take care of.


Maple's code generator produces function two, while I have seen function one in very large open-source projects that are really reliable.

Please show some respect in your answers; otherwise posting in this forum is useless.
Quote:Original post by eGamer
Please show some respect in your answers; otherwise posting in this forum is useless.


I was just about to give you some links about modern optimization. But then I read the phrase where you request respect manually.

...

On the other hand, let me give you a very helpful link anyway: click. As it turns out, all the links I wanted to recommend appear there.


Edit: Out of curiosity, where exactly was anyone disrespectful? I fail to find it.
Quote:Original post by eGamer
Please show some respect in your answers; otherwise posting in this forum is useless.


I have no idea how it is that you sense any disrespect here.
Quote:Original post by eGamer

I mean, does Visual Studio generate optimized assembly for function one without the manual optimization done in the second function?


Both functions are completely unoptimized.

You first need to rewrite the code as SIMD (SSE through SSE4, depending on the target platform) in assembly, then run a decent profiler to determine whether instruction scheduling works as expected or whether there are any other issues. Of course, profile it against the non-SIMD version; that one might be faster.

In addition, the data needs to be properly aligned and prefetched into cache, and the loop calling the function must either be cache-oblivious or take cache line sizes into consideration.
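A rough sketch of the data-layout part of that advice, under assumptions not in the thread: hypothetical `Quat4`/`Mat4` types aligned to 16 bytes (one SSE register) and a batch conversion that walks contiguous arrays front to back. It does no explicit prefetching or cache blocking; it only sets up a layout that makes those possible. `alignas` is C++11; VS2008 would need `__declspec(align(16))` instead.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 16-byte-aligned quaternion (x, y, z, w): exactly one SSE
// register wide, so in an aligned array every element starts on a 16-byte boundary.
struct alignas(16) Quat4 { float x, y, z, w; };

// Hypothetical 16-byte-aligned 4x4 matrix. At 64 bytes it matches a typical
// cache line size; alignas(64) would keep each matrix inside a single line.
struct alignas(16) Mat4 { float m[4][4]; };

static_assert(sizeof(Quat4) == 16, "Quat4 should occupy exactly 16 bytes");
static_assert(sizeof(Mat4) == 64, "Mat4 should occupy exactly 64 bytes");

// Converts n quaternions to matrices in one pass, reading q[] and writing
// out[] strictly front to back so the hardware prefetcher sees a simple
// sequential stream rather than scattered accesses.
void quats_to_matrices(const Quat4* q, Mat4* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float x = q[i].x, y = q[i].y, z = q[i].z, w = q[i].w;
        const float xx = x * x, yy = y * y, zz = z * z, ww = w * w;
        float (*m)[4] = out[i].m;  // indexed m[row][col], as in the OP's code

        m[0][0] = ww + xx - yy - zz;
        m[0][1] = 2 * x * y + 2 * w * z;
        m[0][2] = 2 * x * z - 2 * w * y;
        m[0][3] = 0.0f;

        m[1][0] = 2 * x * y - 2 * w * z;
        m[1][1] = ww - xx + yy - zz;
        m[1][2] = 2 * y * z + 2 * w * x;
        m[1][3] = 0.0f;

        m[2][0] = 2 * x * z + 2 * w * y;
        m[2][1] = 2 * y * z - 2 * w * x;
        m[2][2] = ww - xx - yy + zz;
        m[2][3] = 0.0f;

        m[3][0] = 0.0f;
        m[3][1] = 0.0f;
        m[3][2] = 0.0f;
        m[3][3] = ww + xx + yy + zz;
    }
}
```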

Lastly, you need to profile the whole application to determine whether bottlenecks occur elsewhere, whether inlining the function is possible or beneficial, and whether the rest of the application introduces unexpected bottlenecks.
Your manual optimizations are (usually) unnecessary; a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination: expressions like "x*x" and "2*x*y" can be calculated once and reused in other expressions.
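To make the CSE point concrete, here is function one with the repeated work hoisted by hand. This is a rough picture of what CSE can do, not MSVC's actual output, and the function name and plain-array signature are hypothetical. Each hoisted value keeps the same operands and evaluation order as function one (C++ evaluates `a - b - c` as `(a - b) - c`, and `2*x*y` as `(2*x)*y`), so the rewrite needs no floating-point reassociation. That is why a compiler is allowed to do it even under strict FP rules.

```cpp
#include <cassert>

// Function one with common sub-expressions hoisted by hand.
// q holds x, y, z, w; m is indexed m[row][col], as in the OP's code.
void quat_to_matrix_cse(const float q[4], float m[4][4]) {
    const float x = q[0], y = q[1], z = q[2], w = q[3];

    // Squares: each one appears in several diagonal entries.
    const float xx = x * x, yy = y * y, zz = z * z, ww = w * w;

    // 2*x*y parses as (2*x)*y, so 2*x, 2*y and 2*w are shared as well.
    const float x2 = 2 * x, y2 = 2 * y, w2 = 2 * w;
    const float xy2 = x2 * y, xz2 = x2 * z, yz2 = y2 * z;
    const float wx2 = w2 * x, wy2 = w2 * y, wz2 = w2 * z;

    // Shared left-hand prefixes of the diagonal sums, e.g.
    // w*w + x*x - y*y - z*z is evaluated as ((w*w + x*x) - y*y) - z*z.
    const float ww_plus_xx = ww + xx;   // used by m[0][0] and m[3][3]
    const float ww_minus_xx = ww - xx;  // used by m[1][1] and m[2][2]

    m[0][0] = ww_plus_xx - yy - zz;
    m[0][1] = xy2 + wz2;
    m[0][2] = xz2 - wy2;
    m[0][3] = 0.0f;

    m[1][0] = xy2 - wz2;
    m[1][1] = ww_minus_xx + yy - zz;
    m[1][2] = yz2 + wx2;
    m[1][3] = 0.0f;

    m[2][0] = xz2 + wy2;
    m[2][1] = yz2 - wx2;
    m[2][2] = ww_minus_xx - yy + zz;
    m[2][3] = 0.0f;

    m[3][0] = 0.0f;
    m[3][1] = 0.0f;
    m[3][2] = 0.0f;
    m[3][3] = ww_plus_xx + yy + zz;
}
```

Function two only hoists the four squares; the sketch goes further and also shares the `2*...` products and the common prefixes of the diagonal sums.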

The opposite is rematerialization: sometimes you don't want to do too much CSE when you don't have enough registers to hold all the common sub-expressions and don't want to spill them to memory, so you recalculate expressions because that is cheaper than keeping them around.
Quote:Original post by outRider
Your manual optimizations are (usually) unnecessary; a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination: expressions like "x*x" and "2*x*y" can be calculated once and reused in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating-point expressions, so I don't expect CSE to work for things that aren't exactly identical. That is, it may well fail to expose certain possible common sub-expressions within the sums (e.g. the x - y shared by x - y and 2 + x - y).
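A quick illustration of why compilers hold back here: floating-point addition is not associative, so regrouping a sum can change its value. The sketch below (function names made up for illustration) adds the same three floats in two groupings. Storing the intermediate in a `volatile float` forces it to be rounded to single precision even on targets that keep wider intermediates, such as x87.

```cpp
#include <cassert>

// Computes (a + b) + c; the volatile store forces the intermediate to be
// rounded to float precision before the second addition.
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

// Computes a + (b + c), with the same forced rounding of the intermediate.
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, the left grouping gives 1 and the right gives 0, because -1e8f + 1 rounds back to -1e8f (adjacent floats near 1e8 are 8 apart). Under MSVC's default /fp:precise model the written grouping must be respected, so w*w + x*x - y*y - z*z, evaluated as ((w*w + x*x) - y*y) - z*z, only offers the prefixes it literally contains for CSE.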

The usual aliasing issues apply here too, of course. Caching the quaternion members in locals (as the OP is doing here) is a good idea for reasons other than saving on typing.
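A minimal sketch of the aliasing point, using raw float pointers instead of the OP's Quaternion and Matrix types (function names are hypothetical). When the input is read through a pointer that may refer to the same memory as the output, the compiler must assume each store can change the input, so it cannot reuse a product computed before that store.

```cpp
#include <cassert>

// Reads q[] again after writing m[0]. Since q and m may point into the same
// storage, q[0] and q[1] must be reloaded and the product recomputed.
void uncached(const float* q, float* m) {
    m[0] = q[0] * q[1];
    m[1] = q[0] * q[1];
}

// Copies the inputs into locals first, as the OP does. Stores through m
// cannot modify locals, so x * y may legally be computed only once.
void cached(const float* q, float* m) {
    const float x = q[0];
    const float y = q[1];
    m[0] = x * y;
    m[1] = x * y;
}
```

Calling both with the same buffer as input and output shows the difference is real: starting from {2, 3}, `uncached` leaves {6, 18} while `cached` leaves {6, 6}. Because the results differ, the compiler is not allowed to turn the first version into the second on its own.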
Quote:Original post by implicit
Quote:Original post by outRider
Your manual optimizations are (usually) unnecessary; a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination: expressions like "x*x" and "2*x*y" can be calculated once and reused in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating-point expressions, so I don't expect CSE to work for things that aren't exactly identical. That is, it may well fail to expose certain possible common sub-expressions within the sums (e.g. the x - y shared by x - y and 2 + x - y).


Hence the "usually" and "at least consider them" qualifiers. I'm aware of the FP complications. Most compilers I've used have a switch for fast vs. precise FP consistency, and some use fast mode at high optimization levels.

Quote:Original post by implicit
The usual aliasing issues apply here too, of course. Caching the quaternion members in locals (as the OP is doing here) is a good idea for reasons other than saving on typing.


Aliasing is irrelevant to the OP's question: the only difference between the two snippets is the explicit calculation of the squared sub-expressions, and all the variables are already pinned in locals. But yes, in general aliasing is a factor in correctly identifying CSEs and invariants.

