Back to General and Gameplay Programming

C++ Compiler Optimization

General and Gameplay Programming Programming

Started by ill November 17, 2011 02:29 PM

14 comments, last by Slavik81 12 years, 5 months ago

ill

324

Author

November 17, 2011 02:29 PM

So I know code like this:

float denominator = 20 / 50 * 91 + 3;

float numeratorThing1 = 70 * 90;



float val1 = (56 * 90 + numeratorThing1) / denominator;

float val2 = (57 * 90 + numeratorThing1) / denominator;

float val3 = (58 * 90 + numeratorThing1) / denominator;

Is just as optimal as code like this:

float val1 = (56 * 90 + 70 * 90) / (20 / 50 * 91 + 3);

float val2 = (57 * 90 + 70 * 90) / (20 / 50 * 91 + 3);

float val3 = (58 * 90 + 70 * 90) / (20 / 50 * 91 + 3);

But does the compiler know to optimize out function calls that would return the same value each time?

for(vector<Widgets>::iterator iter = m_widgets.begin(); iter != m_widgets.end(); iter++) {

   float val1 = (56 * 90 + iter->getWidth()) / getApplicationContext()->getScreenDimensions().getViewAspect();

   float val2 = (57 * 90 + iter->getWidth()) / getApplicationContext()->getScreenDimensions().getViewAspect();

   float val3 = (58 * 90 + iter->getWidth()) / getApplicationContext()->getScreenDimensions().getViewAspect();

}

Into something that would be more optimal like this?

vector<Widgets>::iterator widgetsEnd = m_widgets.end();



for(vector<Widgets>::iterator iter = m_widgets.begin(); iter != widgetsEnd ; iter++) {

   float width = iter->getWidth();

   float viewAspect = getApplicationContext()->getScreenDimensions().getViewAspect();



   float val1 = (56 * 90 + width) / viewAspect;

   float val2 = (57 * 90 + width) / viewAspect;

   float val3 = (58 * 90 + width) / viewAspect;

}

SiCrane

11,840

November 17, 2011 02:59 PM

But does the compiler know to optimize out function calls that would return the same value each time?

In general, no. If the functions are inline then it might.

rip-off

11,000

November 17, 2011 03:15 PM

I'd write the latter code anyway, because it is clearer to me. I'd probably even have put the view aspect outside the loop too, given that it is independent of the widgets.

Microsoft's Link Time Code Generation can inline functions across modules, and apparently GCC can too.

The real answer is to enable the appropriate optimisations in Release mode and examine the resulting disassembly. Don't forget to profile before and after, and ensure you're having a positive impact on performance. Of course, best would be to profile the whole program first, and concentrate your efforts on your actual bottlenecks, rather than chasing potential problems.

blog | twitter

alvaro

21,604

November 17, 2011 03:17 PM

If you use gcc, you can use the attribute `pure' to tell the compiler that the function doesn't have side effects and only depends on its arguments and global variables. This would allow for such an optimization to happen.

However, I agree with rip-off that writing the code with the intermediate variable is preferable because it's more readable, particularly if you give the intermediate variable an informative name.

Antheus

2,410

November 17, 2011 05:53 PM

Most of such optimizations are not valid due to aliasing rules. While rare in practice, they do occur and compilers prefer conservative route to avoid causing obscure bugs.

Accessing members of a struct should not be inlined, even though it often is. Floating point calculations are also dependent on hardware, so compiler might choose to not optimize them at compile time. Some compilers allow extra switches to force such optimizations.

If performance matters, then guessing what compiler might or might do isn't productive or useful. Either profile and examine assembly, or write code that isn't susceptible to such issues.

// avoid reading m_widgets (requires this dereference)

// avoid using accessor, vector is linear

Widget * curr = &m_widgets.front();

Widget * end = curr + m_widgets.size();



// multiply is always preferred to division

float viewAspectInverse = 1.0 / getApplicationContext()->getScreenDimensions().getViewAspect();





for ( ; curr < end; curr++) {  // save one register by using start and end pointers

  float width = curr->width;  // no getter, plain variable access

  float val1 = (56 * 90 + width) * viewAspectInverse;

  float val2 = (57 * 90 + width) * viewAspectInverse;

  float val3 = (58 * 90 + width) * viewAspectInverse;

}

Of course, the code above is NOP, hopefully the real one does something more useful.

The loss of "safe" programming here is a trade-off. Writing generic <algorithm>-like code has its advantages, but carries a lot of hidden penalties. So if performance does matter, it makes sense to choose more suitable approach.

Once code is converted to use plain arrays, then one can go one step further and mark variables with __restrict (technically a C not C++ feature). But if using raw pointers and PODs, code might as well be compiled as plain C.

void foo(WidgetPOD * p, size_t n);



void foo_cpp(vector<Widget> & w) {

  foo(&w.first(), w.size());

}Or something similar. std::for_each would also work.



And by being explicit about what code does, compiler also has more optimization opportunities.



As far as basics go: 

- avoid any kind of 'this' dereferencing, or evaluate it once. If accessing class members, there can be considerable hidden cost

- avoid dereferences in general, not because pointer chasing would necessarily be slow, but because compilers need to be conservative about what they access

- prefer PODs or at least expose raw members 

- despite compilers being capable of doing it, perform invariant hoisting yourself - mostly to avoid hidden penalties mentioned above



Another way to give compiler explicit instructions is through template meta programming, but that code gets ugly. I did however manage to get compiler to reliable emit more efficient integer operations, including more compile-time evaluation, but with some really gnarly code, not suitable for human consumption.

ill

324

Author

November 17, 2011 06:53 PM

Hmm yeah. I've been doing a lot of coding lately where I think, nah whatever, compiler will optimize that. Especially constantly dereferencing STL iterators in for loops.

I might want to go back and rewrite some of my code heh. It runs fast now but then again I'm running it on a fast computer. I might even get an FPS boost in my game. I'm definitely going to profile it a bit first to make sure this is worth my time.

One thing about this code though is it puts curr and end outside of the scope of the for loop which is a bit annoying.
And I can have element pointers for Widget in this loop since it's a vector, but that won't work on things like map iterators or list iterators. I use those sometimes if vector isn't good enough.

Also a lot of times I have inline functions for things like getWidth so I think those get optimized, right? There's no need to worry about non direct member access?

// avoid reading m_widgets (requires this dereference)

// avoid using accessor, vector is linear

Widget * curr = &m_widgets.front();

Widget * end = curr + m_widgets.size();



// multiply is always preferred to division

float viewAspectInverse = 1.0 / getApplicationContext()->getScreenDimensions().getViewAspect();





for ( ; curr < end; curr++) {  // save one register by using start and end pointers

  float width = curr->width;  // no getter, plain variable access

  float val1 = (56 * 90 + width) * viewAspectInverse;

  float val2 = (57 * 90 + width) * viewAspectInverse;

  float val3 = (58 * 90 + width) * viewAspectInverse;

}

BradBobak_68669

137

November 17, 2011 07:35 PM

I think you could just put curr, end in the for statement..



for (Widget *curr = &m_widgets.front(), *end = curr + m_widgets.size(); curr < end; ++curr)

{



}

Bregma

9,461

November 17, 2011 09:26 PM

The answer is: tune where your optimizer tells you you have a problem spot and focus on clarity everywhere else.

It's almost never worth worrying about microperformance, and when it is you'll definitely know it (it has dollar signs attached).

Stephen M. Webb
Professional Free Software Developer

Hodgman

52,717

November 17, 2011 10:21 PM

Most of such optimizations are not valid due to aliasing rules. While rare in practice, they do occur...

I'd argue that aliasing is extremely common in almost every C++ program. For the unaware: any time you write a function that uses multiple pointers, you've potentially added aliasing. N.B. [font="Courier New"]this[/font] is a pointer like any other.

void Widget::SumTotal( int& outputTotal ) {

  outputTotal = 0;

  for( int i=0; i<m_count; ++i )//must read m_count every iteration...

    outputTotal += m_data;//...because...

}



Widget w; w.SumTotal( w.m_count );//...this is a valid use of that function



void Widget::SumTotal( int& outputTotal ) {//but did you actually mean to write this:

  int localTotal = 0;

  int* data = m_data;

  for( int i=0, end=m_count; i<end; ++i )

    localTotal += data;

  outputTotal = localTotal;

}

[edit]N.B. the "local variable" rewrite works just as well as using the [font="Lucida Console"]restrict[/font] keyword (or [font="Courier New"]__restrict__[/font] / [font="Courier New"]__declspec(restrict)[/font]) here, by making the number of pointer reads/writes explicit. I'd recommend solving these problems with sensible use of local variables first (read data into locals early, write from locals to memory late), and only resorting to [font="Lucida Console"]restrict[/font] when necessary.

. 22 Racing Series .

alvaro

21,604

November 17, 2011 10:30 PM

[quote name='Antheus' timestamp='1321552432' post='4885032']Most of such optimizations are not valid due to aliasing rules. While rare in practice, they do occur...

I'd argue that aliasing is extremely common in almost every C++ program. Any time you write a function that uses multiple pointers, you've potentially added aliasing. N.B. [font="Courier New"]this[/font] is a pointer like any other.[/quote]

And that's why we need the `restrict' keyword, like in C99.

C++ Compiler Optimization

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

C++ Compiler Optimization

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines