Confused by the VTune profiling results


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

3 replies to this topic

#1 GuyWithBeard   Members   -  Reputation: 1018


Posted 23 February 2014 - 04:14 AM

Hi,

 

I am running my application through VTune to try to find performance bottlenecks. Most of the things it points out seem logical to me, but a certain scenario looks very weird. I attached a picture showing what I mean:

 

vtune.png

 

I hope the image shows all you need, but please ask for more info if you need it.

 

My question is regarding the highlighted row. How can incrementing a float value accessed through a reference take that long? Is there something fundamentally stupid about the code above, performance-wise, that would cause the highlighted row to take that long? And if so, what should I do to make it faster? Or, are the profiling results perhaps somehow incorrect?

 

The build I am running is a release build with "normal" VS settings and debug symbols, i.e. I have not adjusted anything in particular that comes to mind.

 

Any help you can give me is greatly appreciated. Cheers!




#2 Erik Rufelt   Crossbones+   -  Reputation: 4223


Posted 23 February 2014 - 05:38 AM

The optimizer inlines, reorders and rewrites the code, so a single line of C++ no longer maps cleanly onto the machine code, which looks entirely different. The profiler tries to work backwards from the instructions to the C++ line responsible for the execution time. The generated code has probably been reordered so that the highlighted line is where the work actually gets done, so the most exact figure you can get from this view is the total loop cost: 186 + 28 + 28 + 19 = 261.

You could go to ASM view to see what the assembly code looks like and get more exact readings.
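To make the attribution problem concrete, here is a rough sketch. The code from the screenshot is not reproduced in this thread, so both functions and all names below are hypothetical stand-ins, not the original loop. The first updates a float through a reference inside the loop; in optimized code the load, add and store for the whole loop can all end up attributed to that one source line. The second accumulates in a local and returns the result once. Whether that changes anything measurable depends on the compiler and should be checked in the profiler rather than assumed.

```cpp
#include <vector>

// Hypothetical stand-in for the profiled loop: the float is updated
// through a reference on every iteration.
void add_through_reference(const std::vector<float>& values, float& total) {
    for (float v : values) {
        total += v;  // samples for the whole loop may be attributed here
    }
}

// Same sum, accumulated in a local variable and returned once.
float add_with_local(const std::vector<float>& values) {
    float sum = 0.0f;
    for (float v : values) {
        sum += v;
    }
    return sum;
}
```

Both variants compute the same result, so comparing them in the ASM view is one way to see which instructions the per-line numbers actually cover.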



#3 GuyWithBeard   Members   -  Reputation: 1018


Posted 23 February 2014 - 05:46 AM

Ah, of course. I guess compiling that one cpp file with #pragma optimize("", off) would make the C++ more readable, but yes, I should look at the asm as well. Thanks!
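For reference, a minimal sketch of how that pragma is written for the Visual C++ compiler. The function name and body are placeholders made up for illustration. The pragma has to appear outside a function and applies to the functions defined after it. The `_MSC_VER` guard is only there so that other compilers skip it.

```cpp
// Hypothetical function; name and body are placeholders.
#ifdef _MSC_VER
#pragma optimize("", off)  // MSVC: disable optimizations for functions defined below
#endif
float scale_and_offset(float x) {
    return x * 2.0f + 1.0f;
}
#ifdef _MSC_VER
#pragma optimize("", on)   // MSVC: restore the optimizations from the compiler options
#endif
```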



#4 Erik Rufelt   Crossbones+   -  Reputation: 4223


Posted 23 February 2014 - 05:52 AM

Probably, but then you wouldn't get very meaningful profiler results as it's the optimized code you want to know the performance bottlenecks for.





