
Stream Multiplication Performance with Functions


3 replies to this topic

#1 jerrinx   Members   -  Reputation: 208


Posted 08 July 2012 - 10:31 PM

Hey guys,

I made a test program to check out performance of stream multiplication using normal, virtual, inline functions, and as a single function.

These are the results when multiplying 1,000,000,000 floats:

Results


[CUtil] ## PROFILE Stream Product normal function : 6.663393837 sec(s) ##
[CUtil] ## PROFILE Stream Product virtual function : 6.608961085 sec(s) ##
[CUtil] ## PROFILE Stream Product inline function : 6.584697760 sec(s) ##
[CUtil] ## PROFILE Stream Product in function : 12.363450801 sec(s) ##

What I don't understand is why the stream product done in a single function takes twice as long!
Maybe it's just my setup or something wrong with the code... I don't know.
Can somebody try this out?

VS 2010 Express
Release Build

Attached Files


JerrinX


#2 ApochPiQ   Moderators   -  Reputation: 16077


Posted 08 July 2012 - 11:48 PM

This test isn't going to tell you anything useful, at least not as posted. You don't seem to initialize your memory anywhere, so the operations could be doing anything, including arithmetic on denormals, which is far slower than arithmetic on normalized floats. You should set the memory up to contain known values beforehand to ensure you're doing consistent work in all of the tests.
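A minimal sketch of that setup step, assuming C++11. The helper names `make_input` and `has_denormals` are made up for illustration and are not from the attached code. The idea is just to fill both input streams with known, normal values and to be able to check that no denormals slipped in:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: fill a buffer with known, repeating values so every
// test variant multiplies exactly the same data.
std::vector<float> make_input(std::size_t count, float start, float step)
{
    // Positive start and non-negative step keep every value a normal float.
    assert(start > 0.0f && step >= 0.0f);
    std::vector<float> data(count);
    for (std::size_t i = 0; i < count; ++i)
        data[i] = start + step * static_cast<float>(i % 1024);
    return data;
}

// Hypothetical check: true if any element is a denormal (subnormal) float,
// i.e. non-zero but smaller in magnitude than the smallest normal float.
bool has_denormals(const std::vector<float>& data)
{
    for (float v : data)
        if (v != 0.0f && std::fabs(v) < std::numeric_limits<float>::min())
            return true;
    return false;
}
```

Calling `make_input` once per stream before any timing starts, and checking `has_denormals` on the result, should be enough to rule out the denormal explanation for a given run.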

Also, even with that aside, you're not really testing what you think you are here. The compiler is almost certainly smart enough to elide a lot of the excess work you're doing, unroll loops, devirtualize function calls, and so on. You also need to consider far more than just one execution of the work load to account for things like cache warming, branch prediction, and so on. In short, constructing valid artificial benchmarks is extremely complex on modern hardware.
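As a rough sketch of what "more than one execution" could look like, assuming C++11 and `std::chrono` (which VS 2010 may not provide). `multiply_streams`, `BenchResult` and `bench_multiply` are hypothetical names, not the profiler from the original post. It does one untimed warm-up pass, keeps the best of several timed passes, and sums the output so the result is actually used, which makes it harder (though not impossible) for the optimizer to throw the work away:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// The work being measured: out[i] = a[i] * b[i].
void multiply_streams(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

struct BenchResult
{
    double best_seconds; // fastest of the timed runs
    double checksum;     // sum of all outputs over all timed runs
};

BenchResult bench_multiply(const std::vector<float>& a,
                           const std::vector<float>& b,
                           int runs)
{
    assert(a.size() == b.size() && runs > 0);
    std::vector<float> out(a.size());

    // Untimed warm-up pass so the first timed run does not pay for cold caches.
    multiply_streams(a.data(), b.data(), out.data(), out.size());

    double best = std::numeric_limits<double>::max();
    double checksum = 0.0;
    for (int r = 0; r < runs; ++r)
    {
        auto t0 = std::chrono::steady_clock::now();
        multiply_streams(a.data(), b.data(), out.data(), out.size());
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());

        // Consume the output outside the timed region.
        for (float v : out)
            checksum += v;
    }
    return BenchResult{best, checksum};
}
```

Taking the minimum is only one possible choice. Reporting the median or the whole distribution would be just as reasonable, depending on what you want to learn.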

Is there something specific you're trying to find out here?

#3 Digitalfragment   Members   -  Reputation: 869


Posted 09 July 2012 - 01:35 AM

If you want to gauge the performance difference, just look at the assembly difference between your different versions - if you see it calling out to a function on the inside of the loop as opposed to running it inline, then you have a perf hit right there.

As ApochPiQ pointed out, the values in memory can have some heavy impact on the performance of your functionality.

But, assembly and data aside, there's also the layout of your memory and whether or not your data is being prefetched into the cache in time.

Edited by Digitalfragment, 09 July 2012 - 01:38 AM.


#4 jerrinx   Members   -  Reputation: 208


Posted 10 July 2012 - 12:30 AM

Hey Guys,

Thanks for the replies.

I wanted to make a particle system and wanted to test out the performance difference when we call a function to operate on an element vs doing it all together in a single function.
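Since the attached test program is not reproduced here, the following is only a guess at the shape of the variants being compared, written as a C++ sketch. All names (`multiply`, `Op`, `MulOp`, `product_per_call`, `product_virtual`, `product_in_function`) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element work as a plain function (the compiler may inline it).
float multiply(float a, float b) { return a * b; }

// Per-element work behind a virtual call.
struct Op
{
    virtual ~Op() {}
    virtual float apply(float a, float b) const = 0;
};

struct MulOp : Op
{
    float apply(float a, float b) const { return a * b; }
};

// Variant 1: loop calls a function once per element.
void product_per_call(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = multiply(a[i], b[i]);
}

// Variant 2: loop makes a virtual call once per element.
void product_virtual(const Op& op, const std::vector<float>& a,
                     const std::vector<float>& b, std::vector<float>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = op.apply(a[i], b[i]);
}

// Variant 3: the whole stream is processed inside one function.
void product_in_function(const std::vector<float>& a, const std::vector<float>& b,
                         std::vector<float>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i];
}
```

All three compute the same result, so in an optimized build it would not be surprising if the compiler generated very similar code for each of them.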

@ApochPiQ
The data set is output[i] = input1[i] * input2[i]
So if there's an inconsistency in the multiplication, it's going to affect the other test cases as well.
I think loops can be unrolled only if you know the data set size beforehand. Correct me if I am wrong.
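On the unrolling question: as far as I know, the size does not have to be known at compile time. A loop with a runtime trip count can still be unrolled by handling the leftover elements in a separate remainder loop, and compilers can generate code of roughly this shape on their own. A hand-written C++ sketch of the idea, with the hypothetical name `multiply_unrolled` and an unroll factor of 4 chosen arbitrarily:

```cpp
#include <cassert>
#include <cstddef>

// Multiply two streams with the loop body unrolled 4 times.
// n is only known at run time.
void multiply_unrolled(const float* a, const float* b, float* out, std::size_t n)
{
    assert(n == 0 || (a && b && out));

    // Largest multiple of 4 that is <= n.
    const std::size_t blocked = n - (n % 4);

    std::size_t i = 0;
    for (; i < blocked; i += 4)
    {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
    }

    // Remainder loop: the last 0 to 3 elements.
    for (; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

Whether a particular compiler actually does this for a given loop is best confirmed by reading the generated assembly.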

Maybe I need to upcast that class in order to avoid devirtualisation.
Not sure about branch and cache warming though.

When I switched to debug mode, it gave me some sane results. Seems like release mode shuffled the code around a bit.

Debug mode, data set size 10,000,000:


[CUtil] ## PROFILE Stream Product normal function : 0.549368595 sec(s) ##
[CUtil] ## PROFILE Stream Product virtual function : 0.582704152 sec(s) ##
[CUtil] ## PROFILE Stream Product inline function : 0.522523487 sec(s) ##
[CUtil] ## PROFILE Stream Product in function : 0.238292751 sec(s) ##

Release with optimisation turned off, data set size 1,000,000,000:

[CUtil] ## PROFILE Stream Product normal function : 19.569217771 sec(s) ##
[CUtil] ## PROFILE Stream Product virtual function : 22.762712440 sec(s) ##
[CUtil] ## PROFILE Stream Product inline function : 16.949578101 sec(s) ##
[CUtil] ## PROFILE Stream Product in function : 17.004290188 sec(s) ##

But then again I would like to gauge the performance on release.
JerrinX



