code speed

OK, I've got two versions of the code here, but how do you judge which will eventually perform better? The objective is simply to add the values from two similar objects into a similar third.

//given that MyObject has a float array of size 3, named 'x'
MyObject R;
MyObject v1, v2;

//sample 1
float __gc* t1 = &v1.x[0];
float __gc* t2 = &v2.x[0];
float __gc* r1 = &R.x[0];
for(int i=0; i<3; i++)
    (*r1++) = (*t1++) + (*t2++);

//sample 2
R.x[0] = v1.x[0] + v2.x[0];
R.x[1] = v1.x[1] + v2.x[1];
R.x[2] = v1.x[2] + v2.x[2];

Need some advice, thanks!
Edwinz

Two ways I can think of:

1) Use a profiler.

2) Loop the operations maybe a million times or more and measure how long that takes.

DWORD startTime = GetTickCount(); // milliseconds since system start (needs <windows.h>)
DWORD timetaken;
for(int i=0;i<1000000;i++)
{
//do your stuff
}

timetaken = GetTickCount() - startTime; // elapsed milliseconds; resolution is typically 10-16 ms
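
A more portable variant of the timing loop, as a rough sketch: it uses std::chrono::steady_clock instead of GetTickCount() and wraps both samples from the original question in functions. MyObject here is a plain native struct standing in for the managed class (an assumption), so the __gc pointers become ordinary float pointers. The names addLoop, addUnrolled and timeIt are made up for illustration. The volatile sink only discourages the optimizer from deleting the work; the numbers will still vary with compiler and flags.

```cpp
#include <chrono>

// Stand-in for the poster's class (assumption: a plain native struct).
struct MyObject { float x[3]; };

// Sample 1: pointer-increment loop.
void addLoop(MyObject& r, const MyObject& a, const MyObject& b) {
    const float* t1 = a.x;
    const float* t2 = b.x;
    float* r1 = r.x;
    for (int i = 0; i < 3; i++)
        *r1++ = *t1++ + *t2++;
}

// Sample 2: explicitly unrolled.
void addUnrolled(MyObject& r, const MyObject& a, const MyObject& b) {
    r.x[0] = a.x[0] + b.x[0];
    r.x[1] = a.x[1] + b.x[1];
    r.x[2] = a.x[2] + b.x[2];
}

// Hypothetical helper: calls fn `iterations` times and returns elapsed milliseconds.
template <typename Fn>
double timeIt(Fn fn, int iterations) {
    MyObject a = {{1.0f, 2.0f, 3.0f}};
    MyObject b = {{4.0f, 5.0f, 6.0f}};
    MyObject r = {{0.0f, 0.0f, 0.0f}};
    volatile float sink = 0.0f;  // keeps the result observable
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) {
        fn(r, a, b);
        sink = r.x[0];
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeIt(addLoop, 1000000) and timeIt(addUnrolled, 1000000) and printing the two results gives a first comparison. Run it several times in an optimized build before trusting any difference.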

The best thing to do is not worry about it until you need to. Implement it now whichever way you like. Then later down the road when you are profiling the performance of the app as a whole, change it if it proves to be a bottleneck.

Use the first form. Compilers in general prefer it, and it avoids repetition (you're not writing the indices 3 times).

The second way looks better, as it contains fewer operations.
Its readability is also better by far.

Guest Anonymous Poster
If the compiler doesn't unroll the loop, I'd guess 2 is faster.

quote:
Original post by Aldacron
The best thing to do is not worry about it until you need to. Implement it now whichever way you like. Then later down the road when you are profiling the performance of the app as a whole, change it if it proves to be a bottleneck.


Exactly. And write MyObject::operator+/operator+= so you don't repeat your code everywhere you need to add two MyObjects together.
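
A minimal sketch of what those operators might look like, assuming MyObject is a plain native struct with the three-element float array from the question (the real class is managed and not shown in the thread, so treat this layout as hypothetical):

```cpp
// Assumption: a plain native struct standing in for the poster's MyObject.
struct MyObject {
    float x[3];

    // Add another object's components into this one.
    MyObject& operator+=(const MyObject& rhs) {
        for (int i = 0; i < 3; i++)
            x[i] += rhs.x[i];
        return *this;
    }
};

// Built on operator+= so the addition logic lives in one place.
inline MyObject operator+(MyObject lhs, const MyObject& rhs) {
    lhs += rhs;
    return lhs;
}
```

With this in place the original task becomes R = v1 + v2; and whichever loop style turns out faster only has to be changed inside operator+=.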



--
Dave Mikesell Software & Consulting

If the compiler finds a better way, it will totally rearrange your code. Don't worry about such optimizations!

"Knowledge is no more expensive than ignorance, and at least as satisfying." -Barrin

Since the question is about speed, solution 2 is much MUCH better.

The *p1++ = *p2++ + *p3++ style means more register pressure, more instructions, more code cache lines, and more AGIs (address generation interlocks) than the p1[ i ] = p2[ i ] + p3[ i ] style.
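
For reference, the two styles being compared look like this when written as complete functions. This is only an illustration: the function names are made up, and whether the generated machine code actually differs depends on the compiler and its optimization settings.

```cpp
// Pointer-increment style: three pointers each advance on every iteration.
void addPointers(float* p1, const float* p2, const float* p3, int n) {
    for (int i = 0; i < n; i++)
        *p1++ = *p2++ + *p3++;
}

// Indexed style: a single induction variable i is shared by all three arrays.
void addIndexed(float* p1, const float* p2, const float* p3, int n) {
    for (int i = 0; i < n; i++)
        p1[i] = p2[i] + p3[i];
}
```

Both functions compute the same result, so comparing their assembly output or timing them side by side is a fair test of the claim.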

"The best thing to do is not worry about it until you need to ..."

I think it's a common and false argument. If every piece of your code is written to run twice as slowly as it should, then even if you optimize a few bottlenecks (say they cost 30% of the resources), the rest will still slow down the whole program by a huge factor. So it's important to know some basic optimization rules and build them into your coding style. That's why edwinnie's question is meaningful. Once you know these rules, it takes no longer to write far better code the first time. And when you later profile and optimize, there is less work left to do.

"Use the first form. Compilers in general prefer it, and it avoids repetition (you're not writing the indices 3 times)."

No, most compilers will generate poor code with sample 1, as explained above. As for readability... just count the lines. And *p++ looks a bit barbaric to most C noobs.

Sample 1 looks like an outdated attempt to hand-tune the code for the compiler. This could be OK on a 68000, but I don't think many people target 68000s these days. When you don't know, choose the shortest syntax and hope the compiler will do the rest for you. If you push the compiler towards a hacker style, it will lose some degrees of freedom for its optimizations. If your hacker style is fake (you are not able to anticipate the code the compilers will generate on various platforms) then you'll get poorer results.



[edited by - Charles B on January 12, 2004 9:49:06 AM]

quote:
Original post by Charles B

I think it's a common and false argument. If every piece of your code is written to run twice as slowly as it should, then even if you optimize a few bottlenecks (say they cost 30% of the resources), the rest will still slow down the whole program by a huge factor. So it's important to know some basic optimization rules and build them into your coding style.



It's common but not false. Yes, there are cases where you learn to avoid certain constructs and make extensive use of others, but for most of the application it really does not matter.

And anyone who waits until the app is finished to profile is not doing their job properly. Ideally, each subsystem would be profiled in isolation from the rest of the project before it is integrated, and then profiled again to see how it affects the project as a whole. This is not always possible during the various stages of development, but it is a good practice to try to follow.

In the example he presented, my first guess is that option 2 is fastest. However, I have no way of knowing that for sure without either seeing the compiler output or profiling. Who knows? One compiler might optimize away the loop in the first case so that the resultant output is the same, while another may not. In the end, if this is a commonly used function which is used in critical places, the only way to be sure is to plug it in and profile.

quote:
*p1++=*p2++ + *p3++ style means more register pressure, more instructions, more code cache lines, more AGIs than a
p1[ i ] = p2[ i ] + p3[ i ] style.

Ah, but that wasn't one of the options.

The second example was explicitly unrolled. I will agree with you about p1[ i ] (etc.) -- I'd let the optimizer perform the strength reduction for me, rather than stating it explicitly -- but working within the constraints of the system, the first, strength-reduced iteration is better than the hardcoded unrolled loop.
