Silly Question On Performance

General and Gameplay Programming

Started by fathom88 February 28, 2006 02:57 PM

15 comments, last by Spoonbender 18 years, 1 month ago

fathom88

180

Author

February 28, 2006 02:57 PM

I'm trying to improve the performance of some C++ code. The logic is simple and I don't see a way to improve it.

double test = 0.0;
double array_1[MAX];
double array_2[MAX];
double array_result[MAX];
for(xy = 0; xy < MAX; xy++)
{
    test += array_1[xy];
    test += array_2[xy];
    array_result[xy] = test * 0.50; //I want the average; I hear multiply is quicker than divide
}

I don't see a way to improve it by anything meaningful. Sorry if this post is kind of silly. My problem is that I have fairly simple calculations to do, but I have to repeat them many times.

ZQJ

496

February 28, 2006 03:04 PM

There are only two ways I can think of to improve the performance of that: use a more aggressively optimizing compiler or buy a faster computer.

anonuser

148

February 28, 2006 03:13 PM

Fast math routines, that's about all I can think of.

The only other thing is not doing it in series. Since you have a finite set of data, it shouldn't be that hard.
But the overhead of creating threads would probably wipe out any potential performance gain for all but large data sets.

So if you have a large set of data, thread it.
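A minimal sketch of the threading idea, for illustration only. It assumes the per-element average the original poster says they want (the running total in the posted loop carries a dependency from one iteration to the next, so it cannot be split this simply). The helper names average_slice and average_threaded, and the use of std::vector and std::thread, are assumptions and not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: averages one contiguous slice [begin, end).
void average_slice(const std::vector<double>& a, const std::vector<double>& b,
                   std::vector<double>& out, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        out[i] = (a[i] + b[i]) * 0.5;
}

// Hypothetical helper: splits the index range across `workers` threads.
// Each thread writes to its own slice of `out`, so no locking is needed.
std::vector<double> average_threaded(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     unsigned workers)
{
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    if (workers == 0) workers = 1;

    // Round up so the last slice picks up any remainder.
    const std::size_t chunk = (a.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(a.size(), w * chunk);
        const std::size_t end = std::min(a.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back(average_slice, std::cref(a), std::cref(b),
                          std::ref(out), begin, end);
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

Calling average_threaded(a, b, std::thread::hardware_concurrency()) would use one thread per reported core. Whether that beats the plain loop depends on the data size and the machine, so it is worth timing both.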

Fruny

1,658

February 28, 2006 03:17 PM

Quote:Original post by anonuser
So if you have a large set of data, thread it.

Not going to help unless you have multiple CPUs. Even then, the loop is memory-bound. The only thing I can think of that might make that code faster would be to interleave the arrays in memory ( [A,B,C,A,B,C,...] rather than [A,A,A...] [B,B,B...] [C,C,C...] ) to better exploit locality of reference (and CPU cache).
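As a rough illustration of the interleaved layout described above, the three arrays can be folded into one array of records. The struct name Sample, its field names, and the function average_interleaved are hypothetical, and the sketch assumes a per-element average. Whether this layout is actually faster for a straight linear sweep is something to measure, not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: the two inputs and the result for one index sit
// side by side in memory ([A,B,C,A,B,C,...]).
struct Sample {
    double a;
    double b;
    double result;
};

// Stores the average of `a` and `b` into `result` for every record.
void average_interleaved(std::vector<Sample>& samples)
{
    for (Sample& s : samples)
        s.result = (s.a + s.b) * 0.5;
}
```

Each iteration then touches one contiguous 24-byte record (on typical platforms) in place of three separate memory streams.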

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan

anonuser

148

February 28, 2006 03:22 PM

Quote:Original post by Fruny
Quote:Original post by anonuser
So if you have a large set of data, thread it.

Not going to help unless you have multiple CPUs. Even then, the loop is memory-bound. The only thing I can think of that might make that code faster would be to interleave the arrays in memory ( [A,B,C,A,B,C,...] rather than [A,A,A...] [B,B,B...] [C,C,C...] ) to better exploit locality of reference (and CPU cache).

I did forget to mention the CPU part, thanks for the correction.

ProphecyEye

122

February 28, 2006 03:23 PM

You could possibly do some parallel processing with a single CPU that supports sse2:

http://www.gamedev.net/reference/articles/article1987.asp

Also, if you don't need the precision, I would use floats instead of doubles. Doing so can allow you to do 4 floating point operations at once. It's also just less data to move around not to mention 32bit cpus generally like 32bit vars.
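A sketch of what "4 floating point operations at once" could look like with SSE intrinsics on single-precision data. It assumes a per-element average and an x86 compiler that provides <xmmintrin.h>. The function name average_floats and the guard macro HAVE_SSE_SKETCH are made up for this example, and the code falls back to a plain loop where SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1  // hypothetical guard macro for this sketch
#endif

// Hypothetical helper: out[i] = (a[i] + b[i]) / 2 on single-precision data.
std::vector<float> average_floats(const std::vector<float>& a,
                                  const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;

#ifdef HAVE_SSE_SKETCH
    // One 128-bit register holds four floats, so each pass handles four elements.
    const __m128 half = _mm_set1_ps(0.5f);
    for (; i + 4 <= a.size(); i += 4) {
        const __m128 va = _mm_loadu_ps(&a[i]);  // unaligned loads: std::vector
        const __m128 vb = _mm_loadu_ps(&b[i]);  // gives no 16-byte guarantee
        _mm_storeu_ps(&out[i], _mm_mul_ps(_mm_add_ps(va, vb), half));
    }
#endif
    // Scalar tail (or the whole range when SSE is unavailable).
    for (; i < a.size(); ++i)
        out[i] = (a[i] + b[i]) * 0.5f;
    return out;
}
```

An optimizing compiler may well generate similar instructions from the plain loop on its own, so the intrinsics are mainly useful when inspection of the generated code shows that it does not.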

blaze02

100

February 28, 2006 03:31 PM

Quote:Original post by fathom88
I'm trying to improve the performance of some C++ code. The logic is simple and I don't see a way to improve it.

double test = 0.0;
double array_1[MAX];
double array_2[MAX];
double array_result[MAX];
for(xy = 0; xy < MAX; xy++)
{
    test += array_1[xy];
    test += array_2[xy];

    array_result[xy] = test * 0.50;
}

You say you want the average, but you are halving a running total; I think you need to set test back to zero on each iteration.
Yes, multiplying is faster, but a good compiler will optimize that for you.
As for xy++, you typically want ++xy, although it does not matter unless xy is some sort of object.
You could thread if you had multiple CPUs, but most likely you (and whoever is going to run this) do not.
The only optimization I see that would make a difference is to unroll your loop. "xy < MAX" takes the CPU about 2 clock cycles to calculate each iteration. Also, the conditional jump will interfere with your processing pipeline... yadda yadda, but most CPUs are so damned smart that you will only lose a couple of clock cycles in total due to the conditional part of the for loop. So if you know what MAX is beforehand, you can get rid of the for loop and write as many lines as need be to get the job done. It will save you about 2 clock cycles * MAX = (probably less than a microsecond). :(
The loop unrolling technique works well with memory copies that are used an insane number of times... works well => saves milliseconds. Again, :(

You shouldn't worry about optimizing small amounts of code like this, because the compiler will optimize it for you.
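The two points above (no running total, and loop unrolling) can be sketched as follows. This assumes the intended result is the per-element average. The function names, the use of std::vector, and the unroll factor of four are illustrative choices only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element average: nothing is carried over between iterations.
std::vector<double> average_simple(const std::vector<double>& a,
                                   const std::vector<double>& b)
{
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = (a[i] + b[i]) * 0.5;
    return out;
}

// Same result, manually unrolled by four, with a tail loop for leftovers.
std::vector<double> average_unrolled(const std::vector<double>& a,
                                     const std::vector<double>& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<double> out(n);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {  // one loop test per four elements
        out[i]     = (a[i]     + b[i])     * 0.5;
        out[i + 1] = (a[i + 1] + b[i + 1]) * 0.5;
        out[i + 2] = (a[i + 2] + b[i + 2]) * 0.5;
        out[i + 3] = (a[i + 3] + b[i + 3]) * 0.5;
    }
    for (; i < n; ++i)  // remaining 0 to 3 elements
        out[i] = (a[i] + b[i]) * 0.5;
    return out;
}
```

Both functions return the same values. Compilers often unroll loops like this themselves at higher optimization levels, which is consistent with the advice not to hand-optimize it.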

-------Harmotion - Free 1v1 top-down shooter! Double Jump Studios Blog

blaze02

100

February 28, 2006 03:34 PM

Quote:Original post by ProphecyEye
You could possibly do some parallel processing with a single CPU that supports sse2:

http://www.gamedev.net/reference/articles/article1987.asp

Also, if you don't need the precision, I would use floats instead of doubles. Doing so can allow you to do 4 floating point operations at once. It's also just less data to move around not to mention 32bit cpus generally like 32bit vars.

It would be less data to move around, but 32-bit CPUs don't execute anything with floats themselves; they let the FPU handle that. And the x87 FPU converts both 32-bit and 64-bit floats to 80-bit floats before calculating anything. And what are you talking about (floats) would allow 4 floating point operations at once? I don't think the FPU knows how to do that.

-------Harmotion - Free 1v1 top-down shooter! Double Jump Studios Blog

bakery2k1

712

February 28, 2006 03:38 PM

Quote:Original post by blaze02
And what are you talking about (floats) would allow 4 floating point operations at once? I don't think the FPU knows how to do that.

SSE

Anonymous

February 28, 2006 03:41 PM

That is the perfect code to optimize with SSE/AMD 3DNow. That's a classic example of a vector computation.
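For the double-precision arrays in the original post, a vectorized version might look like the sketch below. It uses SSE2, which handles two doubles per 128-bit register, and again assumes the per-element average in place of the running total in the posted loop. The function name average_doubles and the guard macro HAVE_SSE2_SKETCH are assumptions. 3DNow! is not shown.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1  // hypothetical guard macro for this sketch
#endif

// Hypothetical helper: out[i] = (a[i] + b[i]) / 2 for n doubles.
void average_doubles(const double* a, const double* b, double* out,
                     std::size_t n)
{
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;

#ifdef HAVE_SSE2_SKETCH
    // Two doubles per register, so each pass handles two elements.
    const __m128d half = _mm_set1_pd(0.5);
    for (; i + 2 <= n; i += 2) {
        const __m128d sum = _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i));
        _mm_storeu_pd(out + i, _mm_mul_pd(sum, half));
    }
#endif
    // Scalar tail (or the whole range when SSE2 is unavailable).
    for (; i < n; ++i)
        out[i] = (a[i] + b[i]) * 0.5;
}
```

With the arrays from the original post, the call would be average_doubles(array_1, array_2, array_result, MAX).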

This topic is closed to new replies.
