• Advertisement
Sign in to follow this  

Basic Operations (add, sub., muilt., etc) - Speed?

This topic is 4327 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I remember on a reply to a post I made, where I was asking whether I should use one logical operator or another one for speed, it said that I needn't worry since the speed was about equal (don't remember the exact words). My main question is: Are there any differences in speed between the basic functions, add, multiply, divide, bitiwise ops, logical ops, etc? Also, would there be any speed difference between things such as "a=a+1", "a+=1", and "a++"? Examples: > If I were to iterate through the pixels of a section in a bitmap, should I just use some simple multiplications to find where the pixel is ("ptr=start+(x+y*width)"), or would I want to try and use addition more? > Is there any difference for checking if a number (32-bit int) is odd: > > if (x % 2 == 1) > > if (x % 2 > 0) > > if ((x & 1) == 1) > > if ((x & 1) > 0) Please bear with me on this.

Share this post


Link to post
Share on other sites
Advertisement
This would fall under the realm of premature optimization. Chances are any difference in times between your operators will be overshadowed by the relative efficiency of the algorithm you choose to implement.

Further, the compiler will most likely optimize any choice of operations into their fastest equivalent.

Share this post


Link to post
Share on other sites
Bit shifting on some processors takes longer than multiplying simple numbers.

Share this post


Link to post
Share on other sites
You only need to worry about this if you are writing a compiler. To answer your question though

Quote:
Original post by deadimp
My main question is: Are there any differences in speed between the basic functions, add, multiply, divide, bitiwise ops, logical ops, etc?


Multiply and divide will take a tad longer than the others on most architectures (so far as I know).

Quote:
Original post by deadimp
Also, would there be any speed difference between things such as "a=a+1", "a+=1", and "a++"?


These are all equivalent at the assembly level.

Quote:
Original post by deadimp
Is there any difference for checking if a number (32-bit int) is odd:
> > if (x % 2 == 1)
> > if (x % 2 > 0)
> > if ((x & 1) == 1)
> > if ((x & 1) > 0)


The second two will probably be a little faster (but the compiler will most likely change your code).

The only reason I studied any of this was for a class. Serioulsy, if I were you I would not bother at this low of a level. Also as M2TM pointed out, it is all dependant on the processor. Compiler writers will optimize your code for their processor.

Good Luck!

Share this post


Link to post
Share on other sites
Further, fixing your algorithm so you're not iterating through pixels will yield orders of magnitude better results than any bit fiddling.

Share this post


Link to post
Share on other sites
Alright, I will take this into consideration.
Thanks!

Telastyn << For the bitmap algorithms, the example I meant it for was pixel-precise collision detection.

Share this post


Link to post
Share on other sites
Quote:
Original post by deadimp
Alright, I will take this into consideration.
Thanks!

Telastyn << For the bitmap algorithms, the example I meant it for was pixel-precise collision detection.


Quote:

Further, fixing your algorithm so you're not iterating through pixels will yield orders of magnitude better results than any bit fiddling.

Share this post


Link to post
Share on other sites
Do you have any idea how fast your computer is? The 1 optimisation I would consider is to iterate through your bitmap the way it lies in memory to make sure everything stays in the cache.

Judging by what you've shown, you should have y on the outer loop like this:

for (int y = 0; y < height; y++)
for (int x = 0; x < width; x++)


EDIT: Oh, collision detection. Yeah, you shouldn't really have to iterate through everything.

Share this post


Link to post
Share on other sites
*Confused*
Well, then again, I don't exacly have much of a knowledge base on image algorithms...

[Heh, that sounded somewhat sophisticated]

Share this post


Link to post
Share on other sites
If you want to do pixel-perfect, consider using masks and AND. If you don't get this I can clarify.

Share this post


Link to post
Share on other sites
Yeah, that's what I did (sort of), before I switched to D3D. I still have the same code, but I haven't really tested it out. Linkage

EDIT: Holy Schnee, I see what you mean with AND. Gotta change that.

Share this post


Link to post
Share on other sites
A better way to speed up your collision detection is to change the algorithm. In a 3D game, checking each and every triangle against each and every other triangle would be way too time consuming, its roughly an n^2 algorithm where n is the number of triangles. A more efficiant approach is to only check triangles from objects that are close enough to possibly collide. This is called a "broad scope" phase and in 3D games usually means that a sphere (or other simple shape) around each object is tested against other object spheres. Only if the spheres are colliding can any triangles colide, which you would then check. If the spheres do not collide, no triangles can collide, so they can all be skipped. This might be several thousand trangle collision detections saved against hundreds of other objects each consisting of their own thousands of triangles. This is a HUGE calculation savings! More than you could ever hope to save my optimizing an all-triangles method.

Similarly, in 2D, you can first check a simple shape around objects for collision BEFORE you ever need to worry about pixel-level collision. Depending on the type of game, this shape is most often a circle(a 2D sphere,) a Box (axis-aligned Bounding Box) or both. If these simple 2D shapes do not overlap, non of their pixels can overlap either, saving you hundreds (or thousands, depending on the size of the sprite) of pixel-perfect collision detections.

These are the Cardinal rules of optimization:
1)Do the work you need the smartest way possible.
2)Do the work you need only! Never any more.
3)Do the work you need as little as possible.
4)Do the work you need as fast as possible.

Each of these is worth approximately one order of magnitude greater than the next, in ascending order. That is, #1 provides more optimizational value than 2, which provides more than #3, etc. Start at the top of the list and work your way down. A single optimization at level 1 is worth 100 optimizations at level 4.

Share this post


Link to post
Share on other sites
Quote:
Original post by deadimp
Also, would there be any speed difference between things such as "a=a+1", "a+=1", and "a++"?

Just wanted to point out that "a++" is not equivalent to the other two.

Share this post


Link to post
Share on other sites
Quote:
Original post by Fred304
Quote:
Original post by deadimp
Also, would there be any speed difference between things such as "a=a+1", "a+=1", and "a++"?

Just wanted to point out that "a++" is not equivalent to the other two.


As a statment, they are equivalent. For expressions, you are right. [grin]

Share this post


Link to post
Share on other sites
If you really want to know how long these operations take, check with AMD and Intel's CPU documentation. They list the latency of every instruction.

But as others have said, this is definitely premature optimization (and microoptimization at that).

Let's do a bit of math.

You have a CPU. It's probably around 3GHz (A bit less if it's AMD, a bit more if it's Intel)

That means it can do 3 * 10^9 cycles per second. Each cycle, it can perform up to three instructions.

So in one second, it can do 9 * 10^9 instructions (theoretical peak).

Now, most instructions have a latency of 1-4 cycles.
So, what if you do pick a 4 cycle instruction instead of one that only takes 1?

Well, you waste a whopping 3 cycles *if*, and only if, the following instructions depend on the result from this instruction. (But usually that won't happen, because the compiler tries to schedule instructions to avoid these dependencies, and even if it fails, the CPU itself attempts the same). So most likely, it won't slow you down at all)
3 cycles? That's about a nanosecond. And this is in the worst case. You could do this a million times in a second, and it still wouldn't make a noticeable difference (it'd slow you down by one millisecond)

Oh, and just because everyone else skipped past it, RAZORUNREAL had a valid point too. *If* you iterate through all pixels (or anything else in a 2d array), make sure you take one row at a time, rather than a column. Why? The CPU cache likes that *much* better. Can easily double your performance.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement