Challenge 11/13: Do you know your code?


The objective: a simple bounding-box collision detection code segment. The question: which of the two is faster?

You have a point (ix, iy) that you must test for a collision against a box centered at point (x, y), with a width and length of 20. So, which is faster?

SEGMENT 1:

if((ix > (x-10)) && (ix < (x+10)) && (iy > (y-10)) && (iy < (y+10)))
{
    //collision code here!
}

SEGMENT 2:

if(ix > (x-10))
{
    if(ix < (x+10))
    {
        if(iy > (y-10))
        {
            if(iy < (y+10))
            {
                //collision code here!
            }
        }
    }
}

Do you know the answer?

-Greven

I say segment 1... because at least in C++ as soon as one of the conditions is false, it will move on to whatever's next

in segment 2 the machine is forced to evaluate each condition

EDIT:

oh wait... no it isn't because you nested the second one... I still go with segment 1

[edited by - tempuself on November 13, 2003 6:30:48 PM]

quote:
Original post by TempusElf
I say segment 1... because at least in C++ as soon as one of the conditions is false, it will move on to whatever's next

in segment 2 the machine is forced to evaluate each condition


TempusElf


No, the if statements are inside each other, so if the first fails, the others are not evaluated.

a) depends on how often that code is executed and where. It's pointless even being concerned about the differences if it's in an insignificant place.

b) they should be exactly the same. If the && were & things might be different, but && on a decent compiler should give you the same as the nested if statements by early-outing as soon as a condition fails (see the short example after this list).


c) depends on the particular compiler.

d) depends on the types of the input variables.

e) depends on what code surrounds that code.

f) depends on whether the compiler has enough spare registers/FP stack space to avoid temporaries in memory (that's a case-by-case thing too - using that in one place may well be different than in another place).

g) if you've profiled the posted code and found one of those to be significantly different from the other then I'd be VERY suspicious of the way you're profiling. If it tests one then the other in a big loop or two big loops, it's already potentially flawed due to the first one incurring the higher of the caching penalties.

h) if the compiler were able to re-order an expression for better branch prediction then the first might be slightly faster
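
A quick sketch of the difference item b) is pointing at; the check() helper is made up purely to make the evaluation order visible:

#include <cstdio>

// made-up helper, only here so you can see which operands get evaluated
static bool check(const char* name, bool value)
{
    std::printf("evaluated %s\n", name);
    return value;
}

int main()
{
    // && short-circuits: once the left side is false, the right side never runs
    if (check("left", false) && check("right", true)) { }

    // & does not: both sides are always evaluated before the test
    if (check("left", false) & check("right", true)) { }
}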

My intuition tells me segment 1, just because there's only one point the code has to jump to, whereas it appears that (unless the compiler optimizes it away) there are more possible addresses to jump to, thus bloating the code a little more.

Actually, I'm no C++ guy, so that's more like my 1 1/2 cents' worth,
-Michael

quote:
Original post by Chris Hare
The first is better; it is more readable and will short-circuit when optimized.

Wizza Wuzza?


Actually it will short-circuit even when unoptimized; the C++ standard requires it, so that you can do things like this without getting access violations:

if(player && player->IsAlive())
{
//Do stuff
}

they are 100% identical logical constructs, so any compiler which doesn't generate the exact same code should not be allowed to use the term "optimizing compiler" ... quite simply, back in the days before compilers did analysis and optimizations, things like these:

// assuming i is an int
++i;
i++;
i = i + 1;
i += 1;

could each generate different compiled code ... but in the days of compiler code analysis, these should always be 100% equivalent (in this simple case) ... obviously, for real classes with overloaded operators

// assuming custom class
++obj;
obj++;
obj = obj + 1;
obj += 1;

will still all generate different code ... so please people, quit using obj++; when you do not NEED postfix increment ... it's just wrong.
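
For instance, a typical overloaded pair looks roughly like this (a sketch; the Counter class is made up), and you can see where the extra work in the postfix form comes from:

struct Counter
{
    int value;

    Counter& operator++()        // prefix: bump in place, return *this
    {
        ++value;
        return *this;
    }

    Counter operator++(int)      // postfix: has to copy the old value first
    {
        Counter old = *this;
        ++value;
        return old;              // this extra copy is what obj++ can cost you
    }
};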

quote:

I say segment 1... because at least in C++ as soon as one of the conditions is false, it will move on to whatever's next



So will segment 2.

Even faster (assuming int variables):

if((unsigned int)(ix-x+9) < 19 && (unsigned int)(iy-y+9) < 19)
{
//collision code here!
}
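
Roughly why that works (a sketch of the reasoning, assuming plain ints as in the original): each axis's two signed comparisons collapse into one unsigned range check.

ix > (x-10) && ix < (x+10)   means   -9 <= ix-x <= 9
                             means    0 <= ix-x+9 <= 18
                             means   (unsigned int)(ix-x+9) < 19

If ix-x+9 is negative, the cast to unsigned wraps it around to a huge value, which fails the < 19 test, so one unsigned compare per axis replaces two signed ones.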


"Sneftel is correct, if rather vulgar." --Flarelocke

[edited by - sneftel on November 13, 2003 7:18:55 PM]

quote:
Original post by tuita
quote:

I say segment 1... because at least in C++ as soon as one of the conditions is false, it will move on to whatever's next



So will segment 2.


Don't feel obligated to read the full post or anything

Guest Anonymous Poster
Questions like this are exactly why everyone should be required to learn about compilers. The code is equivalent.

quote:
Original post by Anonymous Poster
Questions like this are exactly why everyone should be required to learn about compilers. The code is equivalent.


Nah, this is a good post, because now he is learning about compilers; that kind of stuff sucks to learn from books... And Sneftel's post showed a faster way, something most people would not have thought to do...

An interesting question.

Yes, logically, the code is equivalent, and the compiler should generate the same instructions. But the devil, as they say, is always in the optimizer...uh, details.

I do not know for sure, but my intuition says that an aggressively optimizing compiler would attempt to optimize the first example, because it is a single expression. For example, it could load ix and iy into registers just once, instead of twice, for the comparisons.

Of course, the compiler could do the same for the second set, but I believe that optimizing compilers would more likely attempt such subtle optimizations on expressions instead of consecutive flow of control statements, simply because such optimization is more likely to have payback.

Btw: do you have the answer? Which IS faster?

quote:
Original post by Anonymous Poster
Questions like this are exactly why everyone should be required to learn about compilers.
Why? Not everyone cares if either form is 5% faster than the other.

I must admit, I didn't think about what Sneftel said, even if I have seen optimizations like that before. And it's even possible that the compiler might recognize those. In that case the first statement would probably be easier. But if it's really, really good, it should recognize it in both of the variations.

But I can't tell if rewriting it like that is faster or not. It all depends on what values are input: if it can jump out of the first branch nearly all the time, the rewritten statement would have one more addition. It also depends on how effectively the branch prediction works.

quote:
Original post by civguy
quote:
Original post by Anonymous Poster
Questions like this are exactly why everyone should be required to learn about compilers.
Why? Not everyone cares if either form is 5% faster than the other.


Yes, the majority of programmers probably don't care. But of course for a serious game programmer it's a good thing to know, because it speeds up the optimizing phase a bit when you already know which is faster.



Most compilers now do a pretty good job at optimizing. If you're worried about a particular construct, most compilers have an option to dump out the assembly. For example, Microsoft C++ 7.1 has the /FAs switch that dumps out the assembly with the C/C++ source as comments.
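
Roughly something like this (the file name here is just an example; on GCC, -S gives a similar listing):

cl /O2 /FAs collision.cpp
g++ -O2 -S collision.cpp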

Guest Anonymous Poster
That's what I did, and the ASM code, even with unoptimized C++ compilation, is EXACTLY the same.

The results. Let's call Sneftel's code Segment 3 for fun:

Using an average of 100 tests of 1,000,000 iterations (the same result was found between 10, 50, and 100 tests). Remember, this is 1 MILLION iterations of the code to get a somewhat accurate result in differences in speed, so you can see the differences are very minor. This is using the MSVC++ 6.0 compiler.

When all conditionals passed:
Segment1 = 13ms.
Segment2 = 11ms.
Segment3 = 10ms.

When first conditional fails (ix > (x-10)):
Segment1: 8ms.
Segment2: 6ms.
Segment3: 8ms.

When second conditional fails (ix < (x+10)):
Segment1: 9ms.
Segment2: 8ms.
Segment3: 8ms.

When third conditional fails (iy > (y-10)):
Segment1: 11ms.
Segment2: 10ms.
Segment3: 8ms.

When fourth conditional fails (iy < (y+10)):
Segment1: 12ms.
Segment2: 13ms.
Segment3: 9ms.

Now, you might notice that with the fourth conditional failure, Segment2 has a higher value than if all tests passed. Me, I'm not sure why. I did the simulation 10 times, and got the same value every time. Bleh.

I suppose the testing could be flawed... I had to push it to a million iterations to even get a readable value... otherwise it'd always turn up 1ms or 0ms, sometimes 2ms.

Remember, though, this is the time it takes to do 1 million iterations of this, so the speeds are virtually identical.
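
The timing loop is roughly of this shape (a sketch only, not the exact harness; clock() from <ctime> is one way to get millisecond readings):

#include <cstdio>
#include <ctime>

int main()
{
    const int iterations = 1000000;
    volatile int hits = 0;              // volatile so the whole loop can't be optimized away
    int ix = 5, iy = 5, x = 0, y = 0;   // pick inputs for whichever pass/fail case is being timed

    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i)
    {
        if ((ix > (x-10)) && (ix < (x+10)) && (iy > (y-10)) && (iy < (y+10)))
            ++hits;
    }
    std::clock_t end = std::clock();

    std::printf("%ld ms\n", (long)((end - start) * 1000 / CLOCKS_PER_SEC));
}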

-Greven

it is worth mentioning that, although optimizing the exact layout of your code for your compiler to digest into the best code makes sense in a few rare cases (you are writing a driver for gigabit ethernet, or you are down in the guts of a 3D algorithm) ... it IS useful to have in the back of your mind ordering short-circuit operations by their likely success ... IF and ONLY IF you have enough knowledge about your domain data to make such assessments ... or also by their relative costs

for example ... if you know that you are looking for a person with the first and last name "John" "Doe" in a database, where each name may be null ... testing for either null before checking the actual content of either would make sense ...

ie: if(firstname != NULL && lastname != NULL && strcmp(firstname,"John") == 0 && strcmp(lastname,"Doe") == 0)

please understand, I don't usually spend much time thinking about these things ... but just a few minutes for the first few weeks will change your habits so that each time a compound if statement comes up, or nested logic ... you can spend the 1-10 seconds needed to look for the efficient way to order/nest it ... the key is getting the best combination of developer productivity and code efficiency ... don't kill yourself over a 3% performance gain. AND don't introduce maintenance nightmares in the name of performance UNLESS you have a verified, profiled bottleneck worthy of such special treatment.

quote:
Original post by Evil_Greven
I suppose the testing could be flawed... I had to push it to a million iterations to even get a readable value... otherwise it'd always turn up 1ms or 0ms, sometimes 2ms.

Push it up to a billion. IMHO, benchmarking tests that last less than a second are almost invariably flawed.


"Sneftel is correct, if rather vulgar." --Flarelocke

Alrighty, 1 billion iterations with 10 tests... Takes a while on my poor 1GHz, but here are the results, though this is with 10 tests rather than 100... it still took like 5 minutes, heh:

When all conditionals passed:
Segment1 = 12299ms.
Segment2 = 10552ms.
Segment3 = 9979ms.

When first conditional fails (ix > (x-10)):
Segment1: 7190ms.
Segment2: 6153ms.
Segment3: 7181ms.

When second conditional fails (ix < (x+10)):
Segment1: 8202ms.
Segment2: 7507ms.
Segment3: 7162ms.

When third conditional fails (iy > (y-10)):
Segment1: 10259ms.
Segment2: 9699ms.
Segment3: 8197ms.

When fourth conditional fails (iy < (y+10)):
Segment1: 11290ms.
Segment2: 11859ms.
Segment3: 8196ms.

With the exception of the last one, they're pretty much what you'd expect... the earlier numbers were simply rounded down to whole milliseconds. However, strangely enough, again with the fourth conditional failing, Segment2 is HIGHER than if it succeeds.

I'm a bit confused.

But, anyway, since this is 1,000,000,000 iterations, the actual per-iteration cost of the code segments is tiny.

Ex.
When all conditionals passed:
Segment1 = 0.000012299ms.
Segment2 = 0.000010552ms.
Segment3 = 0.000009979ms.


[edited by - Evil_Greven on November 14, 2003 6:38:03 PM]
