Sign in to follow this  

Should I check a bool before I write it?

This topic is 3590 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi there, maybe this question sounds strange but it just came to my mind and I found no reasonable answer... I have a routine which gets calles quite often. Everytime the function gets called I have to set a boolean variable to true (it does not matter why I have to do this). Is it better/faster to first check if the boolean variable is false and if so, write true to it and if not, just continue without writing? Or does this tiny check not matter at all? Basically it is: a = true; or if (!a) a = true; Thanks in advance, Tom

Share this post


Link to post
Share on other sites
Short answer: Just write it regardless.

Long answer:
In case of dirty flags, the test doesn't make sense.

If you're using something else however, there's two possibilities. One is where the value you're writing is big, and the cost of write is non-negligible. A class requiring deep copy and memory allocations. In that case, testing for equality might save some time.

Another case where this could save time, although very limited, is if the test would perform cache warm-up. By first testing it, you'd ensure that the destination you're writing to is in cache. But this would only give you benefits if you'd need to write multiple, co-located values. For example, when setting an array of bools, accessing (reading) first and last value would improve your chances of having entire array located in L1, and then writing entire array at once.

In your particular case, testing for a value requires a branch, which despite triviality of instructions involved, could decrease performance.

Lastly - it's a freeking 8-bit bool value. Just write it. It's not something that will show up in profiler - ever. The overhead from function call and memory access (possibly L1 and L2 cache miss) can be *thousands* of times bigger than just setting the value.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
In your particular case, testing for a value requires a branch, which despite triviality of instructions involved, could decrease performance.

QFE. Avoid branches whenever possible.

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
What do you mean with branch? Is it that there is the if-statement which causes some kind of "subroutine"?


Branch Prediction

basically, the CPU tries to guess(based on various things) if a branch is taken or not, in order to try to keep the pipeline full, which allows for faster instruction execution (ie, it can fetch instructions before they are processed). However, if it makes a mistake, it has to empty the pipeline and then refill it, which can potentially take many cycles.

Share this post


Link to post
Share on other sites
The existence of branches in your code can also inhibit some compiler optimizations since the compiler can't guarantee if certain instructions will be executed.

However this isn't something you should be thinking about until you've profiled.

Share this post


Link to post
Share on other sites
I can't wait to see you guys try and make games with no branching ;-)

A branch is an instruction which causes the cpu to execute one code path or another depending on some criteria such as if a value is non-zero (jnz), or if a value is higher than another (jh). These are characterised in your code by conditional branches such as if( <expression> ).

If you want some friendly advice, dont worry too much about performance just yet, it's going to be a while before you've written enough code to totally max out your cpu.

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
Oh, ok thanks.

Well I am on optimizing currently since I am writing a map displaying software for mobile devices :)


Er, and those two statements are related how, exactly?
Making software for mobile devices is still not a valid excuse for wasting time trying to optimize random bits of code.

Share this post


Link to post
Share on other sites
erm...maybe you overread the posting before mine.
Surely the statement that I am currently in a phase of optimizing things in my project has basically nothing to do with "Oh, ok thanks" ;)

TheGilb just said that one should not worry about performance as long as one gets forced to do so.

Well, I am forced :)

But still this has nothing to do with the initial question since even if the additional check causes some branching it is not a thing I would do in a performance critical section.

It was kind of a question for good coding practice. I know that will not increase any performance issues in my application. I just wanted to know if A would be better than B.

I hope this helps you to understand ;)

Why do you think I am optimizing random bits of code?

Share this post


Link to post
Share on other sites
Quote:
Original post by TheGilb
I can't wait to see you guys try and make games with no branching ;-)


Many algorithms used in games can be re-written without loops or branches. This is kind of what I'm doing right now as I'm writing code for the Cell SPU processor. It can very tricky though, because it requires thinking at the assembly level. Knowing the target processor is always a win because you can take advantage of special instructions. Simple example:


// Find the first non-zero bit in a mask, C++
int first = -1;
for(int i=0; i<32; i++)
{
if(mask & (1 << i))
{
first = i;
break;
}
}

// Find the first non-zero bit in a mask, ASM
int first;
asm("clz %0, %1" : "=r"(first) : "r"(mask)); // Cell SPU has count leading zeroes instruction
asm("ai %0, %1, -31" : "=r"(first) : "r"(first)); // Subtract 31






I would suggest to optimize a piece of code once you have profiled it and found it to be "hot" (called many times per frame and takes a considerable number of milliseconds).

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
Well, I am forced :)

Okay, but that's not a logical conclusion from simply the earlier statement:
"I am writing a map displaying software for mobile devices :)"

Quote:
Why do you think I am optimizing random bits of code?


Trying to optimize flag assignment is random. You said as much yourself:
"It is not a bottleneck in my program."

Here's a hint: If you're not optimizing a bottleneck of your program, you're optimizing at random. Blindly. In a way that's likely to waste your time, and carries the chance of actually worsening performance in the long run. What helps optimize in one situation can hurt in another, and when you're not optimizing your bottlenecks first, you'll end up changing enough when you do get around to it, that you carry the risk of your own previous "optimizations" will do exactly that -- hurt.

Share this post


Link to post
Share on other sites
Quote:
Original post by TheGilb
I can't wait to see you guys try and make games with no branching ;-)


FSM.

I believe some of those are Turing complete, but this theory isn't really my strong point. Nor am I claiming they are practical programming model. Then again, processors tend to be built around them to a degree.

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
Why do you think I am optimizing random bits of code?


Why should we think your decision to optimize this particular bit of code is not random? This is what profiling is for. Programmers - even very skilled ones with huge amounts of experience - are typically bad at guessing what the slow part of the code is. Certainly they are much worse than might reasonably be expected.

Anyway. On mobile you need to optimize for space, too, and that means not speculatively adding extra checks to your code. Simple is very often best. It's not always a time/space trade-off; software design (like all design) is probably more about what you don't do than about what you do.

@Antheus, an FSM would have to be implemented in hardware to avoid branching (since it would otherwise be emulated), and the argument could be made that actually it branches on *every* clock cycle [smile]

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
Well, I am forced :)

Fair point. But even then, you need to figure out *where* to look at optimization too. It's very easy to waste a lot of time trying to "optimize" some exotic bit of your code, which then ends up being marginally slower.

Quote:

It was kind of a question for good coding practice. I know that will not increase any performance issues in my application. I just wanted to know if A would be better than B.

Fair enough then. :)
It's something that's hard to really get an overview over unless you actually write a compiler or mess around with assembly programming, or otherwise get an excuse for studying the low-level effects of your code.

However, the CPU operates on 32-bit values at a time, and it works in an extremely mechanical manner. It doesn't matter what the value is to begin with (just like it doesn't matter what two numbers are when you multiply them. The multiplication always takes the same time, unlike for us humans). The CPU is happy if things are predictable. It doesn't really matter that it has to do a small amount of extra work, if it can be predicted easily, if it's straightforward. Branches (if-statements) and everything that breaks the flow, make your CPU sad. [wink] (In a small way. Don't take this to mean that you shouldn't call functions or use if-statements. They're not really expensive overall, partly because the CPU puts a lot of effort into trying to predict them *anyway*)

Writing a value is usually a single instruction, and it's fully pipelined so the CPU can get on with the program in the following clock cycle. While accessing memory to actually write the data might take a while, we can usually avoid having to wait for that (by only writing it to the cache to begin with, and because we don't actually need to wait for it to complete before proceeding)
In other words, it's impossible to reduce the cost any further. It only slows the CPU down by a single clock cycle. By contrast, testing if the value is *already* true, would involve, at the very least:
- Read the value from memory (it might be in cache or a register already, but let's assume the worst case for now). That might take up to 150 cycles alone, and we can't proceed until we've retrieved the data.
- Perform the actual comparison (that's simple enough, 1 cycle for that)
- Execute the conditional branch instruction (if condition is true, jump to this address). That in itself is *usually* fast enough, but there are a few caveats. First, it relies entirely on the CPU being able to predict whether the branch is taken. That's easy if it was taken the last 5 times, for example or there's another simple pattern. Then the CPU can guess at the outcome of the branch, and only if the guess turns out to be wrong, does it have to roll back and start over. Not so if it's completely random whether or not to take the branch. Then it might stall the entire CPU pipeline for 5-10 cycles. That's bad. Further, the presence of a branch also limits how much freedom the compiler has in optimizing your code. (Because it can't easily move reorder instructions across a branch.)
- And then finally, we might still have to do the actual assignment. (another 1 cycle)

Hope that helps you understand the costs involved in a bit more detail :)
The above isn't accurate in 100% detail for every CPU, but it should give you a rough overview of what's involved.

Quote:

Why do you think I am optimizing random bits of code?

Because 1: You said yo uwere optimizing, and 2: The question you asked is a *very* random bit of code. [grin]

Share this post


Link to post
Share on other sites
Thanks again for the detailed exlpanations :)

Quote:

Because 1: You said yo uwere optimizing, and 2: The question you asked is a *very* random bit of code.

Quote:

Why should we think your decision to optimize this particular bit of code is not random?


Well, I did not mean that I was trying to optimize by looking out for all "if (!a) a = true;" cases in my code. The desired information was just for interest - and like said before, good coding :)

Anyway, thanks

Share this post


Link to post
Share on other sites
Quote:
Original post by VooDooTom
It was kind of a question for good coding practice. I know that will not increase any performance issues in my application. I just wanted to know if A would be better than B.


"Better" and "faster" are two totally different questions. And although your original question did say 'better/faster', everything else in your post pointed towards the 'faster' side of that question. If you want to know which is 'better', ask yourself which one makes more sense and which one is more readable. The simple assignment is the clear winner in this regard, taking a fraction of the time to understand fully.

Share this post


Link to post
Share on other sites

This topic is 3590 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this