Bit Flags vs. Boolean

Started by
6 comments, last by taz0010 11 years, 6 months ago
I was trying to explain to a semi-programmer friend of mine today the purpose of bit flags vs. booleans, and I couldn't think of any reason besides memory usage, which seems rather weak seeing as it would save you a few hundred bytes of memory at most. So what's a good argument for bit flags I can tell him about? To be honest, I'm wondering myself right now too.
It tends to be a bit more memory than that in most use cases. You often don't just have one bit flag, you have one bit flag per user/entity/product/etc. It adds up.

It also speeds up certain operations, since an equality check against a whole set of flags is 1 op rather than N. And/Or/Xor/Negate across the set are also 1 op instead of N.

Reading and writing from disk/database/network is faster since there's less space and/or compression going on.

I'm sure there's a few others I'm missing.
One example of a performance benefit would be doing something like render queue sorting based on a bitfield. It would be much better to sort based on a single field rather than a bunch of single flags.
Bitfields were available in C because C was used to develop the Unix operating system, and operating systems deal with hardware. Hardware control registers often have bit-addressable functions.

C also did not have a boolean type. The PDP architecture it was first developed on had a (bit-addressable) flag register in which the Z flag gets set every time an operation yields a zero result (and cleared otherwise), so it was easier to have control-flow operations act on zero or non-zero integral values. BTW, that's also why null-terminated character strings were used instead of the more-common (at the time) Pascal-style counted strings.

Stephen M. Webb
Professional Free Software Developer

One particular application you might want to look at is Bitboards.

One particular application you might want to look at is Bitboards.

This is probably the best example and is also what I was hinting at by using the bitboard concept to sort render ops via state, etc.
I couldn't think of any besides Memory Usage which seems rather weak seeing as it would save you a few 100 bytes of memory at most.
Saving memory can be important beyond just reducing RAM usage. Moving memory from RAM to the CPU is extremely slow, so shrinking data formats can be a good optimisation.

Let's say we've got a game object like:
struct Object { float x, y, z; bool active; }; // sizeof(Object) == 16
Due to alignment, the compiler adds an extra 3 bytes of padding to this structure to round it up from 13 to 16 bytes. So the bool is actually costing us 4 bytes * numObjects.
Now let's say we want to run this algorithm:
int countActive = 0;
for( int i=0; i!=numObjects; ++i )
    countActive += objects[i].active ? 1 : 0;
My CPU has 64-byte cache lines (the amount of RAM that's transferred to the CPU in one operation). If we've got 1000 objects, then this algorithm has to fetch 250 cache lines (1000*16/64==250). Seeing that the CPU asm code is really simple, this operation will most likely be bottlenecked by the time it takes to perform those memory fetches -- let's fix this:
struct Pos { float x, y, z; }; // sizeof(Pos) == 12
struct Objects { Pos xyz[1000]; bool active[1000]; };
...
int countActive = 0;
for( int i=0; i!=1000; ++i )
    countActive += objects.active[i] ? 1 : 0;
Now we're only iterating through the bools, without the positions being interleaved. Furthermore the 3 padding bytes are gone. Now we've only got to fetch 16 cache lines (1000*1/64==15.625), which is way less memory bandwidth than before.
But we can do better:
struct Objects { Pos xyz[1000]; u32 active[(1000+31)/32]; };
...
int countActive = 0;
for( int i=0; i!=ArraySize(objects.active); ++i )
    countActive += CountBitsSet(objects.active[i]);
Now we only allocate 32 integers to store our bits, which is 128 bytes, which is a mere 2 cache lines. Now there's a much bigger chance that the actual loop's CPU asm code will be the bottleneck, instead of bottlenecking on RAM access speeds. We've reduced the original memory bandwidth requirement of the algorithm by over 100x!
One thing to keep in mind is that accessing bit fields will require the compiler to produce additional bit masking instructions, which will *increase* the memory footprint of your instruction code. If you try to use bit fields as a general rule instead of a situational optimisation, you'll end up with slower code.

Also, the alignment issues are subtle. Most types want to align on 4-byte boundaries, so AFAIK if you have a structure containing, say, an integer, and fewer than 5 flags, then converting those flags to single bits won't save any space at all.

