Question about type, and displaying the bits of a char

Started by
11 comments, last by frob 8 years, 5 months ago

So I was looking through an older C++ book and I stumbled upon this problem:

Display the internal representation of a char.

And the way it was solved:


#include <iostream>
using namespace std;

struct bits {
    unsigned b0 : 1;
    unsigned b1 : 1;
    unsigned b2 : 1;
    unsigned b3 : 1;
    unsigned b4 : 1;
    unsigned b5 : 1;
    unsigned b6 : 1;
    unsigned b7 : 1;
};

union character {
    bits b;
    char c;
};

character byte;

int main() {
    cin >> byte.c;

    cout << byte.b.b7 << byte.b.b6 << byte.b.b5 << byte.b.b4;
    cout << byte.b.b3 << byte.b.b2 << byte.b.b1 << byte.b.b0;
}

And I got a bit confused, because there was some syntax I've never seen before:

unsigned b0:1;

What type is b0? And what does ":1" do?

The way the code works, it appears that b0 is of type BIT, which I'm not sure can happen in C++.


http://en.cppreference.com/w/cpp/language/bit_field

"unsigned" is a shorthand for unsigned int.

God damn it, I had a super long post typed up and chrome crashed because I hit CTRL+S.

You don't really see bitfields in desktop applications because there's no need to conserve memory. You'll find them in code that makes heavy use of bit masking (crypto?) or memory conservation (networking code?). I program microcontrollers for my day job and we make heavy use of bitfields.

[EDIT] Just to be clear and as mentioned in the coming posts: This is not portable.

Here's an example. Imagine you had to pack a "move" instruction of a chess piece in a game of chess into as little space as possible, the reason being that you want to transmit the move over a network and save bandwidth. You could encode this by using a "to" and a "from" coordinate. Seeing as a chess board is 8x8, each coordinate pair can be packed into 6 bits (3 bits per axis). You could write:
u16 move = 0;
move |= (current_x << 0);
move |= (current_y << 3);
move |= (new_x << 6);
move |= (new_y << 9);
/* 4 bits reserved for whatever */
Using bitfields makes this much more readable:
struct chess_move_t {
    union {
        struct {
            unsigned current_x : 3;
            unsigned current_y : 3;
            unsigned new_x : 3;
            unsigned new_y : 3;
            unsigned :4; /* unused */
        };
        u16 data;
    };
};
The following code does the same thing as the first example.
struct chess_move_t move;
move.current_x = current_x;
move.current_y = current_y;
move.new_x = new_x;
move.new_y = new_y;




Here's a real world example of some microcontroller code, just in case you were wondering.
void timer_init(void)
{
    /*
     * Target interrupt frequency is 100Hz
     * 
     * Fcy = 7.37 * 65 / 8 = 59.88125 MHz
     * Prescale 1:64 ~ 936 kHz
     * Using 16-bit timer type B: count to 9356 for 100 Hz
     * 
     * We'll be using a timer type B, specifically timer 4, so we don't
     * clash with the timer required for ADC conversions.
     * 
     * Notes on config:
     *  + Clock source select by default is Fosc / 2
     *  + Default mode is 16-bit mode
     */
    T4CONbits.TON = 0;      /* disable timer during config */
    T4CONbits.TCKPS = 0x02; /* prescale 1:64 */
    PR4 = 9356;             /* period match, divide the 936 kHz by 9356 to 
                             * reach 10ms */
    IFS1bits.T4IF = 0;      /* clear interrupt flag */
    IEC1bits.T4IE = 1;      /* enable timer 4 interrupts */
    
    T4CONbits.TON = 1;      /* start timer */
}
The relevant bitfield declarations are the following (found in a header file provided by Microchip):
#define T4CON T4CON
extern volatile unsigned int  T4CON __attribute__((__sfr__));
__extension__ typedef struct tagT4CONBITS {
  union {
    struct {
      unsigned :1;
      unsigned TCS:1;
      unsigned :1;
      unsigned T32:1;
      unsigned TCKPS:2;
      unsigned TGATE:1;
      unsigned :6;
      unsigned TSIDL:1;
      unsigned :1;
      unsigned TON:1;
    };
    struct {
      unsigned :4;
      unsigned TCKPS0:1;
      unsigned TCKPS1:1;
    };
  };
} T4CONBITS;
extern volatile T4CONBITS T4CONbits __attribute__((__sfr__));


You don't really see bitfields in desktop applications because there's no need to conserve memory.
Most of my job when optimizing game engine code has to do with conserving memory. We have a lot of RAM, but the number of bytes you can actually access per clock cycle is lower now than it was 5 years ago. For example, the Wii U has more RAM in total than an Xbox 360, but the number of MB that you can actually touch per frame is lower!

Modern CPU optimization is all about minimizing the amount of time that the CPU sits around waiting for RAM reads/writes.

Unrelated to the bitfield discussion, isn't the code in the original post technically venturing into undefined behavior land (although supported/working in most compilers)?

It writes to 1 data member of the union (char c) and then accesses a different data member (bits b).



isn't the code in the original post technically venturing into undefined behavior land

Unions are in the list of allowed types for aliasing. C++ Standard section 3.10 item 10:

10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

....

— an aggregate or union type that includes one of the aforementioned types among its elements or non-
static data members (including, recursively, an element or non-static data member of a subaggregate
or contained union),

....

So there's no problem with undefined behavior.

isn't the code in the original post technically venturing into undefined behavior land (although supported/working in most compilers)?
It writes to 1 data member of the union (char c) and then accesses a different data member (bits b).

Yes, technically you're only allowed to read from the same union member which was previously written. However, (almost) every C/C++ compiler actually recommends using unions in this exact way when you intend to perform a bitwise reinterpretation of a value. So in the real world, it's actually recommended / good style. Compilers will recognize the pattern and correctly deal with the aliasing issues.

You need to be a bit careful when using bitfields, as there are some significant portability issues. Some of them affect correctness, and others just affect how much space saving you get.

  • For the example in the OP sizeof(character) may vary depending on which compiler you've used (I'd expect either sizeof(unsigned) or sizeof(char)). If it ends up as sizeof(unsigned), then endianness will cause portability problems if you write to one member then read from the other.
  • The order of the bits within a bitfield isn't well defined either, so writing to b.b0 and then reading c isn't portable even if you avoid the size/endianness issue.
  • Various other stuff also isn't defined. For example, there are several different numbers this code could output: struct test {int p:4; unsigned int q:4;}; printf("%zu\n", sizeof(test));

One thing you should definitely avoid for cross platform portability is serializing anything that contains a bitfield. It's even worse than writing out a whole struct. Use masks and shifts instead if you need to be space efficient.

C99 6.7.2.1-11: An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

isn't well defined either

It is one of those borderline edge cases of the language standard.

There are several classes of behavior:

There is well-defined behavior that is specified in the standard. These things MUST be a particular way. Most of the language standard falls in this arena.

There is implementation defined behavior that is outlined by the standard but the compiler or system sets the value. These are usually properties of the underlying machine or compiler such as the number of bits in a data type, the maximum number of characters in a name, the depth of nesting that is legal, and so on.

There is unspecified behavior that is generally outlined by the standard. Usually the standard defines several options and allows the system to do whatever they want inside those bounds. For example, the order of evaluation of function arguments is unspecified. The standard requires the process must happen, but the standard allows broad latitude to how and when it happens.

There is undefined behavior that the standard has no requirements for. The most classic of these is the result of dereferencing a null pointer or following a pointer into unknown locations. The behavior is completely undefined by the standard and the system could do anything, from crashing, to warning with error messages, to doing what the programmer thinks might happen.

Individual compilers are allowed to write their own extensions and define their own behavior. They must provide the values for implementation defined behavior, such as stating that the maximum identifier length is x bytes, or the allowable depth of template recursion, and so on.

In this case the standard defines that they must exist, but allows most of the underlying details to be unique to the compiler.

Bit fields have a lot of implementation defined behavior. The underlying type, the number of bits, the packing of the bits, the allocation within a class object, the alignment, the straddling of allocation units, all of these are implementation defined.


Yes, technically you're only allowed to read from the same union member which was previously written. However, (almost) every C/C++ compiler actually recommends using unions in this exact way when you intend to perform a bitwise reinterpretation of a value.

Yes, this is one of those obscure little semi-guarantees in the standard that ALMOST every compiler takes steps to ensure.

The standard offers a guarantee, but the guarantee is based on implementation defined behavior.

Under class.union section: Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; see 9.2. —end note
With a bit of a cross reference back to 9.2, a bit-field is compatible as long as both the bit-field and the other type are layout-compatible and have the same width.
So even though all the details are implementation defined, as long as the implementation's definitions of the types are layout-compatible with the same width, it is perfectly legal. These days every compiler I know of makes certain the bit-field's implementation is compatible with the other integral types.
Back in the bad-old-days (and likely even today for a few obscure or severely-limited chips) there were chipsets that did not have operations for bit shifting. Since bit packing is very important for many systems, the operations are usually included, often with a magical piece of hardware called a barrel shifter, because it is so common and important to be fast. But some small number of systems did not include the functionality, so packing and unpacking bits was/is much more difficult. On these systems a C++ implementation may decide the effort is not worth the reward, but I'm not aware of any of them in use today.

You don't really see bitfields in desktop applications because there's no need to conserve memory.

Sure there is a need to conserve memory. Pulling data from main RAM is slow (relatively speaking). The more data you can fit into cachelines without complex operations, the faster things can go.

This isn't to say you should always compress your memory down - like most computer programming, you need to learn when it is appropriate to do so. But I definitely wouldn't say you "don't need to" any more than I'd say you "always need to". Both are incorrect (though either can be used hyperbolically to make a point - if that's what you were doing, it went over my head).

Like most programming, there's a balance:


"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth

Many programmers know the "premature optimization is the root of all evil.", but the whole quote is more nuanced.

Sometimes conserving memory in RAM can "actually have a strong negative impact" not just on debugging and maintenance, but performance-wise as well. Other times, it can dramatically improve performance by 10-15% or more, permitting you to make your game more complex in other areas. A 10% reduction in your frames' average runtime is a big deal.

Note, I'm talking about conserving memory in general, not bitfields in particular. Bitfields can sometimes be pretty slow, but they are useful in other areas.

