
# Unions (C/C++)...probably a stupid question...


## Recommended Posts

Hi everybody, I've come across some code in a book and I'm having a hard time understanding it because I don't get the first bit of it. It involves a union and a struct...I have never heard of these (except for the fact that struct is more of a C thing) and am having a hard time finding information about these that I can understand. Here's part of the code:
typedef union
{
    double fp;
    struct
    {
        unsigned long lo;
        unsigned long hi;
    };
} hack_structure;

hack_structure x;

it would be great if somebody could explain this step by step...the last bit looks like creating an object but I see no class here. The whole union thing is really confusing me as I don't know what it is used for..also those variables lo and hi are never defined anywhere in the program but used right away...is there anything special about them? Cheers, Chris

##### Share on other sites
A struct is almost exactly the same as a class, except that its members are public by default (class members are private by default). The inner struct is equivalent to this:
class NoName
{
public:
    unsigned long lo;
    unsigned long hi;
};

Unions are an obscure part of the language that says "I've got two variables, but I want you to store them in the same memory location." For example, if we make a union of an int and a char, then when we modify one, we're actually modifying the other as well.
union
{
    int i;
    char c;
} x;
x.i = 42;
x.c = 123;
if( x.i != 42 )
    printf("The union worked!");

Putting that all together, your code is making a variable of type 'double', but also putting a structure (consisting of two unsigned longs) in the same piece of memory.
If you modify x.fp you will be modifying x.lo and x.hi.

Doubles are usually 64 bits and unsigned longs are usually 32 bits. This is a horrible hack that allows you to read out the actual "bit data" that is used to represent the double.

##### Share on other sites
thanks I think I kinda understand...but I don't quite get all of it :D
If I set x.fp to 2 for example and print x.hi and x.lo I get some very large number and 0 (for x.lo)...what does this mean...I can't quite understand this yet...whatever I enter for x.fp ...x.lo is always 0, x.hi is always a very large number.

##### Share on other sites
Doubles are stored as floating point data. The upper 32 bits of a double will hold the sign bit, the exponent and part of the mantissa. The lower 32 bits will hold the rest of the mantissa.

Because doubles are encoded this way (which is very different from how integers are encoded), the values you read out of those unsigned longs after setting the double are just going to look like random numbers...

The union doesn't provide conversion between the two encodings - it just provides raw binary access.

If you want to convert a double to an unsigned int you would just write:
double realNumber = 42;
unsigned int integralNumber = (unsigned int)realNumber;

This union trick that you're trying to understand is used to manipulate the underlying binary representation of the double.
For example, if I wanted to flip the sign-bit of the double, I could write:
x.fp = 42;
x.hi ^= 0x80000000;

##### Share on other sites
Unions are used quite a lot in OS development. A very common union on 32 bit Windows is LARGE_INTEGER.

typedef union _LARGE_INTEGER {
    struct { DWORD LowPart; LONG HighPart; };
    struct { DWORD LowPart; LONG HighPart; } u;
    __int64 QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;

Here's a union that I used as a hack once, with a little pseudo-demonstration. It takes advantage of the fact that on 32 bit Windows, a memory address has the same storage requirement as an unsigned long, 4 bytes. It takes 4 bytes to store an address and 4 bytes to store an unsigned long (4 bytes = 32 bits).

typedef union tagMappedAddress {
    void          * pvoid;
    unsigned long * pulong;
    unsigned long   ulong;
} MappedAddress, *PMappedAddress;

Unions are more common in C than in C++. Hence the void pointer in the example.

// signature of bogus function used for purposes of illustration
// void * __stdcall GetMemory();

// instance of the union variable
MappedAddress ma_ppd;

// assign a memory address to the void pointer member .pvoid of the union variable
ma_ppd.pvoid = GetMemory();

// to deal with that address as if it was an unsigned long, use the .ulong member of the union
unsigned long q = ma_ppd.ulong * 45;

// to deal with that address as if it was a pointer to unsigned long, use the .pulong member
// you can then deal with this member as if it was an array of unsigned long

unsigned long p = ma_ppd.pulong[45];

Unions allow a programmer to change the data type of the variable by changing which member of the union is used. Here I began with a void pointer, shifted to an unsigned long and then shifted to a pointer to unsigned long, using the same variable the entire time. The actual values stored in the variable do not change when a different member of the union is used. What changes is the data type used to interpret those values.

The "hackiness" of this lies in the fact that you can use this approach to get around compiler warnings. For example, some compilers will complain if you try to perform pointer arithmetic in a complicated manner. You can get around that by working with the pointer as an unsigned long instead. No compiler will complain about arithmetic on an unsigned long. (Note: Pointer arithmetic is tricky, so tread carefully if you don't know what you're doing.)

Unions can sometimes lead to problems with data alignment, but that isn't a huge problem for user mode programs. It can be overcome with a #pragma pack. If you don't know what that means, don't worry about it.

A union is an alternative to using a type cast. If you find yourself using the same type cast again and again, using a union instead might be a better way to go.

##### Share on other sites
Quote:
 Original post by Hodgman
 This union trick that you're trying to understand is used to manipulate the underlying binary representation of the double. For example, if I wanted to flip the sign-bit of the double, I could write:
 x.fp = 42;
 x.hi ^= 0x80000000;

Hm so the sign would be the MSB? since 0x80000000 is 1000 0000 0000 0000 0000 0000 0000 0000 in binary right?
I know I have a lot to catch up on concerning floating point arithmetic but I'm willing to learn since I think it's very interesting...just need to find the right literature.
One more question, why are you using x.hi to "compare" to 0x80000000. Am I correct in this assumption: x.fp is a double..therefore has a size of 8 bytes (64 bits). A long has 4 bytes and since I have two long ints in the union they have the same memory address as the double and kinda "split" it in half (in lo and hi)? (You can hit me if this is completely off).

##### Share on other sites
Quote:
 Original post by BrickInTheWall
 Hm so the sign would be the MSB? since 0x80000000 is 1000 0000 0000 0000 0000 0000 0000 0000 in binary right?
Yes, that Wikipedia page that I linked to explains that the MSB is the sign bit, the next highest 11 bits are the exponent, and the rest is a fraction.
That said, it's best not to rely on information like this and do things a simpler way. The behaviour of unions, for example, can vary from compiler to compiler.

A better way to flip the sign bit of a double would be:
fp *= -1; (or fp = fp * -1;)
Quote:
 I know I have a lot to catch up on concerning floating point arithmetic but I'm willing to learn since I think it's very interesting...just need to find the right literature.
Many computer science books will cover this low-level stuff. Perhaps someone can make a recommendation?
Quote:
 One more question, why are you using x.hi to "compare" to 0x80000000.
I'm not using a "compare" operator, but an "xor" operator.
"a ^= b" is the same as "a = a ^ b". And "^" means XOR.
XOR'ing a bit with 1 flips its value (1 bits become 0, 0 bits become 1). Since 0x80000000 selects the sign bit within x.hi (the upper half of the double), XOR'ing x.hi with it flips the double's sign bit.
Quote:
 Am I correct in this assumption: x.fp is a double..therefore has a size of 8 bytes (64 bits). A long has 4 bytes and since I have two long ints in the union they have the same memory address as the double and kinda "split" it in half (in lo and hi)? (You can hit me if this is completely off).
Yes, that is exactly what this union does ;D

However, a long int might not always be 32 bits - on some compilers it is 64 bits (the standard only guarantees at least 32) - so code like this should be avoided if possible.

##### Share on other sites
Quote:
 Original post by Hodgman
 A better way to flip the sign bit of a double would be:
 fp *= -1; (or fp = fp * -1;)

I prefer
fp = -fp;