BrickInTheWall

Unions (C/C++)...probably a stupid question...

Hi everybody, I've come across some code in a book and I'm having a hard time understanding it because I don't get the first bit of it. It involves a union and a struct...I have never heard of these (except for the fact that struct is more of a C thing) and am having a hard time finding information about these that I can understand. Here's part of the code:
typedef union
{
	double fp;
	struct
	{
		unsigned long lo;
		unsigned long hi;
	};
} hack_structure;

hack_structure x;
it would be great if somebody could explain this step by step...the last bit looks like creating an object but I see no class here. The whole union thing is really confusing me as I don't know what it is used for..also those variables lo and hi are never defined anywhere in the program but used right away...is there anything special about them? Cheers, Chris

A struct is almost exactly the same as a class, except that its members are public by default (class members are private by default). The struct in your code is equivalent to this:
class NoName
{
public:
	unsigned long lo;
	unsigned long hi;
};




Unions are an obscure part of the language that say "I've got two variables, but I want you to store them in the same memory location." For example, if we make a union of an int and a char, then when we modify one, we're actually modifying the other as well.
union
{
	int i;
	char c;
} x;
x.i = 42;
x.c = 123;
if( x.i != 42 )
	printf("The union worked!");
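
To see the "same memory location" part directly, here is a minimal sketch that only compares addresses and sizes, so it never reads a member other than the one that was written. The names IntOrChar, membersShareAddress and unionSize are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical union, mirroring the int/char example above.
union IntOrChar {
    int i;
    char c;
};

// Returns true when both members start at the same address.
bool membersShareAddress() {
    IntOrChar u;
    u.i = 42;
    const void* addressOfInt = &u.i;
    const void* addressOfChar = &u.c;
    return addressOfInt == addressOfChar;
}

// A union is only as large as its largest member (plus any padding),
// not the sum of its members.
constexpr std::size_t unionSize = sizeof(IntOrChar);
```

Calling membersShareAddress() should return true, and unionSize should equal sizeof(int) rather than sizeof(int) + sizeof(char).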




Putting that all together, your code is making a variable of type 'double', but also putting a structure (consisting of two unsigned longs) in the same piece of memory.
If you modify x.fp you will be modifying x.lo and x.hi.

Doubles are usually 64 bits and unsigned longs are usually 32 bits. This is a horrible hack that allows you to read out the actual "bit data" that is used to represent the double.

thanks I think I kinda understand...but I don't quite get all of it :D
If I set x.fp to 2 for example and print x.hi and x.lo I get some very large number and 0 (for x.lo)...what does this mean...I can't quite understand this yet...whatever I enter for x.fp ...x.lo is always 0, x.hi is always a very large number.

Doubles are stored as floating point data. The upper 32 bits of a double will hold the sign bit, the exponent and part of the mantissa. The lower 32 bits will hold the rest of the mantissa.

Seeing that doubles are encoded in this way (which is very different to how integers are encoded), when you read out the values of those unsigned longs (after setting a value in the double), they're just going to look like random numbers...

The union doesn't provide conversion between the two encodings - it just provides raw binary access.
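
As a rough sketch of what "raw binary access" gives you, the snippet below copies the bytes of a double into a 64-bit integer with std::memcpy and splits it into two 32-bit halves. It assumes an IEEE 754 64-bit double, which is what mainstream compilers use; bitsOf, highWord and lowWord are names invented for this example.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Copy the object representation of a double into an integer.
std::uint64_t bitsOf(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Upper 32 bits: sign, exponent and the top of the mantissa.
std::uint32_t highWord(double value) {
    return static_cast<std::uint32_t>(bitsOf(value) >> 32);
}

// Lower 32 bits: the rest of the mantissa.
std::uint32_t lowWord(double value) {
    return static_cast<std::uint32_t>(bitsOf(value) & 0xFFFFFFFFu);
}
```

For 2.0 this gives a high word of 0x40000000 (1073741824 in decimal) and a low word of 0, which matches the "very large number and 0" described above: small whole numbers need so few mantissa bits that the lower half stays empty.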

If you want to convert a double to an unsigned int you would just write:
double realNumber = 42;
unsigned int integralNumber = (unsigned int)realNumber;


This union trick that you're trying to understand is used to manipulate the underlying binary representation of the double.
For example, if I wanted to flip the sign-bit of the double, I could write:
x.fp = 42;
x.hi ^= 0x80000000;
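
The same sign flip can be sketched without the union by copying the bits out, toggling the top bit and copying them back. This assumes an IEEE 754 64-bit double; flipSign is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Toggle the sign bit (bit 63) of a double by working on its raw bits.
double flipSign(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits ^= 0x8000000000000000ULL;   // XOR with 1 flips just that bit
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

With this, flipSign(42.0) should come back as -42.0 and flipSign(-1.5) as 1.5.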

Unions are used quite a lot in OS development. A very common union on 32-bit Windows is LARGE_INTEGER.


typedef union _LARGE_INTEGER {
	struct { DWORD LowPart; LONG HighPart; };
	struct { DWORD LowPart; LONG HighPart; } u;
	__int64 QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;


Here's a union that I used as a hack once, with a little pseudo-demonstration. It takes advantage of the fact that on 32 bit Windows, a memory address has the same storage requirement as an unsigned long, 4 bytes. It takes 4 bytes to store an address and 4 bytes to store an unsigned long (4 bytes = 32 bits).


typedef union tagMappedAddress {
	void * pvoid;
	unsigned long * pulong;
	unsigned long ulong;
} MappedAddress, *PMappedAddress;


Unions are more common in C than in C++. Hence the void pointer in the example.

// signature of bogus function used for purposes of illustration
// void * __stdcall GetMemory();

// instance of the union variable
MappedAddress ma_ppd;

// assign a memory address to the void pointer member .pvoid of the union variable
ma_ppd.pvoid = GetMemory();

// to deal with that address as if it was an unsigned long, use the .ulong member of the union
unsigned long q = ma_ppd.ulong * 45;

// to deal with that address as if it was a pointer to unsigned long, use the .pulong member
// you can then deal with this member as if it was an array of unsigned long

unsigned long p = ma_ppd.pulong[45];

Unions allow a programmer to change the data type of the variable by changing which member of the union is used. Here I began with a void pointer, shifted to an unsigned long and then shifted to a pointer to unsigned long, using the same variable the entire time. The actual values stored in the variable do not change when a different member of the union is used. What changes is the data type used to interpret those values.

The "hackiness" of this lies in the fact that you can use this approach to get around compiler warnings. For example, some compilers will complain if you try to perform pointer arithmetic in a complicated manner. You can get around that by working with the pointer as an unsigned long instead. No compiler will complain about arithmetic on an unsigned long. (Note: Pointer arithmetic is tricky, so tread carefully if you don't know what you're doing).

Unions can sometimes lead to problems with data alignment, but that isn't a huge problem for user mode programs. It can be overcome with a pragma pack. If you don't know what that means, don't worry about it.

A union is an alternative to using a type cast. If you find yourself using the same type cast again and again, using a union instead might be a better way to go.
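
For the pointer case specifically, here is a hedged sketch of the cast-based alternative. It uses std::uintptr_t instead of unsigned long, because unsigned long is not pointer-sized everywhere (on 64-bit Windows it is still 32 bits). std::uintptr_t is technically optional in the standard but is available on mainstream platforms; the three function names are invented for this example.

```cpp
#include <cstdint>

// View an address as an unsigned integer wide enough to hold it.
std::uintptr_t addressAsInteger(const void* pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer);
}

// Turn such an integer back into a pointer.
const void* integerAsAddress(std::uintptr_t address) {
    return reinterpret_cast<const void*>(address);
}

// Converting to the integer type and back yields the original pointer.
bool survivesRoundTrip(const void* pointer) {
    return integerAsAddress(addressAsInteger(pointer)) == pointer;
}
```

The round trip is the only thing this sketch relies on; what the integer value itself looks like is up to the platform.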

Quote:
Original post by Hodgman
This union trick that you're trying to understand is used to manipulate the underlying binary representation of the double.
For example, if I wanted to flip the sign-bit of the double, I could write:
x.fp = 42;
x.hi ^= 0x80000000;


Hm so the sign would be the MSB? since 0x80000000 is 1000 0000 0000 0000 in binary right?
I know I have a lot to catch up on concerning floating point arithmetic but I'm willing to learn since I think it's very interesting...just need to find the right literature.
One more question, why are you using x.hi to "compare" to 0x80000000. Am I correct in this assumption: x.fp is a double..therefore has a size of 8 bytes (64 bits). A long has 4 bytes and since I have two long ints in the union they have the same memory address as the double and kinda "split" it in half (in lo and hi)? (You can hit me if this is completely off).

Quote:
Original post by BrickInTheWall
Hm so the sign would be the MSB? since 0x80000000 is 1000 0000 0000 0000 in binary right?
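Yes, that Wikipedia page that I linked to explains that the MSB is the sign bit, the next highest 11 bits are the exponent, and the rest is a fraction. (One correction: 0x80000000 is a 32-bit value, so in binary it is a 1 followed by 31 zeros, not 16 digits.)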
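
To make that layout concrete, here is a small sketch that pulls the three fields out of a double. It assumes the IEEE 754 64-bit format (1 sign bit, 11 exponent bits, 52 fraction bits); DoubleFields and decode are names made up for this example.

```cpp
#include <cstdint>
#include <cstring>

struct DoubleFields {
    unsigned sign;           // 1 bit
    unsigned exponent;       // 11 bits, still in its stored (biased) form
    std::uint64_t fraction;  // 52 bits
};

DoubleFields decode(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    DoubleFields fields;
    fields.sign = static_cast<unsigned>(bits >> 63);
    fields.exponent = static_cast<unsigned>((bits >> 52) & 0x7FFu);
    fields.fraction = bits & ((std::uint64_t{1} << 52) - 1);
    return fields;
}
```

For 2.0 the fields come out as sign 0, exponent 1024, fraction 0. For -0.75 (which is -1.5 times 2 to the power -1) they are sign 1, exponent 1022 and a fraction with only its top bit set.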
That said, it's best not to rely on information like this and do things a simpler way. The behaviour of unions, for example, can vary from compiler to compiler.

A better way to flip the sign bit of a double would be:
fp *= -1; (or fp = fp * -1;)
Quote:
I know I have a lot to catch up on concerning floating point arithmetic but I'm willing to learn since I think it's very interesting...just need to find the right literature.
Many computer science books will cover this low-level stuff. Perhaps someone can make a recommendation?
Quote:
One more question, why are you using x.hi to "compare" to 0x80000000.
I'm not using a "compare" operator, but an "xor" operator.
"a ^= b" is the same as "a = a ^ b". And "^" means XOR.
XOR'ing a bit with 1 flips its value (1 bits become 0, 0 bits become 1). So seeing that 0x80000000 is the sign bit, when I XOR it with the high word of the double, the sign bit will be flipped.
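
Here is a tiny sketch of that toggle on a plain 32-bit word, checked at compile time. The value 0x40450000 is the high word of 42.0 in IEEE 754 format; kSignMask and toggleSign are names invented for this example.

```cpp
#include <cstdint>

constexpr std::uint32_t kSignMask = 0x80000000u;

// XOR with the mask flips the top bit and leaves the other 31 alone.
constexpr std::uint32_t toggleSign(std::uint32_t highWord) {
    return highWord ^ kSignMask;
}

// A 0 sign bit becomes 1 ...
static_assert(toggleSign(0x40450000u) == 0xC0450000u, "sign bit set");
// ... and toggling twice restores the original word.
static_assert(toggleSign(toggleSign(0x40450000u)) == 0x40450000u, "round trip");
```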
Quote:
Am I correct in this assumption: x.fp is a double..therefore has a size of 8 bytes (64 bits). A long has 4 bytes and since I have two long ints in the union they have the same memory address as the double and kinda "split" it in half (in lo and hi)? (You can hit me if this is completely off).
Yes, that is exactly what this union does ;D

However, a long int might not always be 32 bits. The standard only guarantees that it is at least 32 bits, and on some compilers it is 64 bits (most 64-bit Linux compilers, for example), so code like this should be avoided if possible.
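
If you do need the two halves, one way to guard against the size problem is to use fixed-width types and let the compiler check the assumption. This is only a sketch: Halves and split are hypothetical names, and which half ends up in lo depends on the byte order of the machine (on little-endian x86 it is the low half).

```cpp
#include <cstdint>
#include <cstring>

// Two 32-bit halves, regardless of how wide 'long' is on this compiler.
struct Halves {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Compilation fails here instead of silently misbehaving at run time.
static_assert(sizeof(Halves) == sizeof(double),
              "expected a 64-bit double and two 32-bit halves");

Halves split(double value) {
    Halves halves;
    std::memcpy(&halves, &value, sizeof halves);
    return halves;
}
```

On a 64-bit Linux build the original union with unsigned long would quietly become 16 bytes; here the static_assert would stop the build if the sizes ever disagreed.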

Quote:
Original post by Hodgman
A better way to flip the sign bit of a double would be:
fp *= -1; (or fp = fp * -1;)

I prefer

fp = -fp;

Quote:
Original post by BrickInTheWall
thanks I think I kinda understand...but I don't quite get all of it :D
If I set x.fp to 2 for example and print x.hi and x.lo I get some very large number and 0 (for x.lo)...what does this mean...I can't quite understand this yet...whatever I enter for x.fp ...x.lo is always 0, x.hi is always a very large number.


It is undefined behavior to access a type in a union if it wasn't the last one written. Unions allow you to store multiple types in the same memory location. This makes for a cheap "any" datatype. As an example
class keyboard_event{
public:
	int type;
	int char_pressed;
};

class mouse_event{
public:
	int type;
	int x,y;
};

union event{
	keyboard_event keyboard;
	mouse_event mouse;
};

event getLastEvent();

I get a "polymorphic" return type since I can return either a keyboard or mouse event without having to do a bunch of manual casting. I don't have to make my events inherit from some base class just to make casting work. I use the type variable present in each event to figure out which type of event it was.
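
A variation on the same idea, sketched under my own assumptions rather than taken from any particular library: keep the type tag outside the union, so the code always checks the tag first and only then reads the member that was actually written. All names here (EventType, KeyData, MouseData, Event, describe and the two make functions) are invented for the example.

```cpp
#include <string>

enum EventType { KeyboardEvent, MouseEvent };

struct KeyData { int charPressed; };
struct MouseData { int x; int y; };

struct Event {
    EventType type;   // says which union member is currently valid
    union {
        KeyData key;
        MouseData mouse;
    };
};

Event makeKeyEvent(int charPressed) {
    Event event;
    event.type = KeyboardEvent;
    event.key.charPressed = charPressed;
    return event;
}

Event makeMouseEvent(int x, int y) {
    Event event;
    event.type = MouseEvent;
    event.mouse.x = x;
    event.mouse.y = y;
    return event;
}

// Dispatch on the tag, then touch only the matching member.
std::string describe(const Event& event) {
    switch (event.type) {
    case KeyboardEvent:
        return "key " + std::to_string(event.key.charPressed);
    case MouseEvent:
        return "mouse " + std::to_string(event.mouse.x) + "," +
               std::to_string(event.mouse.y);
    }
    return "unknown";
}
```

For example, describe(makeMouseEvent(3, 4)) produces "mouse 3,4". Because the tag decides which member is read, this never reads a member other than the last one written.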

Alrighty, I'm getting the hang of unions and understand what purpose they serve in the code I'm looking at, but my concentration has shifted towards something different now. Here is the code I am looking at:


#include <iostream>
#include <math.h>

using namespace std;

typedef union
{
	double fp;
	struct
	{
		unsigned long lo;
		unsigned long hi;
	};
} hack_structure;

hack_structure x;

// 'factor' is not defined in the listing as posted. sqrt(1/2) is an
// assumption inferred from the "force the exponent to be even" step below:
// raising expo by one doubles 2^expo, so the root must shrink by sqrt(1/2).
const double factor = 0.70710678118654752;

int main( void )
{
	short expo;
	unsigned short sign;

	while( 1 )
	{
		// Get an fp number
		cin >> x.fp;

		// Grab the sign and make it positive
		sign = ( (long)x.hi < 0 );
		x.hi &= 0x7fffffff;

		// Grab the exponent
		expo = (short)( x.hi >> 20 );
		expo -= 0x3fe;

		// Normalize the number
		x.hi &= 0x000fffff;
		x.hi += 0x3fe00000;

		// Get square root of normalized number
		x.fp = sqrt( x.fp );

		// Force the exponent to be even
		if( expo & 1 )
		{
			x.fp *= factor;
			++expo;
		}

		// Halve the exponent
		if( expo < 0 )
		{
			expo = expo / 2;
		}
		else
		{
			expo = ( expo + 1 ) / 2;
		}

		// Put it back
		expo += 0x3fe;
		x.hi &= 0x000fffff;
		x.hi += ( (unsigned long)expo << 20 );

		// Show the new number
		cout << "The sqrt is: " << x.fp << endl;

		return 0;
	}
}


From my observation this program takes some value and halves its exponent...for whatever reason, but that's not really relevant here. I don't quite understand, for example, what the line expo -= 0x3fe is supposed to do. I don't really understand how the next step normalizes the number....I know what this means, it's just the way the coder does it that makes my head explode.
Sorry for bringing this topic up again, but I didn't see the need to start a new one.
Can anyone give me any tips / advice to my questions?
Cheers,
Chris

Quote:
Original post by BrickInTheWall
I don't quite understand for example, what the line expo -= 0x3fe is supposed to do.


'x -= y' is equivalent to 'x = x - y'.

0x3fe is an integer literal written in hexadecimal, i.e. the value 3FE in base 16, which equals 1022.

Quote:
I don't really understand how the next step normalizes the number....I know what this means, it's just the way the coder does is makes my head explode.


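'x.hi' is treated as a 32-bit value here. The first step performs a bitwise AND which clears the 12 most significant bits (because they're AND-ed with 0). The next step adds in a value which sets those 12 bits to 3FE. Basically, the exponent field is replaced with bits which represent an exponent of 0, provided you read the mantissa as a value in [0.5, 1): under that convention the stored exponent is offset by 1022. (IEEE 754 usually describes the same bits as a significand in [1, 2) with an offset of 1023, which makes 0x3FE an exponent of -1; both readings give the same number.)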
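
Here is a sketch of those two steps run on their own, using std::memcpy and fixed-width integers in place of the union. It assumes an IEEE 754 64-bit double and only makes sense for ordinary positive numbers (not zero, infinities or denormals); Normalized and normalize are names invented for this example.

```cpp
#include <cstdint>
#include <cstring>

struct Normalized {
    int exponent;     // power of two, with the 0x3fe offset removed
    double mantissa;  // value in [0.5, 1)
};

Normalized normalize(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
    std::uint32_t lo = static_cast<std::uint32_t>(bits);

    hi &= 0x7fffffffu;                                  // drop the sign bit
    int exponent = static_cast<int>(hi >> 20) - 0x3fe;  // remove the offset
    hi &= 0x000fffffu;                                  // clear the top 12 bits
    hi += 0x3fe00000u;                                  // exponent field = 0x3fe

    bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double mantissa;
    std::memcpy(&mantissa, &bits, sizeof mantissa);
    return Normalized{exponent, mantissa};
}
```

Taking 10.0 as a worked case: its high word is 0x40240000, so the exponent field is 0x402 and the exponent after subtracting 0x3fe is 4. Replacing the field with 0x3fe leaves 0.625, and indeed 0.625 times 2 to the power 4 is 10.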

ok so there's a biased exponent...so effectively, the code gets the square root of the mantissa in the line sqrt( x.fp ) and then halves the exponent...I think I've got it now, thanks :D
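
For comparison, the same "root of the mantissa, half the exponent" idea can be sketched with the standard functions std::frexp and std::ldexp, which split a double into a mantissa in [0.5, 1) and a power of two without touching the bits by hand. This is my own restatement, not the book's code; like the original it still calls sqrt on the normalized value, and it is only meant for positive finite inputs. The function name is made up.

```cpp
#include <cmath>

double sqrtByHalvingExponent(double value) {
    int exponent = 0;
    // value == mantissa * 2^exponent, with mantissa in [0.5, 1)
    double mantissa = std::frexp(value, &exponent);

    double root = std::sqrt(mantissa);

    // An odd exponent cannot be halved exactly, so bump it by one and
    // compensate by multiplying the root by sqrt(1/2).
    if (exponent % 2 != 0) {
        root *= std::sqrt(0.5);
        ++exponent;
    }

    // Halve the (now even) exponent and put it back.
    return std::ldexp(root, exponent / 2);
}
```

For example, 9.0 splits into 0.5625 times 2 to the power 4; the root of 0.5625 is 0.75, and 0.75 times 2 squared is 3.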

Quote:
It is undefined behaviour to access a type in a union if it wasn't the last one written.

Stonemetal is partly correct, and I do wish people would add disclaimers when saying you can do these sorts of things in C++.

Quote:
Original post by Hodgman
This is a horrible hack that allows you to read out the actual "bit data" that is used to represent the double.


How exactly is this a horrible hack? It's not a hack at all, perfectly legal C code, I can only think of less efficient or less obvious methods to read the bit data.

Quote:
Original post by Decrius
How exactly is this a horrible hack? It's not a hack at all, perfectly legal C code

No, it's not:
Quote:
Original post by stonemetal
It is undefined behavior to access a type in a union if it wasn't the last one written.

In what way _could_ it be undefined behaviour? Does it suddenly change the data if you let it believe it's a different type? What can go wrong that makes it undefined behaviour?

union {
	int i;
	char x[sizeof(int)];
};


This is apparently standard-compliant with regard to layout specification. x can be used to read the bytes (assuming char is the expected 8-bit value). It applies to individual bytes, not to other types.

Quote:
In what way _could_ it be undefined behaviour?

Conversions between different types may differ between compilers and platforms; the standard does not define the result.

Even in the above case, x returns valid bytes, but the standard does not prescribe their meaning.

However, one doesn't need meaning to perform endian conversion, for example.
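
As a rough sketch of that endian conversion, the bytes of a 32-bit value can be copied into an unsigned char array, reversed and copied back. Nothing here needs to know what the bytes mean. swapBytes is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value.
std::uint32_t swapBytes(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    std::reverse(bytes, bytes + sizeof value);
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

So swapBytes(0x11223344) gives 0x44332211 whichever byte order the machine uses, and applying it twice returns the original value.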
