Arek the Absolute

sizeof confusion


While writing an image parser I came across this little oddity that I'm at a complete loss to explain. It's simple enough to write a workaround, but I'd prefer to do this in the most elegant way possible. The problem is simple: sizeof is not returning what I'd expect it to. For instance, I can have a structure with two unsigned shorts and an unsigned char, and sizeof will report 6 instead of the expected 5. In the process of examining this, I came across this even stranger example:
#include <iostream>
using namespace std;

class One {
	unsigned short one;
	unsigned char two;
	unsigned char three;
};

class Two {
	unsigned char one;
	unsigned short two;
	unsigned char three;
};

int main(){
	int sizeOne = sizeof(One);
	int sizeTwo = sizeof(Two);
	cout << "sizeOne: " << sizeOne << endl;
	cout << "sizeTwo: " << sizeTwo << endl;
	return 0;
}



yields the output:
sizeOne: 4
sizeTwo: 6
Apparently sizeof not only reports an extra byte in some structures, but the order in which the fields are declared changes the structure's size as well. This occurs in both Visual C++ 6.0 and g++ 4.0. Can anyone explain why this happens, and perhaps what can be done to produce the expected results?

Conner McCloud    1135
To elaborate on stylin's answer, the processor can't just read any random byte. It reads memory one block at a time. If your object spans two such blocks, two reads are required. To avoid this, the compiler will pad structures so that each element falls on its proper alignment. A four-byte int, then, will typically be placed at an address that is a multiple of four.

Most compilers will provide some way of manipulating the padding if needed. Under VC++ it is #pragma pack, but I'm not sure about gcc.


*edit:
Graphical example

[0][1][2][3]-[4][5][6][7]-[8][9]
[c][ int ]

[0][1][2][3]-[4][5][6][7]-[8][9]
[c][padding] [ int ]

[0][1][2][3]-[4][5][6][7]-[8][9]
[ int ] [c]

In the first sample, the int is in both the first and second block, so two reads would be needed. In the second one, an extra three bytes are added to move the int to address [4], which allows it to be fetched in a single read. In the third example, no padding is necessary, because a single-byte char doesn't care where it is placed, and the int is already at a good location.

CM

Brother Bob    10344
Quote:
Original post by Conner McCloud
Graphical example

[0][1][2][3]-[4][5][6][7]-[8][9]
[c][ int ]

[0][1][2][3]-[4][5][6][7]-[8][9]
[c][padding] [ int ]

[0][1][2][3]-[4][5][6][7]-[8][9]
[ int ] [c]

In the first sample, the int is in both the first and second block, so two reads would be needed. In the second one, an extra three bytes are added to move the int to address [4], which allows it to be fetched in a single read. In the third example, no padding is necessary, because a single-byte char doesn't care where it is placed, and the int is already at a good location.

CM

First two I agree with, but not the third. MSVC 7.1 pads the third structure to 8 bytes as well, just like in the second case. The reason: put two or more such structures in an array, and without that padding the int in the second array element would no longer be properly aligned.

Padding is added not only so that members sit at correctly aligned offsets within the structure, but also so that they stay aligned when the structure is stored in an array.
