Performance of #pragma pack(1)

10 comments, last by Conner McCloud 18 years, 10 months ago
What are the performance costs of using #pragma pack(1) when accessing structures? In addition, I've read that it pads every element to 4 bytes (I've also read that it doesn't). What does it actually do? E.g., is this correct?
struct strObject
{
    char chrHi;
    char _Padding1, _Padding2, _Padding3;

    int intTest;

    short intTesting;
    short _Padding4;
};
My program accesses packed structures a half million times per frame (And runs at roughly 160FPS), could modifying my struct boost its performance?
Quote:Original post by Thevenin
[...]My program accesses packed structures a half million times per frame (And runs at roughly 160FPS), could modifying my struct boost its performance?
Why not make such trivial changes and then benchmark them to see what difference they make? Such profiling would be infinitely more accurate than any guesses we could make about your code.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quite frankly, I don't know how to go about declaring some structs packed, and others non-packed.

My guess would be...
#pragma pack (1)
struct strHi
{
    char blahblah;
};
#pragma pack
struct strBye
{
    char blahblah;
};
No. Use it like this:

#pragma pack(show)
#pragma pack(push)
#pragma pack(1)
#pragma pack(show)
struct first
{
  char a;
  int b;
};
#pragma pack(pop)
#pragma pack(push)
#pragma pack(8)
#pragma pack(show)
struct second
{
  char a;
  int b;
};
#pragma pack(pop)


Regards,
Don't guess, look at the docs.

The docs are for MSVC. Structure padding and alignment requirements can vary from compiler to compiler and processor to processor. For MSVC the default is to align members to their natural alignment (e.g. 2-byte variables to 2-byte alignment, 4-byte to 4-byte, etc.) unless told otherwise.
-Mike
Quote:Original post by Thevenin
What are the performance costs of using #pragma pack(1) when accessing structures?

In addition, I've read that it pads every element to 4 bytes (I've also read that it doesn't). What does it actually do?

pack(1) causes structure members to be aligned on 1-byte boundaries, so there's never any padding between elements. pack(2) aligns on 2-byte boundaries, so a char followed by a larger member will have one padding byte after it. By default, a 4-byte int is probably aligned on a 4-byte boundary, but you can't be sure of this. *edit: except by consulting your compiler documentation. The C and C++ standards leave the amount of padding up to the implementation.

The performance cost is that processors are designed to access memory on word boundaries. With a four-byte word, you access 0x00 through 0x03 all at once, but not 0x01 through 0x04. So if you have an int that starts on 0x01, then it requires two reads and some shuffling to get all the data. With #pragma pack(1) in place this happens if you have a char followed by an int. So it slows down the program, but decreases the memory requirements [because without it a few bytes might be added between the two to ensure the int is easy to read].

It also helps with portability in some cases, but you don't ask about that.

As to your code, that is what the compiler may or may not do by default [except for the closing pad, that's unnecessary in all circumstances], or if you specified #pragma pack(4). If you do pack(1), there would be no padding.

CM
Quote:Original post by Conner McCloud
The performance cost is that processors are designed to access memory on word boundaries. With a four-byte word, you access 0x00 through 0x03 all at once, but not 0x01 through 0x04. So if you have an int that starts on 0x01, then it requires two reads and some shuffling to get all the data. With #pragma pack(1) in place this happens if you have a char followed by an int. So it slows down the program, but decreases the memory requirements [because without it a few bytes might be added between the two to ensure the int is easy to read].


So a packed structure of say..
struct strRGB
{
    unsigned char bytBlue, bytGreen, bytRed;
};


In the memory it looks like this.. (Packed)

[b][g][r][b] | [g][r][b][g] | [r][b][g][r] |

... reading the first RGB is going to be fast since all the data is already word aligned. However, reading the second and third RGBs is going to require reads from two words each, right?

And in the case of padding...
[b][g][r][_] | [b][g][r][_] | [b][g][r][_] |

Reading each structure takes reading only one word?

Edit: Or in padding, is it aligned like this..?
[b][_][_][_] | [g][_][_][_] | [r][_][_][_] |
When padding, the compiler adds all the unused bytes at the end of the struct. (I don't know how to express this properly.)

struct rgb
{
char r;
char g;
char b;
};

would be padded as

struct rgb
{
char r;
char g;
char b;
char _not_used_;
};

and it is read as r g b _ | r g b _ | r g b _ etc...
Quote:Original post by Thevenin
So a packed structure of say..
struct strRGB { unsigned char bytBlue, bytGreen, bytRed; };

In the memory it looks like this.. (Packed)

[b][g][r][b] | [g][r][b][g] | [r][b][g][r] |

... reading the first RGB is going to be fast since all the data is already word aligned. However, reading the second and third RGBs is going to require reads from two words each, right?

Well, first of all, let me take back something I said before. I suggested padding at the end would never be necessary, my logic being that when you declare two structures, padding can be added after the first if needed, rather than making it a part of the first. But in the case of an array I don't think they can add padding between the elements of the array. So it may be necessary in that sense. I just don't know, so ignore that comment.

With that out of the way, it really depends. I believe single bytes can be read just as efficiently from anywhere, so there is probably no reason to pad them. But I might be wrong...that's really up to the processor. Which is why we have optimizing compilers. Microsoft pays engineers a lot of money to figure out how the various processors like to read memory, specifically so that we don't have to.

If you really want to know what your compiler is doing, then just take the address of the members and see how far apart they are.

CM
An array of your strRGB will always be laid out as

[b][g][r][b] | [g][r][b][g] | [r][b][g][r]


regardless of #pragma pack (again, this is MSVC). You can write a really trivial test program to verify that for yourself. The whole structure has a natural alignment of 1 because all of its members have a natural alignment of 1.

Quote:... reading the first RGB is going to be fast since all the data is already word aligned, however, in reading the second and third RGB's it is going to have to read from the first and second words, right?

Maybe. It depends whether the compiler is smart enough to know for sure that the first read would always be aligned properly or not. If it can't deduce that then it has to assume the worst and do unaligned reads all the time.

Quote:The performance cost is that processors are designed to access memory on word boundaries. With a four-byte word, you access 0x00 through 0x03 all at once, but not 0x01 through 0x04. So if you have an int that starts on 0x01, then it requires two reads and some shuffling to get all the data. With #pragma pack(1) in place this happens if you have a char followed by an int. So it slows down the program

Or even more fun, on some RISC processors your app will crash if you try to do an unaligned read.
-Mike

