packing anonymous structs within a named struct and size guarantees [c++]

6 comments, last by the_edd 12 years, 2 months ago
Hi,

If I have a design like:

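struct BaseData {
    int x, y, z;
} packed;  // "packed" stands in for the compiler's packing attribute

struct DataTypeA : BaseData {
    struct {
        int a, b, c;
    } packed;

    other types;       // placeholder for members that are not written to file
    more other types;
};

DataTypeA varA;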


I'm using this to write stuff to files and whatnot. Each data type has common header information (BaseData) and then its own information that needs to be written to file (i.e. DataTypeA needs int a, b, c).

In the above code, are the first sizeof(int) * 6 bytes of varA, which is of type DataTypeA, guaranteed to hold the values of variables x, y, z and a, b, c (in that order)?

Thanks!
aliak.net
It is my understanding that the following things hold about the layout of varA:

  • In terms of address, x < y < z < a < b < c, in any instance of DataTypeA.
  • There is no padding before x in either BaseData or DataTypeA i.e. (char *)&varA == (char *)&varA.x.
  • There may be padding between any of the (transitive) members of varA, so you aren't guaranteed that the first sizeof(int)*6 bytes contain x, y, z, a, b, c.

However, in reality I don't think any compiler is going to insert padding between x, y, z, a, b or c. It wouldn't take much to rearrange your code so that you can check this with a static assertion, or use your compiler's equivalent of "#pragma pack" (both, ideally).
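
A minimal sketch of that static-assertion check, assuming a C++11 compiler (for static_assert) and assuming the design is rearranged so that every field written to file sits in one plain struct. The name DataAHeader and the use of composition instead of inheritance are illustrative choices, not something the original code requires:

```cpp
#include <cstddef>  // offsetof

// Hypothetical rearrangement: all on-disk fields live in one plain struct,
// so the layout can be verified at compile time.
struct DataAHeader {
    int x, y, z;  // common header fields (BaseData in the original design)
    int a, b, c;  // fields specific to DataTypeA
};

// Compilation fails if the compiler inserted padding anywhere in the header.
static_assert(sizeof(DataAHeader) == 6 * sizeof(int),
              "DataAHeader contains unexpected padding");
static_assert(offsetof(DataAHeader, x) == 0, "x is not at offset 0");
static_assert(offsetof(DataAHeader, a) == 3 * sizeof(int),
              "a does not directly follow z");

struct DataTypeA {
    DataAHeader header;  // the part that is written to file
    // ... members that are not written to file would go here
};
```

With this arrangement only &varA.header and sizeof(DataAHeader) are ever handed to the write call, so the remaining members of DataTypeA can be laid out however the compiler likes.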

However #2, is it really so bad to fwrite()/ostream::write() each member individually (perhaps via a simplifying wrapper)?
Ah, so there may be padding between members z and a? Or did I misunderstand transitive member?

Yeah, I am using the compiler's packing attribute. But I want to avoid using it on the entire DataTypeA struct and just use it on the anonymous struct within DataTypeA, so that those members are laid out contiguously in memory with the members of BaseData (since that's packed as well). I could also have a packed BaseData, followed by a named packed struct DataAHeader (which derives from BaseData), and then struct DataTypeA, which contains the other types and derives from DataAHeader. But then if transitive members are an issue, maybe not.

Re However #2: Writing each member out individually is probably a no-go. The target is flash storage, and for every write the disk controller reads a fixed chunk of file space, erases a chunk and then writes out the same amount (let's say 4k). So if I wrote member a, I'd incur a 4k read and a subsequent 4k write, even though member a is only 4 bytes. On the other hand, there's probably some internal buffering going on, but that's quite unpredictable. On top of that, you usually wear out flash storage within 10,000 writes, so the fewer and more controlled the writes, the better.

It is certainly an option to serialize the members in smaller chunks to a 4k buffer myself and then write that to file, though. I guess multiple memcpys to my own buffer in memory wouldn't be that much of a hit? I'm not too sure. But if the padding-free layout is guaranteed, then I might as well use it, no?
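
A rough sketch of that memcpy approach, assuming a 4k block. BlockBuffer, appendInt, packHeader and kBlockSize are hypothetical names made up for illustration:

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy

const std::size_t kBlockSize = 4096;  // assumed flash block size

// Hypothetical staging buffer that is filled in memory and later written
// to file in one go.
struct BlockBuffer {
    unsigned char bytes[kBlockSize];
    std::size_t used;  // number of bytes filled so far
};

// Appends one int to the buffer. Returns false if it does not fit.
bool appendInt(BlockBuffer& buf, int value) {
    assert(buf.used <= kBlockSize);
    if (kBlockSize - buf.used < sizeof(value)) return false;
    std::memcpy(buf.bytes + buf.used, &value, sizeof(value));
    buf.used += sizeof(value);
    return true;
}

// Copies the six header fields back to back, so padding inside the source
// struct never reaches the buffer. Nothing is copied unless all six fit.
bool packHeader(BlockBuffer& buf, int x, int y, int z, int a, int b, int c) {
    assert(buf.used <= kBlockSize);
    if (kBlockSize - buf.used < 6 * sizeof(int)) return false;
    return appendInt(buf, x) && appendInt(buf, y) && appendInt(buf, z) &&
           appendInt(buf, a) && appendInt(buf, b) && appendInt(buf, c);
}
```

A caller would zero-initialize a BlockBuffer, call packHeader(buf, varA.x, varA.y, varA.z, varA.a, varA.b, varA.c), and then pass buf.bytes and buf.used to a single write call. Byte order and sizeof(int) still depend on the platform, exactly as they would when writing the struct directly.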
aliak.net

Ah, so there may be padding between members z and a?

In theory, that's correct.


Re However #2: Writing each member out individually is probably a no-go. The target is flash storage, and for every write the disk controller reads a fixed chunk of file space, erases a chunk and then writes out the same amount (let's say 4k).
Writing via an ostream or a FILE* uses a buffered strategy by default. You can even set the buffer size to 4k in either case if you want. To customise the buffer strategy of std::ostream, create your own derived std::streambuf. For FILE*, use setvbuf().
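
For the FILE* case, a small sketch of what the setvbuf() call could look like. The helper name openBuffered is hypothetical, and the caller supplies the buffer because some C libraries ignore the size argument when no buffer is passed:

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <cstdio>   // std::FILE, std::fopen, std::setvbuf, std::fclose

// Opens `path` for binary writing and makes stdio fully buffer its output
// in the caller's buffer (4096 bytes for a 4k block). The buffer must stay
// alive until the stream is closed. Returns a null pointer on failure.
std::FILE* openBuffered(const char* path, char* buffer, std::size_t size) {
    assert(buffer != nullptr && size > 0);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return nullptr;
    // setvbuf() has to be called before any other operation on the stream.
    if (std::setvbuf(f, buffer, _IOFBF, size) != 0) {
        std::fclose(f);
        return nullptr;
    }
    return f;
}
```

With a stream opened like this, a run of small fwrite() calls accumulates in the buffer and is handed to the OS when the buffer fills, or on fflush()/fclose().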

That will just flush the buffers out to the OS; the kernel does its own buffering as well, which can only be flushed with fsync AFAIK.
It does prevent less than X bytes of data being written to disk though, at least...
aliak.net

In which case, the points about whether or not fwrite()/ostream::write() should be used on an element-by-element basis, and the subtleties of padding are somewhat moot(?)
Well, if you do this

write(elem1) // maybe OS fsyncs maybe not
write(elem2) // maybe OS fsyncs maybe not
...
write(elemN) // maybe OS fsyncs maybe not
fsync();

// as opposed to
write(entire data block) // maybe OS fsyncs maybe not
fsync();


I don't think that's moot since IO is quite expensive.
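
As a sketch of the second variant (one write of the whole block, then a sync), assuming a POSIX system where fileno() and fsync() are available. The function name writeBlockDurably is made up for illustration:

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdio>    // std::FILE, std::fwrite, std::fflush, fileno (POSIX)
#include <unistd.h>  // fsync (POSIX)

// Writes one whole block and pushes it through both buffering layers:
// fflush() hands the stdio buffer to the kernel, and fsync() asks the
// kernel to commit its cached data for this file to the storage device.
bool writeBlockDurably(std::FILE* f, const void* block, std::size_t size) {
    assert(f != nullptr && block != nullptr);
    if (std::fwrite(block, 1, size, f) != size) return false;
    if (std::fflush(f) != 0) return false;
    return fsync(fileno(f)) == 0;
}
```

Whether the device then performs exactly one erase and write cycle for the block is still up to the filesystem and the flash controller, so this only controls what the program itself requests.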

But like I said, maybe memcpy'ing into my own buffer before writing would be a nice middle ground if the padding-free layout is not guaranteed (or I could change the structure).
aliak.net

Ok, IO is expensive, but this has nothing to do with padding any more(?). Do you have another question about buffering or are you all set?

This topic is closed to new replies.
