Performance & Struct sizes

Hey,
I'm curious about some do's and don'ts regarding looping through, and passing around, structs (usually as function parameters). I'm programming in Delphi, but I guess the general rules apply to most languages.
* Optimal struct size
On a 32-bit CPU, are there recommended sizes (32, 64, 128, ...)? And do these double on a 64-bit CPU? When dealing with vectors and structs, the size often grows quickly.
* Filling records
Let's say an optimal size would be 64 bytes, but I only have 60. Is it wise to add some dummy bytes as filler (I'm not concerned about memory usage)?
I guess this is what Delphi does by default btw, unless you declare a "packed record".
* Splitting in multiple records
In case 64 bytes would be an optimal size, but I need at least 100 bytes... would it be smart to divide the data into two records of 64 bytes each?
* I don't know exactly how caching works, but I guess these rules apply especially when looping through arrays of structs. But does it also apply when passing just a single struct to a function?
For example, I have a whole list of variables that can be adjusted by a callback function. So what I do now is pack all those variables into a single struct and pass the struct pointer to that callback. I don't know if I need to care about the size here, or whether I should split it into multiple smaller structs...
Greets,
Rick
I'm speaking from C++ experience, but I guess Delphi is similar.

There are no optimal struct sizes, but there are required alignments for data types. A 32-bit value will usually need to be aligned to a four-byte boundary. This alignment affects the structure size, because the size must be a multiple of the largest alignment inside the struct, and there may also be padding between members.

Usually, the only difference on a 64 bit CPU (if the program is compiled as 64 bit) is that pointers are larger. If a struct contains pointers, it will be a different size in a 64 bit environment.

Padding is added automatically by the compiler, and there is no need to split up structs. This is especially true if you are passing by pointer, since the data is not actually copied and the size doesn't matter.
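For illustration, here's a rough Delphi-style sketch (type names are made up; exact sizes depend on the compiler's alignment settings):

program StructSizes;
{$APPTYPE CONSOLE}
type
  // One 1-byte field followed by one 4-byte field: the compiler
  // inserts 3 padding bytes so Value sits on a 4-byte boundary.
  TMixed = record
    Flag: Byte;
    Value: Integer;
  end;

  // Same fields declared packed: no padding, but Value is misaligned.
  TMixedPacked = packed record
    Flag: Byte;
    Value: Integer;
  end;
begin
  WriteLn(SizeOf(TMixed));        // typically 8 (3 padding bytes after Flag)
  WriteLn(SizeOf(TMixedPacked));  // 5 (no padding, Value not aligned)
end.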

All right, seems I'm a bit too concerned then hehe. Other than that, to put it simply, the struct size should be divisible by 4 on a 32-bit system. So whether a struct is 28 or 32 bytes wouldn't matter in that case.

Still, I've read (a long time ago) that a programmer chose a specific (small) struct size in complex algorithms such as pathfinding, so it could be easily "cached"... I know why things are cached, but I forgot exactly how it works.

Delphi is not padding all structs in my case, since in some cases I specifically declare those structs as "packed". Usually that's in cases where the struct is also stored in a binary file, to keep things compact and easier to predict.

Thanks

If you are not using automatic padding, you should get a performance boost by keeping the size a multiple of four bytes, and putting members smaller than that at the end so they don't affect the alignment of the other members. The CPU will have to do two loads from memory to access a location that is not aligned.

Data layout is more important for efficient caching than exact struct size. You want to store data contiguously in memory, in the order that you access it. A smaller struct size means less data to cache, which may be faster, but not at the expense of proper alignment.
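To make that "small members at the end" idea concrete, here's a hedged Delphi sketch for packed records (field names invented; offsets assume 4-byte Singles):

program PackedOrdering;
{$APPTYPE CONSOLE}
type
  // Packed record with the 1-byte field in the middle: Y and Z end up
  // at offsets 5 and 9, so they are not 4-byte aligned.
  TVertexBad = packed record
    X: Single;
    Tag: Byte;
    Y: Single;
    Z: Single;
  end;

  // Same fields with the small one moved to the end: X, Y and Z stay
  // on 4-byte boundaries, and the total size (13 bytes) is unchanged.
  TVertexGood = packed record
    X: Single;
    Y: Single;
    Z: Single;
    Tag: Byte;
  end;
begin
  WriteLn(SizeOf(TVertexBad));   // 13
  WriteLn(SizeOf(TVertexGood));  // 13
end.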

* Optimal struct size
On a 32-bit CPU, are there recommended sizes (32, 64, 128, ...)? And do these double on a 64-bit CPU? When dealing with vectors and structs, the size often grows quickly.

Optimal struct size depends entirely on the nature of the memory access patterns and the entire memory system of the machine on which the algorithm is being executed. Your question here seems to imply that if you choose a certain size for the struct, you'll get good performance (or some other nice optimization by some other metric), when no such thing is guaranteed. What matters to the CPU is that you're always grabbing memory locations in an order that guarantees a high likelihood of touching something already in the cache. A small struct size can make that easier to achieve, but a small struct size can be worthless if you grab array[0] and then grab array[1024].
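As a rough sketch of what I mean (made-up record, sizes and stride; Delphi-style):

program AccessOrder;
{$APPTYPE CONSOLE}
type
  TNode = record
    Cost: Single;
    Parent: Integer;
  end;
const
  N = 1024 * 1024;
  Stride = 1024;
var
  Nodes: array of TNode;
  I, J: Integer;
  Total: Single;
begin
  SetLength(Nodes, N);
  Total := 0;

  // Sequential walk: each cache line that gets loaded is used up
  // completely before the next one is touched.
  for I := 0 to N - 1 do
    Total := Total + Nodes[I].Cost;

  // The same N accesses in a strided order: consecutive reads are
  // 1024 elements apart, so nearly every read pulls in a fresh cache
  // line. Same arithmetic, usually far slower.
  for J := 0 to Stride - 1 do
    for I := 0 to (N div Stride) - 1 do
      Total := Total + Nodes[I * Stride + J].Cost;

  WriteLn(Total);
end.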

* Filling records
Let's say an optimal size would be 64 bytes, but I only have 60. Is it wise to add some dummy bytes as filler (I'm not concerned about memory usage)?

I guess this is what Delphi does by default btw, unless you declare a "packed record".

Let your compiler handle this. In fact, I would say don't worry about this at all unless you truly have some time critical piece of code. The only places I've seen that consistently had gains from something like this were acceleration structures for graphics algorithms where you could literally be doing millions of queries. I remember stressing over bits to try to put a Kd-tree node into a cache line for a particular system, and in the end, it only netted me 10%-15% performance improvement. Don't get me wrong, that kind of performance improvement can be huge, but it's one of those last resort optimizations and all you're doing is trying to reduce the size of the constant in your asymptotic analysis.
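For what it's worth, here's a hypothetical sketch of that kind of cache-line-conscious node in Delphi terms (the bit layout is illustrative only, not the structure I actually used):

program KdNodeSize;
{$APPTYPE CONSOLE}
type
  // Hypothetical compact Kd-tree node: 8 bytes, so eight nodes fit in
  // a single 64-byte cache line.
  TKdNode = packed record
    Split: Single;           // split plane position (interior nodes)
    AxisAndChild: Cardinal;  // low 2 bits: split axis or "leaf" flag,
                             // remaining bits: index of the far child
  end;
begin
  WriteLn(SizeOf(TKdNode));  // 8 -> eight nodes per 64-byte cache line
end.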

* Splitting in multiple records
In case 64 bytes would be an optimal size, but I need at least 100 bytes... would it be smart to divide the data into two records of 64 bytes each?

Again, this depends entirely on your memory access pattern and the nature of the machine you're running on. The only correct answer here is: "Maybe". If the algorithm you're executing touches one particular set of data much more often than another, you may find gains by splitting your data (and algorithm) along a hot/cold boundary. You put the things you commonly access into the hot data struct and the things you access less frequently into the cold data. The goal of this strategy is to pack more of the hot data into cache lines so you can take advantage of the caching system. The rationale for the performance gain is that you access the hot data much more frequently, so a cache miss on that access is very costly; when you access the cold data you're likely to miss the cache, but since you don't touch it very often, that shouldn't have a huge negative impact on performance. It's using Amdahl's law to arrange memory.
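A hedged Delphi-style sketch of what such a hot/cold split might look like (field names are purely illustrative, not a real pathfinding layout):

program HotColdSplit;
{$APPTYPE CONSOLE}
type
  // Cold data: fields the inner loop rarely touches.
  PNodeCold = ^TNodeCold;
  TNodeCold = record
    DebugName: string[31];
    SpawnPosition: array[0..2] of Single;
  end;

  // Hot data: the few fields read on every visit. Keeping this record
  // small packs more nodes into each cache line.
  TNodeHot = record
    Cost: Single;
    Heuristic: Single;
    Parent: Integer;
    Cold: PNodeCold;  // follow this only when the rare fields are needed
  end;
begin
  WriteLn(SizeOf(TNodeHot));   // small and cache friendly
  WriteLn(SizeOf(TNodeCold));  // larger, but rarely touched
end.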

* I don't know exactly how caching works, but I guess these rules apply especially when looping through arrays of structs. But does it also apply when passing just a single struct to a function?

The memory hierarchy of a modern computer system is both simple and very complicated. It is simple in the high-level concept, but very complicated in the nitty-gritty details. I would highly recommend reading up on computer architecture to get a better idea of how the hardware works in general. You can find a great deal of detail on memory and caches in http://www.akkadia.org/drepper/cpumemory.pdf. But, to answer your question, caching always applies. Every single data access you do will be affected by the cache hierarchy on modern CPUs (since just about every CPU now has a cache). You have your instruction cache, your data cache, and nowadays multiple levels of data cache (sometimes shared between cores, sometimes not). The CPU works pretty much exclusively with registers and the cache, not with main memory.

Whether or not these things matter to you in your particular situation is another question. I'm of the opinion that performance always matters, but you only have so much time in your life to implement your algorithms. The real issue is which parts of your program have performance implications that actually matter to your end user. Again, Amdahl's law. Use it, use it everywhere.

It can be advantageous to declare the members of your struct in order of decreasing size. This leads to the compiler inserting the least padding between members.
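A quick sketch (made-up fields; sizes assume typical 32-bit alignment settings):

program MemberOrder;
{$APPTYPE CONSOLE}
type
  // Small and large fields interleaved: padding before every Integer.
  TInterleaved = record
    A: Byte;     // 1 byte + 3 padding
    B: Integer;  // 4 bytes
    C: Byte;     // 1 byte + 3 padding
    D: Integer;  // 4 bytes
  end;

  // Largest members first: padding only appears once, at the end.
  TSorted = record
    B: Integer;  // 4 bytes
    D: Integer;  // 4 bytes
    A: Byte;     // 1 byte
    C: Byte;     // 1 byte + 2 trailing padding
  end;
begin
  WriteLn(SizeOf(TInterleaved));  // typically 16
  WriteLn(SizeOf(TSorted));       // typically 12
end.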
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

You can find a great deal of detail on memory and caches in http://www.akkadia.org/drepper/cpumemory.pdf.

Thanks for the PDF.

++

This kind of strikes me as micro-optimization. As has been said, unless you're working with time-critical code and feel the structs you're passing around are actually causing a hangup, you're probably going to waste more time trying to tweak this than you could possibly net in performance gains.

I'm not working with critical code (except for pathfinding and such), but I find it comforting to know some general, easy-to-implement hints. Like some explained here: ordering the struct contents from big to small is very easy to do, so why not do it? Of course, tiny optimizations shouldn't obstruct code readability. If the code gets messy I shouldn't make the optimization, unless it's really needed, or at least leave it for a later stage when everything already works stably.

Well, my head is too beer-ed to process all the info right now, but thanks a lot for the hints, explanation and the PDF!

