Critical size to pass by value

5 comments, last by Adam_42 10 years, 7 months ago
How big is a class or struct when it's cheaper to pass by reference than by value?

Large structs are cheaper to pass by reference, but structs containing just a primitive type, or no objects at all, are cheaper to pass by value. But what about types in between? What size is the cut-off point? Is the number and mix of types a consideration?

This is essentially the same as asking "when should RVO kick in?"

There is some advantage to passing by value in alias analysis, and for my purposes I want to separate this effect. I'm curious what effect alias analysis has, but I'm also interested when the expense of the copy itself matters.

I'm pretty sure that for non trivially copyable types, passing by reference will always be cheaper, except for aliasing implications, but for POD types that's not so clear.

I'm pretty sure that for non trivially copyable types, passing by reference will always be cheaper, except for aliasing implications, but for POD types that's not so clear.

Yeah, for an example of a non-trivially copyable type, take a std::vector -- passing by value, it will allocate memory, memcpy stuff, call constructors, etc... Potentially a lot of instructions and memory operations!
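To make the std::vector case concrete, here is a minimal sketch. The two helper functions are made up for illustration; they only report whether the parameter still uses the caller's heap buffer.

```cpp
#include <vector>

// Hypothetical helpers, not from any library.

// By value: the parameter is a fresh vector that had to allocate its own
// buffer and copy every element before the function body even starts.
bool shares_buffer_by_value(std::vector<int> v, const int* original) {
    return v.data() == original;
}

// By const reference: no copy is made, the parameter is the caller's vector.
bool shares_buffer_by_ref(const std::vector<int>& v, const int* original) {
    return v.data() == original;
}
```

Called with a non-empty vector `v` and `v.data()`, the by-value version reports a different buffer and the by-reference version reports the same one.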

For POD types, there's no definitive answer. It depends on the CPU architecture, the calling convention, and the number of other arguments.

On some CPUs there's code that will bring it to its knees (mention "load hit store" to a PowerPC systems programmer and watch the colour drain from his face), while other CPUs will breeze through the same code without a hitch.

Ideally, you'd like all your arguments to be present in registers so there are no memory operations required. If you've got 100x int32 arguments, then that's just as bad as one struct argument that has 100 int32 members.

So, saying that a struct that takes up 20 bytes shouldn't be passed by value would be a bit silly if you've also got functions that take 5 int32s by value (also 20 bytes)!
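A small sketch of that comparison, with made-up names (Five, sum_struct, sum_args). Both functions receive the same 20 bytes of argument data. Whether either one ends up in registers or on the stack depends on the target's calling convention, which is why looking at the generated assembly is the only way to know.

```cpp
#include <cstdint>

// Hypothetical 20-byte POD struct: five 32-bit integers.
struct Five {
    std::int32_t a, b, c, d, e;
};
static_assert(sizeof(Five) == 5 * sizeof(std::int32_t),
              "assumes no padding between the members");

// One struct argument carrying 20 bytes by value.
std::int32_t sum_struct(Five f) {
    return f.a + f.b + f.c + f.d + f.e;
}

// Five scalar arguments carrying the same 20 bytes by value.
std::int32_t sum_args(std::int32_t a, std::int32_t b, std::int32_t c,
                      std::int32_t d, std::int32_t e) {
    return a + b + c + d + e;
}
```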

A good way to investigate this yourself would be to do some experiments and look at the actual assembly that's produced from different types of code, like this guy:

http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/

http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/

http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/

[edit]

Also, pass by const-reference is a good default choice for "large" POD objects, because its actual implementation is up to the compiler. The compiler can choose to pass by pointer or pass by value internally, based on its own knowledge of the code and target CPU.

As a general rule, objects should be passed by const reference if you do not modify them, such as: foo( const Bar &b ) {...} or as a non-const reference if you do intend to modify them, such as: foo( Bar &b) {...}
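Spelled out as a tiny sketch (the members of Bar and both function names are invented for the example):

```cpp
// Hypothetical type standing in for Bar.
struct Bar {
    int hits = 0;
    int limit = 10;
};

// Read-only access: take a const reference, so the caller's object is
// neither copied nor modifiable through this parameter.
bool is_full(const Bar& b) {
    return b.hits >= b.limit;
}

// The function is meant to modify the caller's object: take a
// non-const reference so the intent is visible in the signature.
void record_hit(Bar& b) {
    ++b.hits;
}
```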


Large structs are cheaper to pass by reference, but structs containing just a primitive type, or no objects at all, are cheaper to pass by value
I can easily disagree with the statement as it is stated as an absolute.

I can construct a data structure containing thousands of integers. Pass by value creates a copy, and the time it takes to copy thousands of integers is much greater than the time required to compute a memory address.

The exact size that establishes a cutoff is going to be a big "it depends" on context. It also isn't just the size of the object, but the steps involved. Exactly how the copy is made internally becomes important. Whatever that number is, it is going to be fairly small, perhaps in the 4-64 byte range. The size of a CPU cache line vs the size of the structure will be significant. Making copies of objects will also depend on cache usage and availability. What is involved in creating the copy? Is the function eligible for inlining, and does the compiler inline it? Does the compiler make a member-by-member copy, or a bulk memcpy, or do something else entirely?
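If you did want to encode a cutoff, one way is a small trait. This is only a sketch: the names prefer_by_value and param_t are made up, and the threshold of two pointers' worth of bytes is an arbitrary assumption, not a measured number.

```cpp
#include <type_traits>

// Hypothetical heuristic: pass by value only if the type is trivially
// copyable and no bigger than two pointers. The threshold is a guess.
template <typename T>
struct prefer_by_value
    : std::integral_constant<bool,
                             std::is_trivially_copyable<T>::value &&
                                 sizeof(T) <= 2 * sizeof(void*)> {};

// Picks T or const T& as the parameter type according to the heuristic.
template <typename T>
using param_t =
    typename std::conditional<prefer_by_value<T>::value, T, const T&>::type;

struct Small { float x, y; };        // 8 bytes, trivially copyable
struct Large { int values[1000]; };  // trivially copyable but big

static_assert(std::is_same<param_t<Small>, Small>::value,
              "small PODs go by value");
static_assert(std::is_same<param_t<Large>, const Large&>::value,
              "large PODs go by const reference");
```

Note that a parameter declared as param_t<T> is a non-deduced context, so this fits class templates or explicitly specified template arguments better than deduced function templates.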

Ultimately this is a micro-optimization.

Even if I saw hard numbers showing that a specific structure was faster to pass by value, I still probably wouldn't do it because of the mental overhead involved. It would need to be some absolutely critical bit of code that only optimization engineers could touch, and in practice those bits of code are vanishingly rare.

How big is a class or struct when it's cheaper to pass by reference than by value?

Large structs are cheaper to pass by reference, but structs containing just a primitive type, or no objects at all, are cheaper to pass by value. But what about types in between? What size is the cut-off point? Is the number and mix of types a consideration?


Well, this is a touchy one and there are some clarifications to be made. First off, depending on the CPU and build settings (and various attributes/declspecs), this can vary quite a bit. Second, compilers these days are very good at figuring out cases like this: where you pass a const ref to a 4-byte value, the compiler can/will remove the reference and just pass by value in a register.

In general, there are lots of different ways this balances and you really just have to look at the generated assembly if you want to figure it out. My rule of thumb is pretty simple: if the item is larger than a single register on the CPU (32 bits on x86, for instance), I just go ahead and use the reference. Compilers "generally" come around the back and change the access method anyway if it is a waste.

This is essentially the same as asking "when should RVO kick in?"


There is actually no real correspondence between the two items. The generated code is considerably different between the two cases. RVO can kick in on very large pieces of data since all it really means is that the return value is already on the stack (or a ref/pointer to the return item) and instead of filling a temp to be copied when the function exits, the data is written directly to the already existing memory area.
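A quick way to see the return-value case in action is to count copies. This is a sketch with a made-up type (Tracked). Eliding the copy when returning a temporary like this is guaranteed from C++17 on and merely permitted (but done by the mainstream compilers by default) before that.

```cpp
// Hypothetical 1 KB type that counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    int payload[256];

    Tracked() : payload{} {}
    Tracked(const Tracked& other) {
        for (int i = 0; i < 256; ++i) payload[i] = other.payload[i];
        ++copies;
    }
};
int Tracked::copies = 0;

// The temporary is constructed directly in the caller's storage, so no
// copy constructor call is needed despite the object's size.
Tracked make_tracked() {
    return Tracked();
}
```

Initialising a variable with `Tracked t = make_tracked();` leaves the counter untouched, while a plain `Tracked u = t;` increments it.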

There is some advantage to passing by value in alias analysis, and for my purposes I want to separate this effect. I'm curious what effect alias analysis has, but I'm also interested when the expense of the copy itself matters.

I'm pretty sure that for non trivially copyable types, passing by reference will always be cheaper, except for aliasing implications, but for POD types that's not so clear.


Aliasing should really not impact this problem too much unless you are doing a lot of oddball casts. I can't state this as fact, but the pass by pointer/reference only really changes things if you have the likelihood of passing the same pointer for various reasons.

I'm pretty sure that for non trivially copyable types, passing by reference will always be cheaper, except for aliasing implications, but for POD types that's not so clear.

Ideally, you'd like all your arguments to be present in registers so there are no memory operations required. If you've got 100x int32 arguments, then that's just as bad as one struct argument that has 100 int32 members.
So, saying that a struct that takes up 20 bytes shouldn't be passed by value would be a bit silly if you've also got functions that take 5 int32s by value (also 20 bytes)!

I guess a pointer can always fit in a register, so if the struct cannot be put into registers, then it's probably slower. But how will a struct be packed into registers? For instance, if I pass a 4 char struct, will it use one 32 bit register or 4?

As a general rule, objects should be passed by const reference if you do not modify them, such as: foo( const Bar &b ) {...} or as a non-const reference if you do intend to modify them, such as: foo( Bar &b) {...}

I agree, with the added caveat that in C++11 you should pass by value if you need to make a copy anyway, or for interfaces, if the operation logically should need a copy.
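The usual shape of that C++11 idiom is a "sink" parameter: take it by value, then move it into place. A minimal sketch, with a made-up Widget class:

```cpp
#include <string>
#include <utility>

// Hypothetical class that needs its own copy of the name anyway.
class Widget {
public:
    // Taken by value: an lvalue argument is copied once into `name`,
    // an rvalue argument is moved. Either way the parameter is then
    // moved (not copied again) into the member.
    explicit Widget(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

With a const std::string& parameter instead, an rvalue argument would still be copied into the member. The by-value form lets the caller hand over ownership with std::move.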


Large structs are cheaper to pass by reference, but structs containing just a primitive type, or no objects at all, are cheaper to pass by value

I can easily disagree with the statement as it is stated as an absolute.

I can construct a data structure containing thousands of integers. Pass by value creates a copy, and the time it takes to copy thousands of integers is much greater than the time required to compute a memory address.

That's not in contradiction to what I said.


This is essentially the same as asking "when should RVO kick in?"


There is actually no real correspondence between the two items. The generated code is considerably different between the two cases. RVO can kick in on very large pieces of data since all it really means is that the return value is already on the stack (or a ref/pointer to the return item) and instead of filling a temp to be copied when the function exits, the data is written directly to the already existing memory area.

A compiler must ensure that it's possible to determine if RVO is usable only from a function prototype, so that linking to that function works. So I'd imagine that that calculation is mostly based on the return type -- classes which are not cheap to copy are returned through a hidden pointer parameter. That's essentially the same problem as deciding when classes are cheap to copy for passing in by value.


There is some advantage to passing by value in alias analysis, and for my purposes I want to separate this effect. I'm curious what effect alias analysis has, but I'm also interested when the expense of the copy itself matters.

I'm pretty sure that for non trivially copyable types, passing by reference will always be cheaper, except for aliasing implications, but for POD types that's not so clear.


Aliasing should really not impact this problem too much unless you are doing a lot of oddball casts. I can't state this as fact, but the pass by pointer/reference only really changes things if you have the likelihood of passing the same pointer for various reasons.

Chandler Carruth said otherwise in his Boostcon keynote. In short, things passed by value can't alias, things passed by reference cannot always be proven not to. Sometimes passing by rvalue reference may help.
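A small example of why the compiler has to be careful (function names invented for illustration). With the reference version, `out` and `x` may be the same object, so `x` must be re-read after the first store. With the value version, `x` is a private copy and can stay in a register.

```cpp
// `x` may alias `out`, so the second addition has to reload `x`.
void add_twice_ref(int& out, const int& x) {
    out += x;
    out += x;
}

// `x` is a local copy and cannot alias `out`.
void add_twice_val(int& out, int x) {
    out += x;
    out += x;
}
```

Calling each with the same variable for both arguments shows the difference: starting from 1, the reference version ends at 4 (1+1, then 2+2), the value version at 3.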


I guess a pointer can always fit in a register, so if the struct cannot be put into registers, then it's probably slower. But how will a struct be packed into registers? For instance, if I pass a 4 char struct, will it use one 32 bit register or 4?

If you link against a library compiled with the same compiler then maybe that compiler family could provide such an optimization. There could be a large number of permutations a compiler writer would have to consider and so I'm guessing, more often than not, that standard calling conventions are used.

For example, here are some of the calling conventions for C on x86: http://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions

Even if you use _fastcall it probably won't pack unless you play some shenanigans.

On ARM compilers, r0-r3 essentially do _fastcall. 64-bit types are passed in even-aligned register pairs (for example, passing a 64-bit value using r1,r2 breaks AAPCS conventions). For ARM calls, limiting parameters to something that fits in r0-r3 means the arguments are passed purely in registers, with no memory access. (r0 receives 32-bit return types, and r0,r1 receive 64-bit types.) Parameters that don't fit into r0-r3 are placed on the stack.


proto_A( int32, int64, int32 )
proto_B( int32, int32, int64 )
proto_C( int64, int32, int32 )

Of the above, proto_B and proto_C are ideal, while proto_A will put the first int32 into r0, the int64 into r2,r3, and the final int32 on the stack.
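The allocation described above can be modelled in a few lines. This is a deliberately simplified sketch (the function name assign_args is made up): it only handles 32-bit and 64-bit integer arguments and the four core registers, and ignores floating point, structs and everything else the real ARM procedure call standard covers.

```cpp
#include <string>
#include <vector>

// Simplified model: each argument is 4 or 8 bytes, registers are r0-r3.
// Returns where each argument would be placed.
std::vector<std::string> assign_args(const std::vector<int>& sizes_in_bytes) {
    std::vector<std::string> where;
    int next_reg = 0;  // next free core register; 4 means none left
    for (int size : sizes_in_bytes) {
        if (size == 8) {
            if (next_reg % 2 != 0) ++next_reg;  // 64-bit starts on an even register
            if (next_reg + 2 <= 4) {
                where.push_back("r" + std::to_string(next_reg) + ",r" +
                                std::to_string(next_reg + 1));
                next_reg += 2;
                continue;
            }
        } else if (next_reg < 4) {
            where.push_back("r" + std::to_string(next_reg));
            ++next_reg;
            continue;
        }
        next_reg = 4;  // after the first spill, later arguments also go to the stack
        where.push_back("stack");
    }
    return where;
}
```

Feeding it {4, 8, 4} for proto_A gives r0, then r2,r3 (r1 is skipped for alignment), then the stack. {4, 4, 8} and {8, 4, 4} fit entirely in r0-r3.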

I'd think that most of the time you'll have one of these cases:

1. The copy constructor for the parameter is expensive. Avoid passing by value unless you want the copy to happen.

2. The function is likely to be inlined. That means standard parameter passing won't happen so it doesn't make much difference how you pass the parameters.

3. The function is unlikely to be inlined. This probably means the parameter passing isn't going to be a significant proportion of execution time compared to the function body. The programmer's best guess will probably be good enough.

4. The function is small, but can't be inlined. If this is in performance critical code then the parameter passing method might be significant. Choose carefully and profile.

Note that you have to be very careful when profiling. If you're passing data to a function from memory that isn't in the cache then passing it by const reference will cause the cache miss to happen inside the function, but passing by value will make it happen outside the function in the calling code. Make sure you add up the costs for both the callee and the caller to get a useful answer.

This also means that if the function has a good chance of not using a parameter then passing it by reference can be more efficient regardless of what type it is, because it will miss the cache less.
