boost::shared_ptr Performance

Started by
6 comments, last by loufoque 15 years, 5 months ago
Hi, all. I'm wondering about the performance of boost::shared_ptr. I've already run into interesting issues that pop up with shared_ptr, but at the moment I'm concerned with its runtime performance. I know there can be significant overhead for the usage of shared_ptr, but I want to know precisely what the overhead is and (more importantly) when it actually occurs.

I have a suspicion that the majority of the overhead comes into play when a shared_ptr is copied. Is this correct? Copying would require the shared_ptr to alter its reference counting mechanics. What other operations can I expect to be slow going? (I sincerely hope that dereferencing a shared_ptr does not incur a great deal of overhead...)

I ask because I want to make extensive use of shared_ptr, but I want to better understand what I can do to avoid slow code and where I'm probably just better off using built-in pointers and managing them myself.

Thanks!
Quote:
I ask because I want to make extensive use of shared_ptr, but I want to better understand what I can do to avoid slow code and where I'm probably just better off using built-in pointers and managing them myself.

Thanks!

Unless you have an extremely, extremely (and I do mean extremely) specific usage scenario you can optimize to, you will likely not write code that is any faster than what shared_ptr does for managing the pointers yourself. As such the performance concerns should effectively wash. That said...

Quote:
Copying would require the shared_ptr to alter its reference counting mechanics.

Yes, but this is effectively a dereference of a common memory location (so perhaps a cache miss) and an integer increment. Not worth worrying about.

Quote:
(I sincerely hope that dereferencing a shared_ptr does not incur a great deal of overhead...)

It should not.

Quote:
Hi, all. I'm wondering about the performance of boost::shared_ptr. I've already run into interesting issues that pop up with shared_ptr, but at the moment I'm concerned with its runtime performance.

In short, to reiterate what I said above, you're likely only to beat the theoretical performance of shared_ptr in extremely specific scenarios; managing pointers in the general case will just cause you to write slower code, buggier code, or both. Use shared_ptr without regard to the performance implications until it becomes apparent that there is some major slowdown caused by the object.

The thing you want to be wary of with shared_ptr is cycles of self-referential pointers, because they will not be collected automatically until you break a link in the cycle. This is a semantics issue, not (directly) a performance one.
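
To make the cycle problem concrete, here is a minimal sketch. It uses std::shared_ptr and std::weak_ptr as stand-ins for the Boost classes (an assumption; they are the standardized descendants of boost::shared_ptr and boost::weak_ptr), and the node types and function names are hypothetical, made up for illustration only.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node whose link OWNS its neighbour.
struct StrongNode {
    static int alive;                      // number of live StrongNode objects
    std::shared_ptr<StrongNode> next;
    StrongNode() { ++alive; }
    ~StrongNode() { --alive; }
};
int StrongNode::alive = 0;

// Hypothetical node whose link only OBSERVES its neighbour.
struct WeakNode {
    static int alive;                      // number of live WeakNode objects
    std::weak_ptr<WeakNode> next;
    WeakNode() { ++alive; }
    ~WeakNode() { --alive; }
};
int WeakNode::alive = 0;

// Returns how many StrongNode objects outlive their last external handle.
int strong_cycle_survivors() {
    std::weak_ptr<StrongNode> observer;    // kept only so we can clean up below
    {
        std::shared_ptr<StrongNode> a = std::make_shared<StrongNode>();
        std::shared_ptr<StrongNode> b = std::make_shared<StrongNode>();
        a->next = b;
        b->next = a;                       // cycle: a -> b -> a
        observer = a;
    }                                      // both external handles are gone here
    const int survivors = StrongNode::alive;
    if (std::shared_ptr<StrongNode> a = observer.lock()) {
        a->next.reset();                   // break one link so both nodes are freed
    }
    return survivors;
}

// Same shape, but the links do not own, so nothing survives.
int weak_cycle_survivors() {
    {
        std::shared_ptr<WeakNode> a = std::make_shared<WeakNode>();
        std::shared_ptr<WeakNode> b = std::make_shared<WeakNode>();
        a->next = b;
        b->next = a;
    }
    return WeakNode::alive;
}
```

In the strong version both nodes are still alive after the handles go out of scope, and they are only freed once one link is reset by hand. Making one direction of the relationship a weak_ptr is the usual way to avoid that.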
Quote:Original post by GenuineXP
(I sincerely hope that dereferencing a shared_ptr does not incur a great deal of overhead...)

It costs exactly as many instructions as dereferencing a raw pointer. A small extra cost may occur in practice, though, because a shared_ptr has a slightly larger memory footprint than a raw pointer, which increases the likelihood of cache misses. That cost is not part of the dereference itself but of the step before it (reading the stored address before dereferencing it).
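
A rough way to check both points yourself, sketched with std::shared_ptr as a stand-in for boost::shared_ptr (an assumption). The struct and function names are hypothetical. The exact size of a shared_ptr is implementation-specific, so the sketch only reports it rather than promising a number.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Sizes of the two pointer kinds on the current platform.
struct Footprint {
    std::size_t raw;     // sizeof(int*)
    std::size_t shared;  // sizeof(std::shared_ptr<int>)
};

Footprint pointer_footprint() {
    Footprint f = { sizeof(int*), sizeof(std::shared_ptr<int>) };
    return f;
}

// Reads the same int through a shared_ptr and through the raw pointer it stores.
int read_through_both() {
    std::shared_ptr<int> sp = std::make_shared<int>(42);
    int* raw = sp.get();

    // operator* simply hands back a reference to the pointee, so both
    // routes end up at the same address.
    assert(&*sp == raw);

    return *sp + *raw;   // 42 + 42
}
```

On common implementations the shared_ptr holds the object pointer plus a pointer to its bookkeeping block, which is where the larger footprint comes from.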
Quote:Original post by jpetrie
Unless you have an extremely, extremely (and I do mean extremely) specific usage scenario you can optimize to, you will likely not write code that is any faster than what shared_ptr does for managing the pointers yourself. As such the performance concerns should effectively wash.
That's quite a strong statement. It's quite easy to forget the costs involved and put millions of these in some container, even if a simpler scheme or cheaper reference counting method would do.
I'd argue that boost::shared_ptr is overly generic, designed for ease of use rather than performance. Shared pointers provide support for weak references and thread-safety, and dynamically allocate intermediate objects in order to be non-intrusive. In contrast, manual reference counting or a straightforward use of intrusive_ptr would be sufficient for many, if not most, cases where shared pointers are currently being used, and both are about as lightweight as you can get.
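
For anyone unsure what "intrusive" means here, this is a minimal hand-rolled sketch of the idea. It is not boost::intrusive_ptr and not its API; RefCounted, IntrusivePtr and Texture are hypothetical names. The counter is a plain (non-atomic) integer stored inside the object, so there is no separate allocation for it, no weak reference support, and no thread-safety.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class: the reference count lives inside the object itself.
class RefCounted {
public:
    RefCounted() : refs_(0) {}
    RefCounted(const RefCounted&) : refs_(0) {}            // a copy starts unowned
    RefCounted& operator=(const RefCounted&) { return *this; }

    void add_ref() { ++refs_; }                            // plain increment, not atomic
    void release() { if (--refs_ == 0) delete this; }
    long ref_count() const { return refs_; }

protected:
    virtual ~RefCounted() {}                               // destroyed only via release()

private:
    long refs_;
};

// Hypothetical smart pointer: one raw pointer wide, counting through the pointee.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() : p_(nullptr) {}
    explicit IntrusivePtr(T* p) : p_(p) { if (p_) p_->add_ref(); }
    IntrusivePtr(const IntrusivePtr& other) : p_(other.p_) { if (p_) p_->add_ref(); }
    IntrusivePtr& operator=(IntrusivePtr other) {          // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~IntrusivePtr() { if (p_) p_->release(); }

    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }
    T* get() const { return p_; }

private:
    T* p_;
};

// Hypothetical payload type that opts in by deriving from RefCounted.
struct Texture : RefCounted {
    int id;
    explicit Texture(int i) : id(i) {}
};

long intrusive_demo() {
    IntrusivePtr<Texture> a(new Texture(7));
    IntrusivePtr<Texture> b = a;                           // copy bumps the embedded count
    assert(sizeof(a) == sizeof(Texture*));                 // no extra footprint
    assert(b->id == 7);
    return a->ref_count();
}
```

The price is that every managed type has to carry the counter, which is exactly the intrusiveness shared_ptr is designed to avoid.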
Quote:
That's quite a strong statement. I'd argue that boost::shared_ptr is overly generic, designed for ease of use rather than performance.
Shared pointers provide support for weak references and thread-safety, and dynamically allocate intermediate objects in order to be non-intrusive. In contrast, manual reference counting or a straightforward use of intrusive_ptr would be sufficient for many, if not most, cases where shared pointers are currently being used, and both are about as lightweight as you can get.

I wouldn't disagree with that, necessarily (although I would take the OP's comment concerning manually managing pointers to mean raw pointers or something entirely hand-rolled rather than intrusive_ptr). However, I don't believe that invalidates the assertion that any such performance implications at this level are way too small-scale to worry about before a profiler reveals them to be sucking up a statistically relevant portion of the application's run time.

Until such a time arrives, I would argue that it's far more practical and productive to concentrate on higher-level algorithmic optimizations or indeed to eschew "optimization" altogether and focus on writing code that gets the task at hand done with a minimum of supporting infrastructure muckery -- that is, essentially, to use shared_ptr because it will let you write your application and worry about performance later, if it becomes an issue.

As you noted, shared_ptr performs dynamic intermediate allocations (the nature of which is visible in the implementation if you care to dig around the headers) and handles thread safety and some other internal bookkeeping. These do incur minor overhead, but they incur it basically all the time, and since it's a small constant-time overhead it's not something I would be worried about until later.

(And additionally, I've seen far too many over-eager performance-minded developers relatively inexperienced in the domain of pointer-wrappers try to hand-roll their own stuff and absolutely destroy the sanity and, ironically, the performance of their code; that is, however, not really the issue here.)
Quote:(And additionally, I've seen far too many over-eager performance-minded developers relatively inexperienced in the domain of pointer-wrappers try to hand-roll their own stuff and absolutely destroy the sanity and, ironically, the performance of their code; that is, however, not really the issue here.)


While agreed on this, the design characteristics of shared_ptr do deserve to be mentioned here.

Quote:I have a suspicion


Skipping the obvious (boost is open source)...

shared_ptr has the following characteristics:
- the reference count is heap-allocated
- reference counting is thread-safe via interlocked operations
- passing the pointer by value modifies the reference count
- dereferencing is a trivial operation

Do you have a need to pass shared_ptr between threads by value?
Do you require unobtrusive reference counting?

Quote:but I want to better understand what I can do to avoid slow code


The cost of dereferencing a basic pointer wrapper is the same as direct pointer access.
The reference count adds an extra counter (typically sizeof(void*)).
Passing smart pointers by reference has no additional cost.
Almost without exception, the entire overhead will come from copying the smart pointer. In that case, the mandatory penalty is one increment or decrement. Any additional penalty depends on the implementation. This may imply an extra cache miss.
The actual performance impact, however, is best determined by benchmark. It will be non-zero, but if smart pointers are used at adequately coarse granularity, they will not show up in a profiler.
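
The by-value versus by-reference difference is easy to observe through use_count(). A small sketch, again using std::shared_ptr as a stand-in for boost::shared_ptr (an assumption); the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Reference counts observed at three points of the experiment.
struct Counts {
    long by_value;      // seen inside a function taking the pointer by value
    long by_reference;  // seen inside a function taking it by const reference
    long afterwards;    // seen by the owner once both calls have returned
};

// Taking the parameter by value copies the shared_ptr: one increment on
// entry, one decrement when the parameter is destroyed.
long count_seen_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// Taking it by const reference touches no counter at all.
long count_seen_by_reference(const std::shared_ptr<int>& p) { return p.use_count(); }

Counts observe_counts() {
    std::shared_ptr<int> owner = std::make_shared<int>(1);
    Counts c;
    c.by_value = count_seen_by_value(owner);          // owner + the copy
    c.by_reference = count_seen_by_reference(owner);  // owner only
    c.afterwards = owner.use_count();                 // the copy is long gone
    return c;
}
```

So for functions that only use the pointee and do not keep a handle to it, passing by const reference (or passing the pointee itself by reference) avoids the counter traffic entirely.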

If you can afford an intrusive method, then a non-thread-safe intrusive_ptr is a viable option with minimal overhead. One thing I would check (I don't remember right now) is whether it uses atomic operations or locks. Last I remember, it doesn't.

For performance-sensitive operations, the mere cost of allocation will be prohibitive, so any kind of such management will not be an option in the first place.
Thanks for the input, everyone. It's very helpful.

I figured there wouldn't be too much overhead, but I've seen others ask similar questions and wasn't sure what to think. (Also, I know I could've peeked at the source, but I figured asking here would be much easier and more reliable than trying to interpret the code myself and coming to my own inaccurate conclusions.)

I'll continue using boost::shared_ptr until I actually find a bottleneck (or some other reason not to use it).

Thanks again. More input is welcome.
An intrusive reference counter is likely to be more efficient.
A design where shared_ptr takes the objects themselves or factories (like a container does), rather than pointers to already allocated objects, would also allow the reference counter to be stored alongside the object non-intrusively.
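
The standard library's factory functions std::make_shared and std::allocate_shared appear to follow this design: they construct the object themselves, which lets the implementation put the bookkeeping in the same block as the object. The sketch below uses std::allocate_shared with a hypothetical counting allocator (CountingAllocator, AllocStats and Payload are made-up names) to observe how many allocations happen. The single-allocation behaviour is what mainstream implementations do, as far as I know, and should be treated as an assumption rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Tally of what the allocator was asked to do.
struct AllocStats {
    int calls = 0;           // number of allocate() calls
    std::size_t bytes = 0;   // total bytes requested
};

// Hypothetical minimal allocator that records every request in AllocStats.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    AllocStats* stats;

    explicit CountingAllocator(AllocStats* s) : stats(s) {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>& other) : stats(other.stats) {}

    T* allocate(std::size_t n) {
        ++stats->calls;
        stats->bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.stats == b.stats;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}

// Hypothetical payload type.
struct Payload {
    double x, y;
    Payload(double x_, double y_) : x(x_), y(y_) {}
};

// Creates and destroys one shared Payload, returning what the allocator saw.
AllocStats stats_for_allocate_shared() {
    AllocStats stats;
    {
        std::shared_ptr<Payload> p =
            std::allocate_shared<Payload>(CountingAllocator<Payload>(&stats), 1.0, 2.0);
        assert(p->x == 1.0 && p->y == 2.0);
    }
    return stats;
}
```

If the allocator sees one request that is larger than sizeof(Payload), the counter and the object were placed in a single block, which is the non-intrusive embedding described above.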

[Edited by - loufoque on November 12, 2008 2:33:52 AM]

