Practicality of a C++ Garbage collector

15 comments, last by Melkon 8 years, 3 months ago

I love the idea of memory ownership, but I dislike how (in C/C++ at least) it breaks abstraction in the sense that you have to constantly be mindful of where things came from, and where they're going (not to mention how it just greatly complicates the language). While it doesn't feel out of place at a low level, if you never leave C++ it feels like you never actually escape it, even at the highest levels. There's Rust, which ingrains these things deeply enough that the compiler can catch errors for you, but I'm ruling it out until it's more mature.

The other side of the spectrum is Garbage collection, which I guess you could call "memory communism". There are third-party garbage collectors available for C++ (such as Hans-Boehm), but I was wondering if anyone here can provide anecdotes on using them in the real world. Should it be seen as a last resort? Can I depend on it entirely, even to the extent of making my collection classes use garbage-collected buffers?

Thanks


I love the idea of memory ownership, but I dislike how (in C/C++ at least) it breaks abstraction in the sense that you have to constantly be mindful of where things came from, and where they're going (not to mention how it just greatly complicates the language).


First off, I disagree with this statement. If you own a piece of memory, you use unique_ptr. If ownership must be shared, you use shared_ptr. Unless you're transferring or sharing ownership, you should pass a raw pointer or a reference.

Most of your interfaces then devolve into taking pointers or references because they only need to access data, not transfer or share ownership.
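
To make that concrete, here is a minimal sketch of what such interfaces can look like. The Texture and TextureCache types and all function names are made up for illustration; they are not from any real engine.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, used only for illustration.
struct Texture {
    std::string name;
    int width;
    int height;
};

// Only reads the texture: a const reference, no ownership involved.
int pixel_count(const Texture& t) { return t.width * t.height; }

// Modifies the texture but does not keep it: a non-owning reference.
void set_name(Texture& t, std::string new_name) { t.name = std::move(new_name); }

// Takes over ownership: the unique_ptr is passed by value and moved in.
struct TextureCache {
    std::unique_ptr<Texture> slot;
    void adopt(std::unique_ptr<Texture> t) { slot = std::move(t); }
};

int ownership_demo() {
    auto tex = std::make_unique<Texture>(Texture{"grass", 4, 8});
    set_name(*tex, "moss");            // access only, caller stays the owner
    int pixels = pixel_count(*tex);    // access only

    TextureCache cache;
    cache.adopt(std::move(tex));       // explicit ownership transfer
    assert(tex == nullptr);            // the old owner no longer holds it
    assert(cache.slot->name == "moss");
    return pixels;
}
```

Only adopt() mentions a smart pointer in its signature, which is the point: the signature tells the reader whether ownership changes hands.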

While it doesn't feel out of place at a low level, if you never leave C++ it feels like you never actually escape it, even at the highest levels. There's Rust, which ingrains these things deeply enough that the compiler can catch errors for you, but I'm ruling it out until it's more mature.

The other side of the spectrum is Garbage collection, which I guess you could call "memory communism". There are third-party garbage collectors available for C++ (such as Hans-Boehm), but I was wondering if anyone here can provide anecdotes on using them in the real world. Should it be seen as a last resort? Can I depend on it entirely, even to the extent of making my collection classes use garbage-collected buffers?

Thanks


Why do you want to use garbage collection? Because you don't want to handle ownership? That seems like a relatively weak reason, as GCs tend to be expensive at the worst times. (Even the best GCs in the world can destroy performance if given poorly written code and bad memory patterns to work with.)

GCs also mean you no longer have control over when things are destroyed, which can be very detrimental to performance and can overly complicate code. For example, if you have a 3D model, you now have to be able to tell it "you are dead, so don't update or render, even though you exist", because the GC only looks at memory and may not think it needs to destroy your model for a long time.

Finally - GCs only manage one thing: Memory. There are a lot of other resources that you will still have to manage by hand (files, sockets, OS handles, GPU resources) which the GC will not know anything about. Most languages that use GC have to then employ some sort of clunky work-around to add deterministic destruction for these resources (see: .NET's IDisposable system). So even if you use a GC you're still going to have to do a lot of resource management yourself - something that good C++ code with RAII will handle automatically.
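
As a rough sketch of the RAII point for a non-memory resource: acquire_handle and release_handle below are stand-ins for a real OS or GPU API (they are hypothetical and just count open handles), and the Handle class is one assumed way to wrap them.

```cpp
#include <cassert>
#include <utility>

// Stand-in for an OS/GPU handle API (hypothetical, for illustration only).
static int g_open_handles = 0;
constexpr int kInvalidHandle = -1;
int  acquire_handle()    { ++g_open_handles; return 42; }
void release_handle(int) { --g_open_handles; }

// RAII wrapper: acquires in the constructor, releases in the destructor.
class Handle {
public:
    Handle() : id_(acquire_handle()) {}
    ~Handle() { if (id_ != kInvalidHandle) release_handle(id_); }

    // One owner at a time: copying is disabled, moving transfers the handle.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    Handle(Handle&& other) noexcept
        : id_(std::exchange(other.id_, kInvalidHandle)) {}
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            if (id_ != kInvalidHandle) release_handle(id_);
            id_ = std::exchange(other.id_, kInvalidHandle);
        }
        return *this;
    }

    int id() const { return id_; }

private:
    int id_;
};

int open_handles_after_scope() {
    {
        Handle a;
        Handle b = std::move(a);       // ownership moves, still one live handle
        assert(g_open_handles == 1);
    }                                  // b is released here, deterministically
    return g_open_handles;
}
```

The release happens at the closing brace, with no explicit close call and no collector involved.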

The projects I work on have an opt-in GC which is used for certain types only because destruction of these types is expensive, and the GC is set up to only run a certain amount of time per frame, thus mitigating destruction cost across multiple frames. And we could probably eliminate our GC entirely given the time and resources to do a big ownership pass of our entire engine. (Yay legacy code)
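
A very reduced sketch of the "spread destruction over several frames" idea follows. It is not the engine's actual system: it does no reachability analysis, it uses an object-count budget instead of a time budget to keep the example deterministic, and the names Expensive and DeferredDestroyer are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical type whose destructor is assumed to be costly.
struct Expensive {
    static int alive;
    Expensive()  { ++alive; }
    ~Expensive() { --alive; }
};
int Expensive::alive = 0;

// Holds retired objects and destroys a bounded number of them per call.
class DeferredDestroyer {
public:
    void retire(std::unique_ptr<Expensive> obj) {
        pending_.push_back(std::move(obj));
    }

    // Destroys at most `budget` objects; returns how many were destroyed.
    std::size_t collect(std::size_t budget) {
        std::size_t destroyed = 0;
        while (destroyed < budget && !pending_.empty()) {
            pending_.pop_front();      // the unique_ptr destructor runs here
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::unique_ptr<Expensive>> pending_;
};

// Simulates a frame loop: five retired objects, two destroyed per frame.
int frames_to_drain() {
    DeferredDestroyer destroyer;
    for (int i = 0; i < 5; ++i) destroyer.retire(std::make_unique<Expensive>());

    int frames = 0;
    while (destroyer.pending() > 0) {
        destroyer.collect(2);
        ++frames;
    }
    assert(Expensive::alive == 0);
    return frames;
}
```

A real implementation would more likely check a clock inside collect() and stop once the per-frame time slice is used up.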

In my view, ownership and garbage collection are not opposites of a spectrum, they are separate concepts.

Ownership is about who "owns" some memory, i.e. who controls access to it and modification of it.

IMHO this should be done during the entire lifetime of all objects. I also believe this is done in any language, even in languages like Python, Java, and C#. If you don't do this, you have no clue how or when fields may change, which sounds to me like a good road towards long bug-hunting sessions.

Memory release is only relevant at the end of the life time of an object. If you have enough memory, you don't even need a memory release policy. You still need to handle ownership though (thus showing memory management and object access are different things).

More practically, if you use explicit destruction (delete x), you get full control over when the object releases its resources. Automagic garbage collection typically doesn't give you any guarantees about that. In addition, you get memory leaks as an extra problem if you don't carefully drop unneeded links once you're done with them.

I used the Hans-Boehm collector in a simulator as a replacement for a reference counting mechanism, which had proved to be too error prone (in C, i.e. no destructor call to hook into). It worked great for about 7 years until the software was replaced, although I have no data on how much time was spent on collecting etc. The setup was extreme in other ways too (it did maximal sharing of all data), so numbers would probably not be of much use in other situations anyway.

Despite this success, I am not convinced it is generally useful. You handle ownership anyway during the life time of an object, which means you normally know exactly when you should destroy objects, so why not make that explicit then?

The main useful model that fails in this case is shared distributed read-only access. That is however easily resolved by using techniques like std::shared_ptr and friends to emulate your shared links. I programmed an expression evaluator with shared links for read-only objects, and it works quite nicely; you never worry about release of the data. The model mostly works because the objects are read-only, so sharing them with anyone causes no problems.
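
For illustration, a small sketch of shared links to read-only nodes could look like this. It is not the evaluator mentioned above; the Expr layout and the helper names constant, add and mul are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Expr;
// Links are shared and point to const nodes, so nobody can modify a shared node.
using ExprPtr = std::shared_ptr<const Expr>;

enum class Kind { Constant, Add, Mul };

struct Expr {
    Kind kind;
    int value;      // used by Constant
    ExprPtr lhs;    // used by Add and Mul
    ExprPtr rhs;    // used by Add and Mul
};

ExprPtr constant(int v) {
    return std::make_shared<Expr>(Expr{Kind::Constant, v, nullptr, nullptr});
}
ExprPtr add(ExprPtr a, ExprPtr b) {
    return std::make_shared<Expr>(Expr{Kind::Add, 0, std::move(a), std::move(b)});
}
ExprPtr mul(ExprPtr a, ExprPtr b) {
    return std::make_shared<Expr>(Expr{Kind::Mul, 0, std::move(a), std::move(b)});
}

int eval(const Expr& e) {
    switch (e.kind) {
        case Kind::Constant: return e.value;
        case Kind::Add:      return eval(*e.lhs) + eval(*e.rhs);
        case Kind::Mul:      return eval(*e.lhs) * eval(*e.rhs);
    }
    return 0;  // unreachable for valid nodes
}

int shared_expr_demo() {
    ExprPtr x = constant(3);
    ExprPtr sum = add(x, constant(4));   // 3 + 4
    ExprPtr product = mul(sum, sum);     // the same node is linked twice
    x.reset();
    sum.reset();                         // nodes stay alive through product
    return eval(*product);
}
```

Nodes are released when the last link to them goes away, and because they are const there is no question of who may change them.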

There is also std::unique_ptr and friends for your exclusive ownership links that move around. So far, I have not found a use for std::unique_ptr, although I am sure there exist valid use cases.

I agree with Alberth. I exclusively use C# (garbage collecting runtime) and it's still extremely important to use a clean object/data ownership scheme.
Keep in mind that GC is generally only helpful for _memory_ resources. C++'s ownership model works for _all_ resources (memory, file handles, locks, internal state, etc.)

Sean Middleditch – Game Systems Engineer – Join my team!

All in all, I think C++ got it exactly right here (C++ got a lot of stuff wrong, but the ownership model maps very well to the nature of most situations, and as stated before it's universal, not just for memory). Your statement of C++ requiring you to be mindful at all times is a good argument against garbage collection, by the way. If a language forces the programmer to carefully think about what's going on (and either does not allow, or harshly punishes failure to comply), that is a big plus. Garbage collection can be useful (I consider smart pointers a form of garbage collection, by the way!), but while having "someone else" care is surely more comfortable, it is not inherently better. In fact, it can seduce you into writing vastly inferior code (not necessarily, but it can). Your designated usage is an example of that.

Collection classes use garbage-collected buffers
The question is: Why would you want to use garbage collection to manage buffers used by collections (or containers, as they're termed in C++)? The only way to add or remove objects from that pool of objects is via the API provided by the container class. It is therefore exactly known at all times whether an object is valid or not, and when it is to be destroyed. The container can just do that (and can in most cases reuse/retain the memory at no extra cost); there is no need for garbage collection. There is no way you could remove an element without the container class knowing. Also, the word "buffer" suggests that you do not plan to allocate and aggregate single elements (such as in a typical list), but rather manage a whole large buffer containing many small objects (such as in a vector). That's an even more compelling case against garbage collection. The container, and the container alone, knows exactly when a buffer has reached the end of its life (e.g. when a vector is resized, or when all elements are removed). Why should it leak that buffer and rely on some garbage collector to (maybe, eventually) pick it up? You spend extra CPU time doing that garbage collection, and meanwhile you keep more memory allocated than necessary. That's lose-lose, not win-win.

There is also std::unique_ptr and friends for your exclusive ownership links that move around. So far, I have not found a use for std::unique_ptr, although I am sure there exist valid use cases.


std::unique_ptr should be what you always start with, because it is far easier to have good ownership if only one person owns something. std::shared_ptr makes it very easy to get into the same problems GC has (indeterminate destruction), because you don't necessarily know if the object goes away when you release it. Some of it can be mitigated with good use of std::weak_ptr, but, again, you lose the "this is being destroyed now" guarantee that std::unique_ptr gives you.
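
A small sketch of that difference, using a made-up Model type that records when its destructor runs:

```cpp
#include <cassert>
#include <memory>

// Hypothetical type: sets a flag when it is destroyed.
struct Model {
    explicit Model(bool* destroyed_flag) : destroyed(destroyed_flag) {}
    ~Model() { *destroyed = true; }
    bool* destroyed;
};

// With unique_ptr, releasing the owner destroys the object right there.
bool unique_reset_destroys() {
    bool destroyed = false;
    auto owner = std::make_unique<Model>(&destroyed);
    owner.reset();
    return destroyed;
}

// With shared_ptr, releasing one owner says nothing about the object's fate.
bool shared_reset_destroys() {
    bool destroyed = false;
    auto first = std::make_shared<Model>(&destroyed);
    std::shared_ptr<Model> second = first;   // another owner somewhere else
    std::weak_ptr<Model> observer = first;   // observes without owning

    first.reset();
    bool destroyed_after_first_reset = destroyed;
    assert(!observer.expired());             // the object is still alive

    second.reset();                          // last owner gone: destroyed now
    assert(destroyed && observer.expired());
    return destroyed_after_first_reset;
}
```

The weak_ptr lets you check whether the object is still there, but it cannot tell you which remaining owner is keeping it alive.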

std::unique_ptr is also better for both performance and memory because it doesn't need to adjust a reference count (potentially in a thread-safe manner) nor does it need to allocate an additional control block to keep track of said counter (yes, make_shared can do it with one allocation, but then weak_ptr can keep around the entire memory block because you can't half-delete something).

I generally prefer to use unique_ptr for owners, and raw pointers or references for access to it (making sure that said pointers and references never outlast the owner).

IIRC there is also a proposal for C++17 for a more generic "smart resource" that isn't quite so tied to being a pointer, so I'm looking forward to that (though you can easily roll your own today).

In game development (or other high performance computing) understanding memory ownership is crucial to writing efficient code.

For most other tasks, it simply gets in the way, and worse, forces you to think in the language domain instead of the problem domain.

But that's why we have different languages. ):
"If you think programming is like sex, you probably haven't done much of either." - capn_midnight


So far, I have not found a use for std::unique_ptr, although I am sure there exist valid use cases.

I think one of the best things about std::unique_ptr is that it is not copyable.

This helps enforce the ownership you define at compile time, which makes it harder to make mistakes. (It even catches the case where you accidentally try to copy your vector of pointers.)

As SmkViper says, it should be the default pointer type used, with the express meaning "this is the owner of the object"

Second best thing is that always using std::unique_ptr to define ownership means you never have to write "delete" again, which also means you can use the default destructor more often, meaning less code to write. (and the code you don't have to write, never has errors)
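
Both points can be sketched in a few lines. Widget and Scene are hypothetical names; the commented-out line is the accidental copy that the compiler rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical types, used only for illustration.
struct Widget { int id; };

// No destructor, copy or move members are written by hand: the defaults
// generated from the unique_ptr members already do the right thing.
struct Scene {
    std::vector<std::unique_ptr<Widget>> widgets;
    void add(int id) { widgets.push_back(std::make_unique<Widget>(Widget{id})); }
};

// The owner type itself cannot be copied, only moved.
static_assert(!std::is_copy_constructible<std::unique_ptr<Widget>>::value,
              "unique_ptr must not be copyable");
static_assert(std::is_move_constructible<Scene>::value,
              "Scene should still be movable");

std::size_t move_scene_demo() {
    Scene a;
    a.add(1);
    a.add(2);
    // Scene copy = a;        // does not compile: would copy the unique_ptrs
    Scene b = std::move(a);   // fine: ownership of every widget moves to b
    return b.widgets.size();
}                             // b's widgets are deleted here, no "delete" written
```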

I agree that a lot of the time unique_ptr isn't required, as (oftentimes) instead of this first struct, you can just write the second ;)


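#include <memory>

// First: Owner reaches the widget through a pointer (separate heap allocation).
struct Owner
{
  std::unique_ptr<Owned> widget;
  Owner() : widget(std::make_unique<Owned>(42)) {}
};

// Second: alternative definition of the same struct, with the widget stored
// directly as a member, so its lifetime is simply the Owner's lifetime.
struct Owner
{
  Owned widget;
  Owner() : widget(42) {}
};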

Also shared_ptr is just too costly to use IMHO. Don't get me wrong - it's a great idea, I just don't like the reality where an ungodly amount of near-invisible-to-the-reader atomic cache-line locking instructions are generated everywhere for no good reason. It's a completely overzealous attempt to provide "thread safety" when it's often not needed, and it actually doesn't make the shared_ptr objects themselves thread-safe at all, leading to confusion on that front... IMHO they dropped the ball there.
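
To show where the reference count gets touched, here is a small sketch with a made-up Mesh type. Every copy of a shared_ptr updates the shared count (typically with atomic operations in multithreaded builds), while passing by const reference, or passing the pointee itself, does not.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type, used only for illustration.
struct Mesh { int vertex_count; };

// By value: the argument is a copy, so the count is incremented on entry
// and decremented again on exit.
long count_when_passed_by_value(std::shared_ptr<Mesh> m) {
    return m.use_count();
}

// By const reference: no copy is made, so the count is left alone.
long count_when_passed_by_reference(const std::shared_ptr<Mesh>& m) {
    return m.use_count();
}

// Often the callee only needs the object, not the smart pointer at all.
int vertices(const Mesh& m) { return m.vertex_count; }

long refcount_demo() {
    auto mesh = std::make_shared<Mesh>(Mesh{12});
    long by_value = count_when_passed_by_value(mesh);       // 2 during the call
    long by_ref   = count_when_passed_by_reference(mesh);   // 1
    assert(vertices(*mesh) == 12);
    return by_value - by_ref;
}
```

On the thread-safety confusion: the count updates are synchronized, but reading and writing the same shared_ptr object from two threads without your own synchronization is still a data race.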

These days, my favorite C++ resource management policy is scopes and stacks.

