Sign in to follow this  
dave

Why Need A Garbage Collector?

Recommended Posts

Hey, Over the past day i have been playing with memory management. First i overloaded new and delete to see what goes on. Then secondly i wrote a singleton class that keeps track of the allocated data, and the total allocated etc. No i here all this jazz about garbage collectors and superpig went over one in Enginuity but i still can't honestly see the point. Now i know im wrong, not trying to be naive here, but i have never had it explained to me properly. Would anyone mind explaining? ace

Share this post


Link to post
Share on other sites
The garbage collector is supposed to hold onto your 'no longer referenced' objects and delete them at times when it won't slow down the application. It doesn't always work out that way though.

Share this post


Link to post
Share on other sites
The other important thing is that you have single instances of objects bouncing around in multiple classes and handlers.

So if an object shouldn't be rendered, and it disappears from the quadtree, should it be deleted? Not if you just took it out to manipulate it with the intention of putting it back in later.

Share this post


Link to post
Share on other sites
So would all this involve writing your own smart-pointers that do reference counting?

I compltely understand how smart pointers work but one thing im not sure about is how you would create a collection of them, especially since each ones type may be different.

Would you derive a smart-pointer from a non-templated base class?

ace

Share this post


Link to post
Share on other sites
Quote:
Original post by ace_lovegrove
So would all this involve writing your own smart-pointers that do reference counting?

I compltely understand how smart pointers work but one thing im not sure about is how you would create a collection of them, especially since each ones type may be different.

Would you derive a smart-pointer from a non-templated base class?

ace


Basically. You could create a base 'object' class that keeps track of references. When it's references drop down to 0, it sends itself to the garbage collector.

edt: all your classes that are 'garbage collectable' need to inherit from this base class.

Share this post


Link to post
Share on other sites
You never need a garbage collector, but what it does is alleviate the programer of the responsibility of explicitly deallocating memory. Failure to free acquired memory is a major source of problems in C and C++ programs, thus it seems desirable to have some automated way of dealing with this.

I haven't read the Enginuity articles in a while, but I believe superpig uses essentially a reference counted garbage collector (which is analagous to most "smart pointer" impelementations, when the ref count goes to 0, the object is freed). The "major" disadvantage to this approach is that each "garbage collected" object has to store two pointers; one to the actual object, and one to a shared reference count (this also means that the reference count needs to be allocated on the heap, unless you use some sort of intrusive reference counting mechanism). The other disadvantage is that it can be slower, as you have to increment the ref count on copy and then decrement when the object goes out of scope (and check to see if the ref count is zero). Most garbage collected languages (such as Java) don't use this method. Because those languages have full knowledge of the runtime enviroment, they can use more sophisticated algorithms such as mark-and-sweep, etc (basically, mark all objects that can be reached via some pointer, and then delete all the ones that aren't marked).

I however am of the opinion that the overhead incured via reference counted smart pointers is not of that great a significance, especially if properly used (try to keep the number of times the SP actually has to be copied to a minumum). Also, if I have to choose between destructors and deterministic destruction, or garbage collection with no destructors, I'll take the destructor approach any day of the week, as I think it's a more general approach that has a lot more applications (basically all the RAII stuff)

Share this post


Link to post
Share on other sites
Yes, the brunt of the garbage collecting is done through a smart-pointer/handle interface.

However, more advanced memory management schemes take advantage of handle factories, which assign some sort of constant ID to objects.

This simplifies saving and loading from disk, and various other little things here and there.

It also solves your problem of having smart pointers in a collection. As far as I know, handle-based pointers are the only way to remedy it. The reason is that smart pointers are sort of "dummy" pointers, in that they aren't actually pointers, they simply mimick them. So things like polymorphism and casting become impossible.

Share this post


Link to post
Share on other sites
Quote:
Original post by nilkn
The reason is that smart pointers are sort of "dummy" pointers, in that they aren't actually pointers, they simply mimick them. So things like polymorphism and casting become impossible.

This isn't true. Some smart pointers certainly have these problems, but not others. Take a look at boost::shared_ptr. It supports everything you've mentioned.

Share this post


Link to post
Share on other sites
Quote:
Original post by CoffeeMug
Quote:
Original post by nilkn
The reason is that smart pointers are sort of "dummy" pointers, in that they aren't actually pointers, they simply mimick them. So things like polymorphism and casting become impossible.

This isn't true. Some smart pointers certainly have these problems, but not others. Take a look at boost::shared_ptr. It supports everything you've mentioned.


Well, I stand corrected. I guess you learn something everyday!

Share this post


Link to post
Share on other sites
Surely everyone can imagine a huge project which multiple people work on, where even smart pointers aren't going to be enough to ensure there are no memory leaks.
Garbage collection is not for everything though. It's just another tool for the job. No one tool is best for every job. As always, it depends on the project, but there's usually little point in using GC on a small project.
Many of us have learned how to take care of memory management efficiently and carefully enough to not create leaks. That isn't a skill that you should merely throw away.
Don't throw away your hammer just because you now have a nail gun. However in many cases a nailgun is going to save time, particularly if you're a beginner.

Share this post


Link to post
Share on other sites
I'm not exactly an expert with GC, so instead of making vauge explainations as to some of their benifits, I'll simply quote from the D Programming Language website.

Quote:
From http://www.digitalmars.com/d/garbage.html:
D is a fully garbage collected language. That means that it is never necessary to free memory. Just allocate as needed, and the garbage collector will periodically return all unused memory to the pool of available memory.

C and C++ programmers accustomed to explicitly managing memory allocation and deallocation will likely be skeptical of the benefits and efficacy of garbage collection. Experience both with new projects written with garbage collection in mind, and converting existing projects to garbage collection shows that:

* Garbage collected programs are faster. This is counterintuitive, but the reasons are:
o Reference counting is a common solution to solve explicit memory allocation problems. The code to implement the increment and decrement operations whenever assignments are made is one source of slowdown. Hiding it behind smart pointer classes doesn't help the speed. (Reference counting methods are not a general solution anyway, as circular references never get deleted.)
o Destructors are used to deallocate resources acquired by an object. For most classes, this resource is allocated memory. With garbage collection, most destructors then become empty and can be discarded entirely.
o All those destructors freeing memory can become significant when objects are allocated on the stack. For each one, some mechanism must be established so that if an exception happens, the destructors all get called in each frame to release any memory they hold. If the destructors become irrelevant, then there's no need to set up special stack frames to handle exceptions, and the code runs faster.
o All the code necessary to manage memory can add up to quite a bit. The larger a program is, the less in the cache it is, the more paging it does, and the slower it runs.
o Garbage collection kicks in only when memory gets tight. When memory is not tight, the program runs at full speed and does not spend any time freeing memory.
o Modern garbage collectors are far more advanced now than the older, slower ones. Generational, copying collectors eliminate much of the inefficiency of early mark and sweep algorithms.
o Modern garbage collectors do heap compaction. Heap compaction tends to reduce the number of pages actively referenced by a program, which means that memory accesses are more likely to be cache hits and less swapping.
o Garbage collected programs do not suffer from gradual deterioration due to an accumulation of memory leaks.
* Garbage collectors reclaim unused memory, therefore they do not suffer from "memory leaks" which can cause long running applications to gradually consume more and more memory until they bring down the system. GC'd programs have longer term stability.
* Garbage collected programs have fewer hard-to-find pointer bugs. This is because there are no dangling references to free'd memory. There is no code to explicitly manage memory, hence no bugs in such code.
* Garbage collected programs are faster to develop and debug, because there's no need for developing, debugging, testing, or maintaining the explicit deallocation code.
* Garbage collected programs can be significantly smaller, because there is no code to manage deallocation, and there is no need for exception handlers to deallocate memory.

Garbage collection is not a panacea. There are some downsides:

* It is not predictable when a collection gets run, so the program can arbitrarily pause.
* The time it takes for a collection to run is not bounded. While in practice it is very quick, this cannot be guaranteed.
* All threads other than the collector thread must be halted while the collection is in progress.
* Garbage collectors can keep around some memory that an explicit deallocator would not. In practice, this is not much of an issue since explicit deallocators usually have memory leaks causing them to eventually use far more memory, and because explicit deallocators do not normally return deallocated memory to the operating system anyway, instead just returning it to its own internal pool.
* Garbage collection should be implemented as a basic operating system kernel service. But since they are not, garbage collecting programs must carry around with them the garbage collection implementation. While this can be a shared DLL, it is still there.


As a side note, some new GCs don't have to do their entire collection run in a single pass, and can instead "pause", which makes them more suitable for games, where an arbitrary pause is probably bad.

Share this post


Link to post
Share on other sites
Would an idea be for every smart-pointer that is managed by the GC to have an 'alive-time' which ticks down when it is not being used. If it reaches zero it can be deleted or marked for referal. So that next time a garbage sweep is done it can be deleted.

One thing that confuses me is why you would ever need to delete something in the middle of a program.

Surely the code wouldn't have new'ed it if it wasn't needed.

ace

Share this post


Link to post
Share on other sites
Quote:
Original post by ace_lovegrove
One thing that confuses me is why you would ever need to delete something in the middle of a program.

Lots of times you need temporary objects for calculations etc. For example ODE and other physics libraries check for collisions and generate joints and constraints for those that are touching. Then the rest of the algo goes and solves these constraints. At the end of the frame all the constraints are scrapped and the whole thing starts again.

A traditional C approach would be to have a big pool of these constraints and use them as needed. However that means you have a hard limit on the number of constrains as they are all allocated at once. Additionally you waste a lot of memory by potentially having a lot of unused constraints permenantly hanging around. Therefore it is more efficient to new/delete these as needed*.

Incidentally, just reference counting via smart pointers isn't a proper GC. Circular references will screw you up there. Proper GCs will handle all cases and only delete objects that aren't reachable any more.

* Actually, if you're doing something that news/deletes this frequenctly a pool might be better given that allocation isn't free. But you can still add this ontop of a GC system (potentially expanding or contracting the pool as needed).

Share this post


Link to post
Share on other sites
Quote:
Original post by ace_lovegrove
Would an idea be for every smart-pointer that is managed by the GC to have an 'alive-time' which ticks down when it is not being used. If it reaches zero it can be deleted or marked for referal. So that next time a garbage sweep is done it can be deleted.

One thing that confuses me is why you would ever need to delete something in the middle of a program.

Surely the code wouldn't have new'ed it if it wasn't needed.

ace


A timer like that would cause massive problems.
...
You new'ed it because it was needed. But now the object has finished being used and nothing in the application references it. Think of a situation where you have a collection for the player's items. When the items are put into the players possession, a reference counter was incremented by 1 on each of these objects. When the player drank the potion item it is removed from the collection. When it is removed, the reference is decremented by 1. Now, we'll say no other pointers were referencing it and is 'unreachable'. Internally to the object, it knows it has no references because it's reference count just hit 0, now it sends itself to the garbage collector. Think of a situation where maybe you are streaming objects to disk in a seperate process and it has a reference to this object. If that was the case, this utility function would create a reference, so it would still have a reference count and would not be destroyed until after this utility function had been completed. If you had just deleted the object when the player was done, your utility function would have attempted to access a null pointer or worse.

Share this post


Link to post
Share on other sites
If you don't program in a garbage-collected language (or in C++ with a garbage collector), you will find it hard to understand why anyone would want such a feature. It is the same with any other programming feature - I bet you'd never want a lexical closure or a continuation either.

Let me just show you why GC is nice in a language (e.g. Lisp). It means you can really code without worrying about any memory issues. If you want to return a struct, you can just create it and throw it around anywhere you want. And, of course, GC is necessary in a language where you deal with lists all the time - it would be hell to worry about deallocating all of them.

I believe the advantages outweigh the disadvantages unless perhaps you are doing something that absolutely must be in real time. Even so, modern GCs can do wonders.

Basically, my advice is to try Lisp, or, failing that, Java. You'll find that the GC is highly convenient, and allows you to get more work done. It also makes your code safer because it takes us flawed humans out of the memory management picture. [wink]

Share this post


Link to post
Share on other sites
You know, after using languages that support garbage collection for quite some time, I recently went back to C++ and found myself utterly incapable of doing anything useful without using boost::shared_ptr. In particular, I'm working on an interpreter for a lisp dialect, a place where I *really* don't want to embed reference counting, yet doing anything without it is such a royal pain in the ass I decided to use it despite performance issues. GC really grows on you. For example, suppose you have something like this: (pseudolanguage)


class datum
{
abstract datum* eval();
}

class integer : datum
{
datum* eval()
{
// evaluates to itself
return this;
}

int _value;
}

class symbol : datum { /* ... */ }
// ...



Integer is a very simple example, but suppose you're evaluating something that creates new data structures. With GC you simply return the newly created struct to the destination and forget about it. What do you do without GC? Even 'return this' in the above example is dangerous. When does the integer get deallocated? Who's responsible for deallocation? What happens to all the other places that use it? By the time you solve all these problems you end up happing to jump through so many hoops your code becomes significantly less clean, much harder to maintain, etc.

Share this post


Link to post
Share on other sites
Quote:
Original post by Tron3k
It is the same with any other programming feature - I bet you'd never want a lexical closure or a continuation either.

That's actually a big downside when it comes to learning Lisp: once you rewire your mind to think in broad context it's tremendously hard to squeeze it back into narrow bounds. I use Java at work and I find myself incredibly frustrated at its verbosity and lack of abstractions I am now used to. More often than not I either misuse Java features to poorly simulate Lisp or write perlish write-only code. Is it better to have loved and lost than never to have loved at all? [grin]

Share this post


Link to post
Share on other sites
So the idea of a GC is to simply take away the need for you cll delete?

"Lots of times you need temporary objects for calculations etc. For example ODE and other physics libraries check for collisions and generate joints and constraints for those that are touching. Then the rest of the algo goes and solves these constraints. At the end of the frame all the constraints are scrapped and
the whole thing starts again."

So why not just manually call delete for the objects once you're done with them.

ace

Share this post


Link to post
Share on other sites
Quote:
Original post by ace_lovegrove
So the idea of a GC is to simply take away the need for you cll delete?
[...]
So why not just manually call delete for the objects once you're done with them.

ace


That's exactly the point:

#1 You may forgot to call delete and thus create a memory leak.
#2 You may call delete when other objects hold a reference to your temporary object and thus cause your program to crash.

With GC you avoid #1 (programmer lazy-ness, and bad structure of code).
And you avoid #2 as well.
And you gain #3 : You do not need to worry about memory allocate/deallocation anymore and concentrate on algorithm instead.

Robin

Share this post


Link to post
Share on other sites
Allocations are much cheaper when you have a GC around, as well. You no longer think, "maybe I don't *have* to call new..."

Share this post


Link to post
Share on other sites
The thing that bothers me the most about a lack of a GC is the question of ownership. In some algorithms it is not clear who should own the memory, and yet you are forced to choose
boost::shared_ptr is a god send for cases like these

Share this post


Link to post
Share on other sites
Quote:
Original post by ace_lovegrove
So why not just manually call delete for the objects once you're done with them.

Because often when and where you should delete something is ambiguous - especially if an object gets created in one place and passed around for processing in various other seperate areas. And if you say "well everything should have a clear owner" then you're too stuck in standard C++.

Share this post


Link to post
Share on other sites
I recall hearing that GC implementations, to have good performance, generally need to have some multipule >=2 times as much memory available compared to explicit deletes. Is that still true?

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this