Garbage Collection

Started by
8 comments, last by Extrarius 20 years, 3 months ago
I've been doing a lot of searching for information about implementing a garbage collector in C++, and so far everything I've found is exactly what I don't want. I've found many papers stating it is impossible to have efficient garbage collection in C/C++, papers explaining why it isn't a good idea, papers explaining the benefits of garbage collection, some papers that talk about how much performance you can gain if you make the GC run parallel to the program, some stats from 1990 that show a maximum of a 100ms pause when GCing(which gives me hope for awesome speed =-), papers suggesting you use a GCed language if you want GCing, and a few obfuscated libraries for doing automated GCing by scanning the heap for pointers and doing bit fiddling (and most pages I found talking about the libraries didn't have good things to say). What I want, is a library (or information about creating a library, even abstract information on GC would be helpful) that lets me control the GCing so I can make only certain objects and pointers to them collected. I'm willing to jump through all kinds of hoops to impelemnt it, and call extra functions when using pointers to help it out, etc (I've found lots of the heap-scanning implementations that brag about not having to control it or jump through hoops, but they assume you want everything GCed, which isn't the case). Switching to a GCed language kind of defeats the point, since I'm trying to make a GCed language to learn what goes into the whole process of making a full-features scripting language. It HAS to be possible to make a GC in C/C++ since some compilers/interpeters for GCed languages were written in C/C++. Can anybody recomend a good place(note that I've already searched google and several other sites and have come up with what I listed above) to get information about garbage collection in general and specifically high-speed garbage collection in C++? [edited by - extrarius on January 18, 2004 12:45:02 PM]
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Advertisement
If there was a good way of adding GC to C++ you would certainly have seen those already. There are several breaking reasons why that is impossible. How can you make part of your environment GCed when other parts may reference your objects?

I would recommend reading this post which highlights some problems: http://discuss.develop.com/archives/wa.exe?A2=ind0010A&L=DOTNET&P=R28572


> It HAS to be possible to make a GC in C/C++ since
> some compilers/interpeters for GCed languages were
> written in C/C++.

No, that is completely uncorrelated. Probably most garbage collected environment are written in C/C++, but that''s completely different from adding GC to C/C++ ifself.


What are you really trying to accomplish? I think you make it easier for yourself by moving to a completely GC:ed environment (such as .Net or Java), or accept the C++ environment as it is.

Have you tried writing a simple mark-and-sweep collector? It''s very easy. Do this:

1. In every class, put a "marked" bit.

2. In every class that needs automatic collection, add a "mark" method. This should be mutually recursive: it should traverse any pointers in the object and mark those, too.

3. Write a global "mark" routine that marks all the objects in global variables.

4. Maintain a global linked-list of all the objects. To do this, you might need to put an "allobjs_next" field in each object so you can make them all into one big list.

5. Write a global "sweep" routine that says, "for every object allocated, if it''s not marked, free it, and if it is marked, unmark it."

The result might look like this:

// Class definitions, including mark bits, mark routine, etc.

class myclass1
{
// Garbage-collection related variables.
bool marked;
myclass1 *allobjs_next;

// Ordinary fields that are not specifically GC-related.
int x,y;
myclass1 *foo;
myclass2 *bar;

// The "mark" method.
void mark(void) {
if (marked) return;
marked = true;
if (foo) foo->Mark();
if (bar) bar->Mark();
}
}

class myclass2
{
bool marked;
myclass2 *allobjs_next;
...
}

// The global mark routine.

myclass1 *global1;
myclass2 *global2;

void markglobals(void)
{
if (global1) global1->Mark();
if (global2) global2->Mark();
}

// The global lists of objects, and the global sweep routine.

myclass1 *allobjs_myclass1;
myclass2 *allobjs_myclass2;

void globalsweep(void)
{
myclass1 *x;
for (x=allobjs_myclass1; x; x=x->allobjs_next) {
if (!x->marked) { delete object and remove from list }
else x->marked = false;
}
myclass2 *y;
for (y=allobjs_myclass2; y; y=y->allobjs_next) {
if (!y->marked) { delete object and remove from list }
else y->marked = false;
}
}


There''s two weaknesses in this super-simple approach:

First, the mark routine cannot find pointers to myclass1 or myclass2 in local variables. If your game has a main loop, the solution is easy: only garbage collect at the top of the main loop, when there isn''t anything at all in local variables.

Second, the mark routine cannot find pointers to myclass1 or myclass2 if they occur in a third type of object, myclass3. So that means, you cannot put pointers to markable objects inside non-markable objects. So this approach tends to permeate your program - if one object is marked, many will need to be.

- Josh



PS. Ignore these idiots who say GC cannot be added to C++. I have been using garbage collection with C since before C++ existed.

- Josh
quote:Original post by Anonymous Poster
If there was a good way of adding GC to C++ you would certainly have seen those already.[...]
I have seen them. The problem is, they don't do it the way I want. They GC the entire environment automatically, focusing on the automatic part. I don't mind having to jump through all kinds of hoops to get it to work the way I want. One of the probably things I would have to do is tell it which object is pointing to which object.
I repeat: I don't mind doing extra work to make parts of the system GCed.

Nekhmet: That sounds like it will work, with a little bit of trickery (it must be possible to have a non-GC object point to a GC object, so I'll have to hack that in at least). Dealing with local variables and the like is easy: The pointer replacement will just need to register itself in the list under certain conditions.
Thanks for a basic explanation. It is amazing how hard it can be to find such a thing on some topics =-/

[edited by - extrarius on January 18, 2004 3:13:47 PM]
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
I am not sure how you''ve searched, but the first result on google for C++ Garbage Collector is an exellent conservative garbage collector for both C and C++. You can get it at http://www.hpl.hp.com/personal/Hans_Boehm/gc/. There''s also a lot of information about garbage collection on this site if you want to create your own.

- lmov
- lmov
Well, first thing on that page "The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage collecting replacement for C malloc or C++ new" sounds like one of the many kinds of GC I don't want. I do not want global GC.

I'm not in a good state to look over the references on that page right now(been up way too many hours) but I will do so after I get some sleep.

[edited by - extrarius on January 18, 2004 3:55:36 PM]
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
www.codeproject.com has at least two garbage collectors for C++ with manual triggers for collection. Doing a search on the site might turn up more. IIRC, getting at the source code on that site requires registration.

I personally have used a handle-based generation garbage collector (it''s mark and compact rather than mark and sweep) in C++. Basically, every collectable object descends from a GCable base class. And references to GCable objects are only done through smart pointers. The smart pointers handle registration with the root set or the generation managers themselves, and the GCable heap lives in a fixed address space so it''s possible to quickly at runtime determine where a given handle lives. This way non-GC''ed heap objects can hold handles to GC''able objects. All non-handle references to inside a GC''able object are treated as weak references.
quote:Original post by Extrarius
Well, first thing on that page "The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage collecting replacement for C malloc or C++ new" sounds like one of the many kinds of GC I don''t want. I do not want global GC.


Boehm''s garbage collector can be used for only some of the pointers in your program. Basically it collects anything allocated with GCmalloc() instead of malloc(). And it is possible to force it to never trigger automatic collection.
After looking over it some more, it looks like it is something I could use. It seems that it could be made slightly more efficient by creating it for C++ instead of for C and then making a class wrapper for that.

Most likely, I inappropriately dismissed it in my search after having seen many garbage collectors that claimed the same thing(that they could be drop-in replacements for new/malloc) but were not as flexible.

[edited by - extrarius on January 18, 2004 11:06:39 PM]
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk

This topic is closed to new replies.

Advertisement