Mark and Sweep GC - convert recursion to iteration

Started by
6 comments, last by kolrabi 7 years ago
I must be being a bit thick at the moment, but I'm thinking about implementing a very simple stop-the-world mark-and-sweep garbage collection system for my scripting language and I can't for the life of me figure out how to turn the mark phase from a recursive approach into an iteration.

Recursive is trivial. Something like (not real code):

class Entity
{
public:
    std::vector<Entity*> refs;
    bool marked;
};

void mark(Entity *entity)
{
    entity->marked = true;
    for(auto &r: entity->refs)
    {
        if(!r->marked) mark(r);
    }
}

void gc(std::vector<Entity*> roots)
{
    // all entities mark set to false
    for(auto &r: roots)
    {
        mark(r);        
    }
    
    for(auto &r: heap)
    {
        if(!r->marked) reclaim(r);
    }
}
Running out of stack space isn't really a concern for my little toy language, but I believe it is always possible to convert a recursion to an iteration so was wondering if anyone could help me stumble over this little mental obstacle?

By the way, I am aware there are better algorithms for GC than this naive approach. I'm not worried about that at the moment, just want to get a basic system working first so I have something to profile against when I improve it later.

Thanks.
Advertisement

How about using a stack to replace the recursion?


void gc(std::vector<Entity*> roots)

{



   // all entities mark set to false

  std::stack<Entity*> entities(roots);

   while (!entities.empty())

   {

      Entity* entity = entities.top ();

      entities.pop();



      if(!entity->marked)

      {

         entity->marked = true;

         // push entities->refs to the stack

      }

   }



   for(auto &r: heap)

   {

      if(!r->marked) reclaim(r);

   }

Thanks Zipster. Of course then the memory would be on the heap rather than the stack which is better.

I was wondering though if this was possible without using a stack at all? Perhaps not.

No, you need something to store 'this node needs further exploration'. If you really want to get rid of the stack, you can of course make a single linked list through the nodes, but that's basically distributing your stack over the nodes.

In essence, you're doing breadth-first path-finding to all reachable locations :)

Cheers Alberth. Will go with Zipster's suggestion.

Although not helpful to your work in a modern environment.... If you want to take a real head trip, look back at how Smalltalk-80 did its mark pass in constant stack and heap space, by temporarily rewriting objects' fields as it went to become pointers back to their "parent" object (along the discovery path, that is), thus requiring only a constant amount of space (on the stack, during garbage collection) to track the nearest traversal parent, current object under inspection, and the current object-field index under inspection.

http://www.mirandabanda.org/bluebook/bluebook_chapter30.html#GarbageCollection30

Followup: also, in my personal experience, for a typical toy scripting language it is easier to write a generational garbage collector (don't mark & sweep; instead, move survivors to a separate clean heap space, leaving behind a breadcrumb saying "I was moved to gen Y offset X") than to write a two-pass mark/sweep collector, and it produces a zero-fragmentation memory model as well (which can be useful in producing error dumps or other debug checkpointing).

RIP GameDev.net: launched 2 unusably-broken forum engines in as many years, and now has ceased operating as a forum at all, happy to remain naught but an advertising platform with an attached social media presense, headed by a staff who by their own admission have no idea what their userbase wants or expects.Here's to the good times; shame they exist in the past.

Thanks Zipster. Of course then the memory would be on the heap rather than the stack which is better.

I was wondering though if this was possible without using a stack at all? Perhaps not.

You could also use an allocator which attempts to use stack memory before heap memory, e.g. : https://chromium.googlesource.com/chromium/chromium/+/master/base/stack_container.h

That way you get the faster behaviour as long as you don't have too many nodes to handle each time.

You could also get a similar effect by allocating a fixed size array and using that as your stack, but calling out to the old recursive version if you run out of stack space.

No, you need something to store 'this node needs further exploration'.

Well you could just mark them as such. This is how Tri-color marking works.

blah :)

This topic is closed to new replies.

Advertisement