Garbage collection algorithm

11 comments, last by jwezorek 11 years, 9 months ago
I'm looking for a garbage collection algorithm that would behave as the following pseudocode example specifies.
class Foo {
public:
    int id;
    GraphNodePtr<Foo> next; // null by default
    Foo(int theID) : id(theID) {}
};


GraphNodePtr<Foo> graph(0);
{
    GraphNodePtr<Foo> node1(1);
    GraphNodePtr<Foo> node2(2);
    GraphNodePtr<Foo> node3(3);
    node1->next = node2;
    node2->next = node3;
    node3->next = node1; // circular loop
    graph->next = node1;
}
graph->next.release(); // nodes 1, 2, and 3 are immediately deleted here


That is, you have a single pointer type that can detect orphaned loops. And the garbage collection needs to guarantee that orphaned objects are released as soon as they become orphaned.
I'm not sure if a single pointer type exists/fits here. You could go for weak_ptr<> and shared_ptr<>, but I'm guessing that's not what you want?
Mark and sweep can generally do this, provided you have access to the stack and any other root nodes possible in the program. Can you provide more details about what you're trying to do?
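For what it's worth, here's roughly how the shared_ptr/weak_ptr version of your example could look. This is just a sketch, and it assumes you can decide up front which edge closes the loop and make that one non-owning:

#include <memory>

class Foo {
public:
    int id;
    std::shared_ptr<Foo> next; // owning edge
    std::weak_ptr<Foo> back;   // non-owning edge used to close the loop
    explicit Foo(int theID) : id(theID) {}
};

int main() {
    auto graph = std::make_shared<Foo>(0);
    {
        auto node1 = std::make_shared<Foo>(1);
        auto node2 = std::make_shared<Foo>(2);
        auto node3 = std::make_shared<Foo>(3);
        node1->next = node2;
        node2->next = node3;
        node3->back = node1; // weak edge, so there is no cycle of owners
        graph->next = node1;
    }
    graph->next.reset(); // nodes 1, 2, and 3 are destroyed right here
}

The catch is that you have to know which edge is the "back" edge when you build the graph, which is exactly the decision a tracing collector would otherwise make for you.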

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


Mark and sweep can generally do this, provided you have access to the stack and any other root nodes possible in the program. Can you provide more details about what you're trying to do?
Mark and sweep would work, but then you would have to run it every time a node is deleted or released, and it would have to scan unrelated parts of the program looking for links into this graph. Is there an algorithm that takes advantage of the fact that releasing node 1 can only release nodes reachable from node 1, or that can detect the loop when it is made, without unmanageable overhead?

I'm not trying to do anything besides learn about general purpose garbage collection.

Mark and sweep would work, but then you would have to run it every time a node is deleted or released, and it would have to scan unrelated parts of the program looking for links into this graph.
Mh, no.

I'm not trying to do anything besides learn about general purpose garbage collection.
No problem! It's just like learning "general purpose sorting"!

int array[SOME_REALLY_BIG_NUMBER];
Init(array, SOME_REALLY_BIG_NUMBER);
GeneralPurposeSort(array); // O(1), stable, online
But in the real world, things are a bit different. You don't learn something by marking existing algorithms as "not interesting to me".

Previously "Krohm"


Mark and sweep would work, but then you would have to run it every time a node is deleted or released, and it would have to scan unrelated parts of the program looking for links into this graph.
Mh, no.

I'm not trying to do anything besides learn about general purpose garbage collection.
No problem! It's just like learning "general purpose sorting"!

int array[SOME_REALLY_BIG_NUMBER];
Init(array, SOME_REALLY_BIG_NUMBER);
GeneralPurposeSort(array); // O(1), stable, online
But in the real world, things are a bit different. You don't learn something by marking existing algorithms as "not interesting to me".
I didn't say mark and sweep wasn't interesting. I just commented that it doesn't take advantage of some additional information that appears to be present in the example.

Also to be clear, deleting nodes immediately when they become unreachable is part of the requirements of the question. It's of some advantage to have resources released as soon as possible, and I want to know the costs of doing so.
The information you want to exploit is, in general, not available. This is why GC runs every once in a while rather than every time something goes out of "scope". The cost of deleting objects "immediately" is intolerable for anything beyond trivial examples: no GC method I recall (as used in major programming languages) allows immediate, deterministic deletion of garbage, although some allow cooperation with escape analysis. In general I'd say most of the papers I've consulted take for granted that a GC pass is costly enough that it should only run when there is a high amount of garbage (in terms of graph nodes) to delete.

What question are you talking about? Your original question does not seem to imply immediate destruction.

Note that GC typically does not care about detecting loops, but rather about isolating islands of unreachable objects.

Previously "Krohm"


The information you want to exploit is, in general, not available. This is why GC runs every once in a while rather than every time something goes out of "scope". The cost of deleting objects "immediately" is intolerable for anything beyond trivial examples: no GC method I recall (as used in major programming languages) allows immediate, deterministic deletion of garbage, although some allow cooperation with escape analysis. In general I'd say most of the papers I've consulted take for granted that a GC pass is costly enough that it should only run when there is a high amount of garbage (in terms of graph nodes) to delete.

What question are you talking about? Your original question does not seem to imply immediate destruction.

Note that GC typically does not care about detecting loops, but rather about isolating islands of unreachable objects.
I did say the nodes should be deleted at the last line, but I'll edit the post to be clearer.

I realize that it is indeed costly to do it every release, but how costly? What algorithm would I use if it's really really needed?
AFAIK there is no known general-purpose algorithm to do what you describe. Of course, my knowledge of garbage collection is largely academic and definitely a few years out of date, so there may be more modern tricks that I do not know about.

In general, if you want deterministic destruction, you use reference counting. If you want circular reference detection, you use true garbage collection algorithms or you use weak references. All of these options are incompatible with your request from the OP.
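To make the incompatibility concrete, here is what plain reference counting (std::shared_ptr with no weak references) does on a loop like the one in your original post. A sketch of the failure mode, not of any particular collector:

#include <memory>

struct Foo {
    int id;
    std::shared_ptr<Foo> next;
    explicit Foo(int theID) : id(theID) {}
};

int main() {
    {
        auto node1 = std::make_shared<Foo>(1);
        auto node2 = std::make_shared<Foo>(2);
        node1->next = node2;
        node2->next = node1; // loop: both reference counts are now 2
    }
    // The locals are gone, but each node still holds the other's count
    // at 1, so neither destructor ever runs: the loop leaks.
}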

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I've had some luck researching this, and this paper explains a decent way to take advantage of locality. See the sequential algorithm there.

It's reference counting with additional cycle detection. From what I understand, the idea is that when you want to check for cycles (still expensive, despite the locality), you take each reference-counted object that has been decremented at least once but is still alive, and trial-decrement everything reachable from it one more time; whatever drops to 0 in the trial is part of an orphaned loop and can be removed. For anything not released, you restore the reference counts.
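To sketch what that trial-decrement pass could look like in code: this is my reading of the idea, not the paper's pseudocode, and the names (Node, markGray, scan, collectWhite) are made up for the example. Counts contributed by edges inside the candidate's subgraph are subtracted; anything still above zero is being held from outside and gets its counts restored; whatever stays at zero is an orphaned loop and is freed.

#include <cstdio>
#include <vector>

// Hypothetical node layout for the sketch: 'rc' is the count the
// smart pointers (not shown) maintain, 'children' are the outgoing
// counted edges.
enum class Color { Black, Gray, White };

struct Node {
    int id = 0;
    int rc = 0;
    Color color = Color::Black;
    std::vector<Node*> children;
};

// Trial-decrement: subtract the counts contributed by edges internal
// to the subgraph reachable from the candidate.
void markGray(Node* n) {
    if (n->color == Color::Gray) return;
    n->color = Color::Gray;
    for (Node* c : n->children) {
        --c->rc;
        markGray(c);
    }
}

// Undo the trial on everything reachable from a survivor.
void scanBlack(Node* n) {
    n->color = Color::Black;
    for (Node* c : n->children) {
        ++c->rc;
        if (c->color != Color::Black) scanBlack(c);
    }
}

// After the trial, a gray node with rc > 0 still has an external
// reference, so it (and everything it reaches) must be kept.
void scan(Node* n) {
    if (n->color != Color::Gray) return;
    if (n->rc > 0) {
        scanBlack(n);
    } else {
        n->color = Color::White; // provisionally garbage
        for (Node* c : n->children) scan(c);
    }
}

// Gather the nodes that stayed white: the orphaned loop.
void collectWhite(Node* n, std::vector<Node*>& dead) {
    if (n->color != Color::White) return;
    n->color = Color::Black; // don't visit twice
    for (Node* c : n->children) collectWhite(c, dead);
    dead.push_back(n);
}

// Run the whole check from one candidate (a node whose count was
// decremented but did not reach zero).
void collectCycles(Node* candidate) {
    markGray(candidate);
    scan(candidate);
    std::vector<Node*> dead;
    collectWhite(candidate, dead);
    for (Node* n : dead) {
        std::printf("freeing node %d\n", n->id);
        delete n;
    }
}

int main() {
    // Rebuild the loop from the original post: 1 -> 2 -> 3 -> 1,
    // with no references from outside.
    Node* n1 = new Node{1};
    Node* n2 = new Node{2};
    Node* n3 = new Node{3};
    n1->children = {n2};
    n2->children = {n3};
    n3->children = {n1};
    n1->rc = n2->rc = n3->rc = 1; // one count each, from the internal edges
    collectCycles(n1);            // prints and frees all three nodes
}

Running it on the loop above frees all three nodes; add one outside reference to node 1 (bump its rc to 2) and the scan step restores every count instead of freeing anything.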

