Finalizers confusion

Started by
7 comments, last by Krohm 11 years, 6 months ago
Today, I am aiming at getting a better understanding of finalizers for GC'd memory management.
Ok, they are destructors. Instead of getting called at destruction, they get called at some time in the future. Hopefully.
1st question stems from MSDN.
[quote]Finalize Methods and Destructors
Implementing Finalize methods or destructors can have a negative impact on performance and you should avoid using them unnecessarily. Reclaiming the memory used by objects with Finalize methods requires at least two garbage collections. When the garbage collector performs a collection, it reclaims the memory for inaccessible objects without finalizers. At this time, it cannot collect the inaccessible objects that do have finalizers. Instead, it removes the entries for these objects from the finalization queue and places them in a list of objects marked as ready for finalization. Entries in this list point to the objects in the managed heap that are ready to have their finalization code called. The garbage collector calls the Finalize methods for the objects in this list and then removes the entries from the list. A future garbage collection will determine that the finalized objects are truly garbage because they are no longer pointed to by entries in the list of objects marked as ready for finalization. In this future garbage collection, the objects' memory is actually reclaimed.[/quote]
So perhaps the finalizer is handled as if it were accessed through a delegate. The 1st GC pass sets the object's delegate to [font=courier new,courier,monospace]null[/font] and adds an entry for finalization. The object is not collected, but its Finalize method is run. The 2nd GC pass sees the object as having no finalizer left to run and can therefore free its memory blob with no hassle.
Or perhaps we have sets with allocation IDs to ignore. I don't find this very important.

But why require 2 passes? Can't we just run the finalizer immediately before releasing a blob?
Perhaps this is a consequence of having separate trace and collection phases? Or maybe it has something to do with multiple threads?

2nd question. To extend an object's lifetime, use GC.KeepAlive. I don't understand why this is needed.
[source lang="csharp"]
MyWin32.HandlerRoutine hr = new MyWin32.HandlerRoutine(Handler);
MyWin32.SetConsoleCtrlHandler(hr, true);

// Give the user some time to raise a few events.
Console.WriteLine("Waiting 30 seconds for console ctrl events...");

// The object hr is not referred to again.
// The garbage collector can detect that the object has no
// more managed references and might clean it up here while
// the unmanaged SetConsoleCtrlHandler method is still using it.

// Force a garbage collection to demonstrate how the hr
// object will be handled.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Thread.Sleep(30000);

// Display a message to the console when the unmanaged method
// has finished its work.
Console.WriteLine("Finished!");

// Call GC.KeepAlive(hr) at this point to maintain a reference to hr.
// This will prevent the garbage collector from collecting the
// object during the execution of the SetConsoleCtrlHandler method.
GC.KeepAlive(hr);
[/source]
It would appear to me that a better way to deal with this problem would be to have an explicit notion of what [font=courier new,courier,monospace]SetConsoleCtrlHandler[/font] does. But I see this might not be possible. In my head, [font=courier new,courier,monospace]hr[/font] would not be collected anyway, as there's a reference to it on the stack.
But the C# GC is smarter than that... it appears to look ahead in the code to see whether something is no longer referenced and thus clean it up... perhaps C# stack/start set building does not work the way I expect.
Does anyone know how [font=courier new,courier,monospace]KeepAlive[/font] helps with the problem? I'm inclined to speculate that it might actually be a NOP that just prevents the flow analyzer from marking the reference as unused.
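To make the question concrete, here is a minimal sketch of what KeepAlive buys you, using a WeakReference to observe whether an object survived a forced collection. The class and method names are hypothetical and not part of the MSDN sample. The only assumption is the documented behaviour that an object passed to GC.KeepAlive is treated as reachable up to that call; whether the object in the second method survives depends on the build configuration and the JIT, so no result is claimed for it.

```csharp
using System;

public static class KeepAliveSketch
{
    // The local is passed to GC.KeepAlive after the collections,
    // so it must still be reachable while they run.
    public static bool SurvivesWithKeepAlive()
    {
        var payload = new object();
        var weak = new WeakReference(payload);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        bool alive = weak.IsAlive;

        // This call is the last use of payload; it extends the
        // reference's lifetime up to this point.
        GC.KeepAlive(payload);
        return alive;
    }

    // Same code without KeepAlive: after the WeakReference is created,
    // payload is never used again, so an optimizing JIT is free to stop
    // reporting it as a root. The result may be true or false.
    public static bool MaySurviveWithoutKeepAlive()
    {
        var payload = new object();
        var weak = new WeakReference(payload);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return weak.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine("With KeepAlive:    " + SurvivesWithKeepAlive());
        Console.WriteLine("Without KeepAlive: " + MaySurviveWithoutKeepAlive());
    }
}
```

The first line always reports True. The second typically differs between a debug build (locals are kept alive to the end of the method) and an optimized build, which is exactly the "look ahead" effect described above.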

Previously "Krohm"

Hey, I don't know the exact implementation of the whole thing in C#, but in general it's a bad idea to use a Finalize method at all. If your code needs it to work properly, then your code is unstable.

About KeepAlive: basically it tells the GC not to collect something that isn't needed anymore. In the code example, "hr" is not needed anymore, but they still tell the GC to keep it alive. This is something that you will probably never need. Still, an example: let's say you have a big object and you don't want the GC to destroy it while drawing, as that may result in an FPS loss. Then you would call KeepAlive so it won't be destroyed just yet (maybe later, when a loading screen is active or whatever).

The whole point about Garbage Collection is that you don't have to worry about this stuff, but if you want to do so anyway, I suggest using a language that comes with new and delete.
Because there is no guarantee when the finalizers in C# are run, or whether they are ever run, there should be a minimal amount of code in them. Any code in a finalizer that affects the program can be considered a bug.

This means the burden of destructing objects, i.e. the resource management, is left to the user. You can't even rely on RAII. IDisposable helps, but you still have to decide manually where to wrap objects in a using block or call Dispose.
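A minimal sketch of that manual decision, assuming a hypothetical TrackedResource type that just counts open instances (a stand-in for a real handle, not an actual library class). The using block gives a deterministic release point; the finalizer is only a safety net and is suppressed once Dispose has run.

```csharp
using System;
using System.Threading;

// Hypothetical resource wrapper: counts how many instances are "open".
public sealed class TrackedResource : IDisposable
{
    private static int s_openCount;
    private int _released; // 0 = open, 1 = released

    public static int OpenCount => Volatile.Read(ref s_openCount);

    public TrackedResource()
    {
        Interlocked.Increment(ref s_openCount);
    }

    private void Release()
    {
        // Release at most once, whether we arrive from Dispose
        // or from the finalizer thread.
        if (Interlocked.Exchange(ref _released, 1) == 0)
        {
            Interlocked.Decrement(ref s_openCount);
        }
    }

    public void Dispose()
    {
        Release();
        // The cleanup has been done, so the object no longer needs
        // to go through finalization.
        GC.SuppressFinalize(this);
    }

    // Safety net only: runs at some unspecified time, if at all.
    ~TrackedResource()
    {
        Release();
    }
}

public static class DisposeDemo
{
    // Returns the open count observed while the using block is active.
    public static int CountInsideUsing()
    {
        using (new TrackedResource())
        {
            return TrackedResource.OpenCount;
        } // Dispose runs here, even though we return from inside the block.
    }

    public static void Main()
    {
        Console.WriteLine("Inside using: " + CountInsideUsing());
        Console.WriteLine("After using:  " + TrackedResource.OpenCount);
    }
}
```

Forgetting the using block does not break anything immediately; the count simply stays high until the finalizer happens to run, which is the unreliability being complained about here.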

The whole point about Garbage Collection is that it forces you to manually manage resources if you care about them being released.
I don't see how this relates to the two questions above.



But why require 2 passes? Can't we just run the finalizer immediately before releasing a blob?
Perhaps this is a consequence of having separate trace and collection phases? Or maybe it has something to do with multiple threads?

My guess is that it's because of multiple threads. One thread can reclaim memory while another thread is running the finalizers.

See Finalization Internals at http://msdn.microsoft.com/en-us/magazine/bb985010.aspx


Does anyone know how [font=courier new,courier,monospace]KeepAlive[/font] helps with the problem? I'm inclined to speculate that it might actually be a NOP that just prevents the flow analyzer from marking the reference as unused.

I guess the detection for unreferenced objects happens at bytecode level, and the scope for an object is not as it's seen in C# code. So maybe KeepAlive just keeps the object referenced, as it's seen in CIL and bytecode.
Well, the link does not really say much about that... but it's some food for thought.
The reason the GC has to run twice is very simple. Consider:
[attachment=11385:gc_example.png]
Garbage set is { D, E, F, G, I }.
D can be collected.
Object I must run a finalizer. Therefore, we cannot collect the objects it references or we'll break its code.
Suppose we run the finalizer and collect I.
Ideally we could now collect E, F and G. They were marked as garbage (or better, not marked as used) since they are not reachable from the root set. Unfortunately, garbage collection does not tell us what objects I uses.
The only way to clean them properly is to do another pass.

In other words, the presence of a finalizer causes an additional "root set" to be created. This root set is the set of all objects having finalizers which were considered garbage.
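The reasoning above can be sketched as a toy mark phase. This is not how the CLR is implemented, just a small simulation of the "extra root set" idea. Node, ToyGc and the object graph are all hypothetical; in particular the edges are made up (I references E and G, E references F) so that the garbage set comes out as { D, E, F, G, I }, since the picture itself isn't reproduced here. The toy also runs finalizers right after the pass, whereas the real runtime defers them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Node
{
    public string Name { get; }
    public bool FinalizerPending { get; private set; }
    public List<Node> References { get; } = new List<Node>();

    public Node(string name, bool hasFinalizer = false)
    {
        Name = name;
        FinalizerPending = hasFinalizer;
    }

    public void RunFinalizer() { FinalizerPending = false; }
}

public static class ToyGc
{
    // Plain reachability trace from a set of roots.
    private static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var stack = new Stack<Node>(roots);
        while (stack.Count > 0)
        {
            var node = stack.Pop();
            if (!marked.Add(node)) continue;
            foreach (var next in node.References) stack.Push(next);
        }
        return marked;
    }

    // One collection pass; returns the sorted names of the freed objects.
    public static List<string> Collect(List<Node> heap, List<Node> roots)
    {
        var reachable = Mark(roots);

        // Unreachable objects that still have a finalizer to run.
        var pending = heap.Where(n => !reachable.Contains(n) && n.FinalizerPending).ToList();

        // They act as an additional root set for this pass.
        var keptForFinalizers = Mark(pending);

        var freed = heap.Where(n => !reachable.Contains(n) && !keptForFinalizers.Contains(n)).ToList();
        heap.RemoveAll(n => freed.Contains(n));

        // Simplification: run the finalizers immediately after the pass.
        foreach (var n in pending) n.RunFinalizer();

        return freed.Select(n => n.Name).OrderBy(s => s, StringComparer.Ordinal).ToList();
    }

    // Hypothetical graph: A is the only root, I is the only finalizable object.
    public static (List<Node> Heap, List<Node> Roots) BuildExample()
    {
        var nodes = "ABCDEFGHI".ToDictionary(c => c.ToString(),
                                             c => new Node(c.ToString(), hasFinalizer: c == 'I'));
        nodes["A"].References.Add(nodes["B"]);
        nodes["A"].References.Add(nodes["C"]);
        nodes["C"].References.Add(nodes["H"]);
        nodes["I"].References.Add(nodes["E"]);
        nodes["I"].References.Add(nodes["G"]);
        nodes["E"].References.Add(nodes["F"]);
        return (nodes.Values.ToList(), new List<Node> { nodes["A"] });
    }

    public static void Main()
    {
        var (heap, roots) = BuildExample();
        Console.WriteLine("Pass 1 frees: " + string.Join(", ", Collect(heap, roots)));
        Console.WriteLine("Pass 2 frees: " + string.Join(", ", Collect(heap, roots)));
    }
}
```

With this graph the first pass frees only D, and the second pass frees E, F, G and I, because I's finalizer has run by then and nothing is holding the others any longer.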

I'm afraid we won't get much information on what KeepAlive does (or how it does it), but since it appears my theory is confirmed, I suppose I will just live with it.



But why require 2 passes?

Even if a single-pass GC were possible, it could become time critical, and time is a major issue in GC. You don't know what happens in a finalizer: it may release other stuff that then becomes ready for GC, or close I/O. A GC could therefore get caught in a recursive trap, adding more and more objects to the cleanup queue while calling more and more finalizers.
The reason to put finalizers into a queue could be to smooth out the timing (call the Finalize method when there's time), whereas releasing plain memory is more or less straightforward and its time impact can be estimated better (cleaning up the memory in layers).
I really like your questions.


I'm afraid we won't get much information on what KeepAlive does (or how it does it), but since it appears my theory is confirmed, I suppose I will just live with it.

What I'm most interested in is at what level the GC operates; that would explain a lot. C# source code is out of the question. CIL is not that optimized. Any decent optimizer will perform dead code removal and hoist variables out of loops or even out of functions. .NET has the major advantage that it is just-in-time compiled. The compiler can profile at runtime for hotspots and do optimizations that statically compiled languages cannot. Just as C++ source code, after the optimizer, is nowhere near the matching assembly, I believe the .NET optimizer makes optimizations that make KeepAlive necessary.

PS. Someone who knows CIL or bytecode generated please butt in.

Edit: Drunk typos.
Even if a single pass GC would be possible, it could get time critical. ...
Yes, it is indeed a very valuable property. The concept that finalizers could be completely deferred or even dispatched incrementally over multiple "mini-GC-ticks" didn't come to me. Thank you.

