Using Programmer Knowledge to Speed Up Generational Garbage Collection

Started by
6 comments, last by Antheus 16 years, 2 months ago
So, I had an idea in the shower, and I'm really just curious what people think. When using generational GC in Java or .NET, a new object always gets placed into the youngest generation. This is my starting assumption, so if I'm wrong about it let me know.

But what about objects that the programmer knows are going to be around for longer? If I create an instance and there is no way it will become unreachable during the lifetime of the program, then there is a good chance it will be promoted multiple times through the generations. What if the programmer could tell the memory manager, when creating an instance, approximately what its lifetime is? For example, we could write the previous instance as "new longlived Object()". The memory manager then takes this into consideration and places the instance in a higher generation from the start. This way the object gets promoted/copied less, which is one of the more expensive parts of GC. In addition, GC runs less often, because long-lived (and often large) objects are not clogging up the younger generations and forcing collections.

Taking it a step further, the debug runtime could warn the programmer if an instance created with "new temporary Object()" is getting promoted often, signifying either that a reference is accidentally being kept or that the instance should start in a higher generation. This would help fix bugs where the programmer accidentally keeps a reference to an object (say, through delegates) and so keeps large amounts of memory around for no reason.

So the advantages of this system are:
- Fewer promotions
- Fewer collections
- Improved debug information about object lifetimes

Any thoughts?
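To make the proposal concrete, here is a sketch of what a library-level version of the hint could look like in today's Java. Java has no "new longlived" syntax, so everything here (the `LifetimeHint` class, its method and enum names) is invented for illustration; the hint is a documented no-op that a cooperating VM could, in principle, honor.

```java
// Hypothetical sketch of the proposed lifetime-hint API. Since Java
// has no "new longlived" syntax, the closest spelling is a no-op
// marker method that a cooperating allocator could honor. All names
// here are invented for illustration.
final class LifetimeHint {
    enum Expected { TEMPORARY, LONG_LIVED }

    // In a VM that honored the hint, a LONG_LIVED object could be
    // pretenured straight into an old generation. Here it is a no-op
    // that just returns its argument.
    static <T> T allocated(T obj, Expected lifetime) {
        return obj;
    }
}
```

Call sites would then read e.g. `StringBuilder log = LifetimeHint.allocated(new StringBuilder(), LifetimeHint.Expected.LONG_LIVED);`.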
Turing Machines are better than C++ any day ^_~
Quote:But what about objects that the programmer knows are going to be around for longer?


Never underestimate the incompetence or malice of a programmer willing to abuse any mechanism given to them.

I'm seeing this already:
new longlived Singleton();


Besides, everything you mention already happens implicitly.

Generational GC works roughly like this (2 young heaps, several old heaps):
- On every GC tick
- For every root object (statics, stack references, etc.)
- traverse all references and copy the reachable objects to the other heap
- mark the current heap as empty (objects with no references are "de-allocated")
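The copy-from-roots step above can be sketched with a toy semi-space heap. This is a simulation for illustration only: real collectors work on raw memory with bump pointers and forwarding addresses, not ArrayLists, and all class and method names here are made up.

```java
import java.util.*;

// Toy semi-space ("young generation") copying collector sketch.
// All names are illustrative; no real VM exposes anything like this.
class ToyHeap {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>();  // current young heap
    List<Obj> roots = new ArrayList<>();      // e.g. statics, stack slots

    Obj alloc(String name) {                  // bump-pointer allocation: O(1)
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    // One GC tick: copy everything reachable from the roots into the
    // other space; whatever is left behind is garbage "for free".
    int collect() {
        List<Obj> toSpace = new ArrayList<>();
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (seen.add(o)) {
                toSpace.add(o);               // "copy" the survivor
                work.addAll(o.refs);
            }
        }
        int copied = toSpace.size();
        fromSpace = toSpace;                  // old space is now empty
        return copied;                        // cost is proportional to live objects only
    }
}
```

Allocating a thousand temporaries and then collecting copies only the rooted survivors; the temporaries cost nothing to reclaim.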

This has several advantages.
- Allocation is incremental only. Whenever new is called, allocation is very fast, since it simply moves the allocation pointer ahead by the size of the allocation
- Live objects are marked implicitly by the references themselves. No need for extra book-keeping
- If no "longlived" objects are created between two passes (e.g. all allocated objects have already become unreachable), nothing needs to be copied between heaps, making GC very fast

Simply put: garbage collection is expensive for long-lived objects, and free for temporaries. If you allocate 1 million temporaries between two GC ticks, the cost of collecting them is 0. If you allocate 100 objects and keep references to them, then 100 objects need to be copied.

Each survivor will only be copied once or twice before being promoted to the long-lived heap.

As long as you're talking about heap memory, the above is as optimal as it gets.

Stack allocation, RVO and preventing allocation in the first place are where the benefits lie. This is why statically compiled Java applications run so much faster.
Quote:Original post by Antheus
As long as you're talking about heap memory, the above is as optimal as it gets.

Stack allocation, RVO and preventing allocation in the first place are where the benefits lie. This is why statically compiled Java applications run so much faster.


Quoted for emphasis.

For what it's worth, recent versions of Java can automatically convert heap allocations to stack allocation where the JIT can prove the object never escapes. (The savings presumably result from being able to perform collection less often, as the young-generation heap fills more slowly.)
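The optimization being referred to is HotSpot's escape analysis (enabled by default in modern JVMs via `-XX:+DoEscapeAnalysis`): when the JIT proves an object never escapes its method, it can scalar-replace the allocation entirely. A minimal candidate looks like the following; note that whether the optimization actually fires depends on the JVM and JIT tier, so this is a sketch of a candidate, not a guarantee.

```java
// Sketch: the Point below never escapes distSq(), so with escape
// analysis the JIT can replace it with two local doubles -- no heap
// allocation, no GC pressure. The result is identical either way.
class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distSq(double x, double y) {
        Point p = new Point(x, y);   // candidate for scalar replacement
        return p.x * p.x + p.y * p.y;
    }
}
```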
To make the discussion more concrete regarding .NET, see Garbage Collector Basics and Performance Hints by Rico Mariani. It's a very general treatment written in 2003, but still valid. Note especially the way memory regions are used for the generations. It should answer your concerns.

If you care to read further, there's an excellent post by Tess, .NET Garbage Collector PopQuiz - Followup, on her otherwise excellent debugging blog. Also mentioned in Tess's blog is the post by Joe Duffy, DG Update: Dispose, Finalization, and Resource Management.

These articles should answer most of the questions you could possibly have. Things like 64-bit processors, safe handles, multiple cores etc. bring some new issues regarding large object heaps, the actual collection and such, but nothing critical considering the question you have in mind.
---Sudet ulvovat - karavaani kulkee
After you create your long-lived objects, force garbage collection a couple of times to promote them to the long-lived section.
Quote:Original post by stonemetal
After you create your long-lived objects, force garbage collection a couple of times to promote them to the long-lived section.
Yeah, or do the trick the way it's usually done: Improving Managed Code Performance and navigating from there to Avoid Calling GC.Collect. In general it's not really a good idea, except in some situations, like the ones Rico M. gives us reason to believe in his blog post When to call GC.Collect().
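The .NET GC.Collect trick translates to Java as System.gc(), with the same caveat: it is only a hint to the JVM, which may ignore it entirely, and it's usually a bad idea outside of a one-time startup phase. A sketch, assuming a hypothetical app whose caches are built once at startup:

```java
// Sketch of the "force a collection to promote" trick in Java terms.
// System.gc() is only a hint; whether promotion actually happens is
// entirely implementation-defined, so this is best-effort at most.
class StartupPromotion {
    static byte[][] caches = new byte[16][];

    static void warmUp() {
        // Build all long-lived state up front.
        for (int i = 0; i < caches.length; i++) {
            caches[i] = new byte[1 << 16];   // 64 KiB each, long-lived
        }
        // Two hinted collections: survivors of the first young-gen
        // copy are typically promoted by the second.
        System.gc();
        System.gc();
    }
}
```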
---Sudet ulvovat - karavaani kulkee
Quote:Original post by Antheus
Quote:But what about objects that the programmer knows are going to be around for longer?


Never underestimate either incompetence or malice of a programmer to abuse any mechanism given to them.

I'm seeing this already:
new longlived Singleton();


Besides, everything you mention already happens implicitly.

I understand it happens implicitly. My point is that it takes time/resources to do it implicitly, when the programmer could very easily provide just a little more knowledge about an item's usage and avoid much of that overhead altogether. There is no extra book-keeping cost. For example, in .NET a "longlived" object would be placed directly into Gen 2, reducing the rate of Gen0/Gen1 collections and the amount of copying related to promoting objects. If you are going to have a Singleton, then this is ideal. It wouldn't be an abuse of the mechanism; it would still just be an abuse of singletons.

In other words, all the advantages you mentioned remain true, because I'm not changing any of them. All that changes is the last advantage:
Quote:
- If no "longlived" objects are created between two passes (e.g. all allocated objects have been freed), nothing needs to be copied between heaps, making GC very fast


This is the entire point of the idea. By supplementing the memory manager with knowledge of when an object is long-lived, then even when these objects are created, "nothing needs to be copied between heaps, making GC very fast".


On a different note, while thinking about it I've realized that you could additionally optimize this behaviour at runtime, analyzing the expected lifetime of instances to place them in an appropriate generation the next time they are allocated.
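That runtime idea exists in the GC literature as "pretenuring" driven by allocation-site profiling: if objects from one allocation site keep surviving collections, future allocations from that site go straight to an old generation. A toy sketch of the feedback half, with invented names and a made-up survival threshold:

```java
import java.util.*;

// Toy sketch of per-allocation-site lifetime feedback ("pretenuring").
// The class, methods, and threshold are all invented for illustration;
// a real VM would gather this from GC survival statistics.
class PretenureProfile {
    private final Map<String, Integer> survivals = new HashMap<>();
    private static final int THRESHOLD = 3;

    // Called when an object allocated at `site` survives a young-gen
    // collection.
    void recordSurvival(String site) {
        survivals.merge(site, 1, Integer::sum);
    }

    // Sites whose objects keep surviving should allocate directly
    // into the old generation next time.
    boolean shouldPretenure(String site) {
        return survivals.getOrDefault(site, 0) >= THRESHOLD;
    }
}
```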
Turing Machines are better than C++ any day ^_~
Quote:In other words, all the advantages you mentioned remain true because I'm not changing any of it All that changes the last advantage:


Amdahl's law - you're optimizing the 20% of the 20% part.

80% is spent on redundant allocations. Don't allocate, and you've saved 80%.
80% of garbage collection is spent on short-lived objects.

For example - singletons are allocated *exactly once* per application life-time. Do we really need to modify the language definition, memory model, garbage collection and backward compatibility to save "1 allocation per application life-time"?

If an object is truly long-lived, the cost of 2 copies will be negligible. If that object however turns out to be short-lived, it won't be copied even once.

This topic is closed to new replies.
