

#5057357 Improve on Java or prepare for C++?

Posted by EmeryBerger on 27 April 2013 - 06:01 PM

The key difference between Java and C++ is not the speed of the generated code. It really is garbage collection. Garbage collection is a space-time tradeoff.

 

TL;DR - Garbage collectors need lots of RAM to run fast, but given enough RAM, they are as fast as malloc/free (BUT NOT FASTER).

 

The more space you are willing to devote to the garbage-collected heap, the faster it will run (since it will collect garbage less frequently). Given about 5x as much memory as you'd need with a C++ program, you will get about the same performance. Of course, if you need that memory for other reasons (e.g., caching data from disk or network, or large-scale in-memory computations), then that extra space comes at a cost. And if it would trigger paging, then obviously that is disastrous. Here's the paper, plus an excerpt from a Lambda the Ultimate discussion of it; I wrote it with my student Matthew Hertz at OOPSLA 2005 (paper and presentation attached).

 

 

Quantifying the performance of garbage collection vs. explicit memory management

 

Abstract:

 

Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can compare the cost of conservative garbage collection to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown.

 

We introduce a novel experimental methodology that lets us quantify the performance of precise garbage collection versus explicit memory management. Our system allows us to treat unaltered Java programs as if they used explicit memory management by relying on oracles to insert calls to free. These oracles are generated from profile information gathered in earlier application runs. By executing inside an architecturally-detailed simulator, this "oracular" memory manager eliminates the effects of consulting an oracle while measuring the costs of calling malloc and free. We evaluate two different oracles: a liveness-based oracle that aggressively frees objects immediately after their last use, and a reachability-based oracle that conservatively frees objects just after they are last reachable. These oracles span the range of possible placement of explicit deallocation calls.

We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks using the oracular memory manager, and present real (non-simulated) runs that lend further validity to our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.
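The oracular setup can be caricatured in a few lines: a profiling pass yields, for each object, the step at which it should be freed, and the measured pass replays the program and calls free at exactly that step. The trace and helper below are made-up stand-ins for illustration, not the paper's implementation.

```c
#include <stdlib.h>

#define NOBJ 3  /* toy program with three heap objects */

/* Sketch of the oracular methodology, heavily simplified: last_use[i]
 * holds the step at which object i should be freed (recorded in an
 * earlier profiling run); this "measured" run replays allocation and
 * invokes free() at exactly that step. Returns how many objects the
 * oracle freed. */
int run_with_oracle(const int last_use[NOBJ], int steps) {
    void *obj[NOBJ];
    int freed = 0;
    for (int i = 0; i < NOBJ; i++)
        obj[i] = malloc(16);            /* the program's allocations */
    for (int step = 0; step <= steps; step++) {
        for (int i = 0; i < NOBJ; i++) {
            if (obj[i] && last_use[i] == step) {
                free(obj[i]);           /* oracle-inserted free */
                obj[i] = NULL;
                freed++;
            }
        }
    }
    return freed;
}
```

A liveness oracle would record the last *use* step, a reachability oracle the last *reachable* step; the replay machinery is the same either way.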

 

....

 

 

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, by Matthew Hertz and Emery D. Berger:

Overall, generational collectors can add up to 50% space overhead and 5-10% runtime overheads if we ignore virtual memory. Very reasonable given the software engineering benefits. However, factoring in virtual memory with its attendant faulting paints a very different picture in section 5.2:

 

 

We take the first steps towards quantifying the performance of precise garbage collection versus explicit memory management. We present a novel experimental methodology that lets us treat unaltered Java programs as if they used explicit memory management. Our system generates exact object reachability information by processing heap traces with the Merlin algorithm [34, 35]. It then re-executes the program, invoking free on objects just before they become unreachable. Since this point is the latest that a programmer could explicitly free objects, our approach conservatively approximates explicit memory management. By executing inside an architecturally-detailed simulator, this “oracular” memory manager eliminates the effects of trace processing while measuring the costs of calling malloc and free.

We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.

 

These graphs show that, for reasonable ranges of available memory (but not enough to hold the entire application), both explicit memory managers substantially outperform all of the garbage collectors. For instance, pseudoJBB running with 63MB of available memory and the Lea allocator completes in 25 seconds. With the same amount of available memory and using GenMS, it takes more than ten times longer to complete (255 seconds). We see similar trends across the benchmark suite. The most pronounced case is javac: at 36MB with the Lea allocator, total execution time is 14 seconds, while with GenMS, total execution time is 211 seconds, over a 15-fold increase.

 

The culprit here is garbage collection activity, which visits far more pages than the application itself [60].

#5056953 Reasons for wanting custom malloc or allocators?

Posted by EmeryBerger on 26 April 2013 - 07:43 AM

I would strongly encourage anyone on this thread to read the paper I wrote on this topic over a decade ago. It just won a Most Influential Paper award, but its influence has clearly not spread to this domain...yet.

 

TL;DR - a good malloc is often as fast as (or faster than) your custom allocator because it uses the same tricks; "region" allocators can be faster but can leak tons of memory.
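As a concrete picture of why regions are fast but can hold on to lots of dead memory, here is a minimal arena-style region allocator sketch; the names and sizes are illustrative, not from any real allocator.

```c
#include <stdlib.h>

/* Minimal region ("arena") allocator sketch: allocation is a pointer
 * bump, and the only way to reclaim memory is to release the whole
 * region at once -- which is why regions are fast but can retain
 * ("leak") every dead object until the region dies. */
typedef struct {
    char  *buf;
    size_t used;
    size_t cap;
} Region;

int region_init(Region *r, size_t cap) {
    r->buf  = malloc(cap);
    r->used = 0;
    r->cap  = cap;
    return r->buf != NULL;
}

/* Bump-pointer allocation: no per-object header, no free list. */
void *region_alloc(Region *r, size_t n) {
    n = (n + 7) & ~(size_t)7;              /* 8-byte alignment */
    if (r->used + n > r->cap) return NULL; /* region exhausted */
    void *p = r->buf + r->used;
    r->used += n;
    return p;
}

/* Individual objects cannot be freed; the whole region goes at once. */
void region_release(Region *r) {
    free(r->buf);
    r->buf  = NULL;
    r->used = r->cap = 0;
}
```

The fast path is two additions and a compare — hard for any general-purpose allocator to beat — but note there is no `region_free(obj)`: that missing operation is exactly the limitation the paper examines.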

 

Title: Reconsidering Custom Memory Allocation (ACM link, direct PDF link, PowerPoint talk slides), OOPSLA 2002. I've attached the slides in PPT and PDF formats; I highly recommend looking at the PPT version, since it has animations that do not translate well to PDF.

 

Abstract:

 

Programmers hoping to achieve performance improvements often use custom memory allocators. This in-depth study examines eight applications that use custom allocators. Surprisingly, for six of these applications, a state-of-the-art general-purpose allocator (the Lea allocator) performs as well as or better than the custom allocators. The two exceptions use regions, which deliver higher performance (improvements of up to 44%). Regions also reduce programmer burden and eliminate a source of memory leaks. However, we show that the inability of programmers to free individual objects within regions can lead to a substantial increase in memory consumption. Worse, this limitation precludes the use of regions for common programming idioms, reducing their usefulness.

We present a generalization of general-purpose and region-based allocators that we call reaps. Reaps are a combination of regions and heaps, providing a full range of region semantics with the addition of individual object deletion. We show that our implementation of reaps provides high performance, outperforming other allocators with region-like semantics. We then use a case study to demonstrate the space advantages and software engineering benefits of reaps in practice. Our results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.
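The reap idea — bump allocation like a region, plus individual object deletion — can be sketched for a single object size as follows. The real reaps design handles arbitrary sizes; everything here is an illustrative simplification, not the paper's implementation.

```c
#include <stdlib.h>

#define OBJ_SIZE 32  /* single object size for this sketch */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

/* Reap sketch: a region (bump pointer) plus a free list, so that
 * individually freed objects can be reused -- the operation that
 * plain regions lack. */
typedef struct {
    char     *buf;
    size_t    used, cap;
    FreeNode *free_list;  /* individually freed slots, for reuse */
} Reap;

int reap_init(Reap *r, size_t cap) {
    r->buf = malloc(cap);
    r->used = 0;
    r->cap = cap;
    r->free_list = NULL;
    return r->buf != NULL;
}

void *reap_alloc(Reap *r) {
    if (r->free_list) {                  /* reuse a freed slot first */
        FreeNode *n = r->free_list;
        r->free_list = n->next;
        return n;
    }
    if (r->used + OBJ_SIZE > r->cap) return NULL;
    void *p = r->buf + r->used;          /* else bump, like a region */
    r->used += OBJ_SIZE;
    return p;
}

void reap_free(Reap *r, void *p) {       /* per-object deletion */
    FreeNode *n = p;
    n->next = r->free_list;
    r->free_list = n;
}

void reap_release(Reap *r) {             /* whole-reap teardown */
    free(r->buf);
    r->buf = NULL;
    r->used = r->cap = 0;
    r->free_list = NULL;
}
```

When nothing is ever freed individually, every allocation takes the bump path, so this degenerates to region speed; the free list only costs anything when it is actually used.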

 

StackOverflow discussion here.

 
