Why high level languages are slow


Destructors are deterministic garbage collection in the sense that both automatically clean up resources when the program is done with them, in a manner the programmer doesn't have to directly manage (the compiler inserts the calls, not the programmer).

By your definition of a GC ("clean up resources so the programmer doesn't have to"), automatic reference counting used in Swift is also GC because it "just works", "compiler does it, not the programmer", yet you just said yourself that Swift has no GC.


Ok, yeah, caught red-handed by my own misuse of the word.

Should have said "automatic resource management", which reference counting and garbage collection are both a part of.

So I stand corrected. "Garbage collection" is a secondary process that scans for unused resources and cleans them up. (Not necessarily memory, but I don't know of any GC that cleans up non-memory resources - at least not directly)

From a performance standpoint, both GC and ARC (not a fan of that acronym, but it works) clean up resources. One does it in-line with code, one does it at some time later (hopefully when the CPU isn't busy).

For the record, I've used systems that added a "deferred deleter" to C++ because the cost of deleting everything at once was too much and it was much better to put the unused objects into a queue to be deleted later in small increments. (A very rudimentary version of GC)
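
Something along these lines - a bare-bones, single-threaded sketch (DeferredDeleter is a made-up name, and a real one would probably cap work by time rather than by count):

    #include <cstddef>
    #include <deque>
    #include <memory>

    // Park objects that are no longer needed, then destroy a small batch per
    // frame instead of paying for all the destructions at once.
    class DeferredDeleter {
    public:
        // Hand over ownership; shared_ptr<void> type-erases the destructor.
        template <typename T>
        void Defer(std::unique_ptr<T> obj) {
            pending_.emplace_back(std::shared_ptr<void>(std::move(obj)));
        }

        // Call once per frame (ideally when the CPU is otherwise idle).
        void Flush(std::size_t maxPerFrame) {
            for (std::size_t i = 0; i < maxPerFrame && !pending_.empty(); ++i)
                pending_.pop_front();   // the real destructor runs here
        }

    private:
        std::deque<std::shared_ptr<void>> pending_;
    };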

I still contend that if you're going to use a GC language, then you need to play nice with the GC.

I'm curious if you have tried doing this on a large scale?

I've spent an awful lot of time the last couple of years refactoring swathes of Java code (or porting it to C++), to reach the 'zero allocations' bar that one absolutely must hit if one wants a fluid and responsive android application. I'm not kidding about this - you will not be able to reliably hit 60fps if there are any allocations in your rendering or layout loops. And Java doesn't have struct types, so that means no short-lived compound types at all...


I use C# for small scale tools mostly, where I don't have to worry about the GC too much other than knowing how to handle unmanaged resources via IDisposable and "using". Also I tend to avoid generating excess garbage in the first place.

I have had to do one redesign in a real-time project to pool objects rather than allocating them.
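
The pooling idea itself is language-agnostic; a minimal sketch of it in C++ terms (Pool is a made-up name, and this assumes default-constructible objects and single-threaded use):

    #include <cstddef>
    #include <vector>

    // Tiny free-list pool: allocate everything up front, then hand objects
    // out and take them back without touching the allocator at runtime.
    template <typename T>
    class Pool {
    public:
        explicit Pool(std::size_t count) : storage_(count) {
            for (T& obj : storage_)
                free_.push_back(&obj);
        }

        T* Acquire() {                      // returns nullptr when exhausted
            if (free_.empty()) return nullptr;
            T* obj = free_.back();
            free_.pop_back();
            return obj;
        }

        void Release(T* obj) { free_.push_back(obj); }

    private:
        std::vector<T>  storage_;   // owns the objects, contiguous in memory
        std::vector<T*> free_;      // objects currently available
    };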

So yes, C# leans towards "allocation and pointers". On the flipside, C/C++ leans towards "stack and copying", which has its own problems - most notably, if you take a very large object by value in a function parameter (which is the default in C/C++). Though that is fixed with only a few characters, whereas C# is more difficult to push the other way.
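
The "few characters" fix is just switching the parameter to a const reference; a trivial sketch (BigObject is a made-up type for illustration):

    #include <cstddef>
    #include <string>
    #include <vector>

    struct BigObject {
        std::vector<std::string> names;   // potentially a lot of data
    };

    // Pass-by-value: the whole object (and its contents) is copied per call.
    std::size_t CountByValue(BigObject obj)      { return obj.names.size(); }

    // The "few characters" fix: a const reference, so nothing is copied.
    std::size_t CountByRef(const BigObject& obj) { return obj.names.size(); }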

I'm trying to point out the argument cuts both ways. You can't say that "C# is slow by design" and ignore that the de-facto "high-performance" language (C++) also has slow features (e.g. virtual dispatch).


I would argue: "Everything on the heap" is at the core of C#. Virtual dispatch is not at the core of C++ in the same way.


Virtual dispatch is a pretty core feature of C++'s OOP mechanisms.

And I've seen some pretty ridiculous things written in C++ to avoid the cost (either of the dispatch or of the vtable pointer). For example, an array of small polymorphic objects stored by value, where only the object holding the array knows the type of the objects and therefore has to do the pointer arithmetic manually, as well as look up the correct "vtable" array somewhere else, because the cost of adding a vtable pointer to each array element was deemed too high. (Each type had to register its function pointers into that table at startup.)
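
Roughly, that pattern looks like this sketch (ElementVTable/ElementArray are invented names; it assumes trivially copyable element types and ignores alignment for brevity):

    #include <cstddef>
    #include <vector>

    // Hand-rolled "vtable": one table of function pointers per element type,
    // registered at startup, instead of a vtable pointer inside every element.
    struct ElementVTable {
        void (*update)(void* self, float dt);
        void (*draw)(const void* self);
    };

    // Stores elements by value in a raw byte buffer; only the container knows
    // the element size and which vtable applies, so elements stay minimal.
    class ElementArray {
    public:
        ElementArray(const ElementVTable* vtable, std::size_t elementSize)
            : vtable_(vtable), elementSize_(elementSize) {}

        // Copy an element's raw bytes into the buffer.
        void PushBack(const void* element) {
            const unsigned char* p = static_cast<const unsigned char*>(element);
            bytes_.insert(bytes_.end(), p, p + elementSize_);
        }

        void UpdateAll(float dt) {
            // Manual pointer arithmetic instead of virtual dispatch.
            for (std::size_t off = 0; off < bytes_.size(); off += elementSize_)
                vtable_->update(&bytes_[off], dt);
        }

    private:
        const ElementVTable* vtable_;
        std::size_t elementSize_;
        std::vector<unsigned char> bytes_;
    };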

As I already pointed out - C# does not require the heap for everything, and garbage collected (or at least loosely-pointed) memory has its own advantages (memory compaction).


It doesn't, but then you're fighting the language. Trying to manage an array of value types comes with significant limitations in C#. True though, memory compaction is a potential advantage. And in practice, if you use an array of reference types and pre-allocate all the objects at the same time, they tend to end up sequential in heap memory anyway - so that does mitigate some of the cache performance issues.


Agreed.

I do not pretend to think C# does not have some bad design decisions (or at least "bad" in some sense of the word, as the designers of the language picked one option out of several and "performance" was not the top driving force, unlike C++).

I simply think that people who try to paint the entire language (or the entire swath of "high level" languages, whatever that means) as "slow" because "it doesn't do this specific thing as fast as this other language" are rather... misinformed. Or at the very least trying to start an argument. But hey, we now have this thread, so I guess they succeeded :) (well, this has been more a discussion than an argument)

Programmers have proven time and time again that they are more than willing to sacrifice raw "speed" simply so they can actually make a product that works and isn't a pain to maintain. Otherwise, again, we'd all be doing hand-tuned assembly.

This discussion is soooo old :)

The project I work on at my day job is *very* large scale. Some parts of it have hard real-time requirements and are basically written in C with classes. Some parts are C++ because they are compute-bound. But the majority of the components are written in C#, because performance simply isn't the top priority, productivity is.

Use the right tool for the job and be done with it.

Most developers I meet every day are entrenched in "their" world, but I love switching between the worlds and the languages. Spend the morning writing "fast" code, go to lunch and then switch over to writing "nice" code. What really annoys me though, is the fact that everyone and their dog calls themselves a "C# developer" nowadays after reading a few tutorials. It's sad to see so many programmers being completely oblivious of all the nice aspects of the "high level" languages that we willingly paid for by sacrificing performance.

current project: Roa

I'd keep part of the argument in the article, but he takes a really short view of perhaps the last decade.

Nearly all of this discussion can be boiled down into memory management.

Memory management has always been a critical factor of performance. It was a critical factor in the 1950s, in the 1970s, and in the 1990s, and it continues to be a critical factor today in 2015. It has never gone away.

It changes forms every few years. Some years one aspect of memory management is critical, a few years later a different facet is critical, but the issue as a whole is one of the biggest critical factors of software development. I doubt programmers will ever reach a time when memory management is not a key factor, not today, not in a decade, not in a century, probably not in three centuries, and not until computer programming is so fundamentally different that we don't need to know what memory is at all.

CPU cache use is part of memory management.

Load-Hit-Store is part of memory management.

Allocation and release are part of memory management.

Data layout, often argued in terms of AoS versus SoA, is part of memory management.

Locality of data, keeping things near the right chip or on the right card, is part of memory management.

Serialization is memory management.

Memory management has always been a key factor of computer programming, and it will continue to be a key factor for the foreseeable future.

The article's issue with garbage collectors is just a subset of the core issue of memory management. Garbage collection routines CAN be slow, but they can also be very fast. Under some circumstances and models, they can even be completely free by taking place during otherwise idle time. But misuse them and they can be a terrible performance problem.

As was pointed out, zero allocations during rendering is a common bar and one that I've helped push on several games. That is not a problem of the garbage collection inherently, that is a problem of various programmers on various modules who made assumptions about memory management that were wrong. Those programmers wrongly assumed that they could allocate memory freely, when in fact there are times when memory allocations should be completely forbidden.

The farther away from the hardware you get, or the broader your abstractions become, the less direct control you have over memory management. When some library that is abstracted away from you by twenty layers starts doing unexpected memory management patterns, be that anything from an included container class that triggers a reallocation all the way out to a network driver that performs allocations every time a packet comes through, those assumptions about memory management can trigger problems.

I mention the network because that's an example I've been facing very recently. We've hit a case where a network driver -- something completely out of our application's control -- is occasionally stumbling in performance due to stupid memory management. Internally the driver is occasionally pausing when data arrives, almost certainly to allocate memory or expand memory pools. Poor memory management in a depended-on component outside our control sets off a chain reaction that causes performance problems in our application.

Memory management is a concern of all programmers; those concerned about performance and those concerned about ease of development alike need to keep it in mind.

And in practice, if you use an array of reference types and pre-allocate all the objects at the same time, they tend to end up sequential in heap memory anyway - so that does mitigate some of the cache performance issues.


But do they?
Do they ALWAYS?
Do you have any guarantee for this?
What hoops do you have to jump through to make sure? (Pre-allocate and initialise all. Never replace. Always copy in. Never grow. I'm guessing you need at least those constraints to maybe get this.)

Which is the point of the article/blog; you are already fighting the language and the GC to try and maybe get a certain layout, perhaps.

C++: vector<foo>(size) - job done.
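
For illustration (Foo here is just a made-up POD type):

    #include <vector>

    struct Foo { float x, y, z; };

    // std::vector guarantees contiguous storage: all 1024 elements live in
    // one block, constructed up front, with no further allocations as long
    // as we stay within that size.
    std::vector<Foo> foos(1024);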

Now, for many, many problems that isn't an issue, but it is important to know that it could be an issue, that you have no guarantees, and that even if you do get your layout you could well be hurting anyway, because your structures might be bigger than you think (I couldn't tell you the memory layout of a .Net object, so I couldn't say 100%) and come with more access cost (reference vs. direct) and other things which, in the situation discussed, will make things slow.

(There was an article, I've lost the link to it now, which was talking about efficiency, big-O, and vector vs list and their performance impacts. The long and the short of it was that for a million items, for sorted insertion followed by iteration, vector was ALWAYS faster than list by a pretty large margin. Preallocation of elements was done. A few other tricks were pulled. But as the list size increased, vector continued to outperform list all the time. Cache is king.)


I simply think that people who try to paint the entire language (or the entire swath of "high level" languages, whatever that means) as "slow" because "it doesn't do this specific thing as fast as this other language" are rather... misinformed. Or at the very least trying to start an argument. But hey, we now have this thread, so I guess they succeeded :) (well, this has been more a discussion than an argument)


The painting however was done with context, in one particular situation: a memory-bound operation. In that situation C#/.Net is slow.

This is a fact. It simply has too much working against it.

And that's ok, because anyone reading it with an ounce of 'best tool for the job' will read that, nod, and then continue to use it if it is the best tool for the job.

It might look like I'm arguing C++'s corner vs C# like some rabid fanboy but I'm not.
I think C# and the .Net family of languages are great. If I have a job I think suits them then I'll reach for them right away; hell the only reason I don't reach for F# is because I've not had enough use of it in anger to get a complete handle on the language.

But if I'm doing high performance memory traffic heavy code then you'd better believe I'm reaching for C++ because it simply gives you better control and in that situation is faster.
(OK, to be fair, if the work can be heavily parallelised then I'm probably reaching for some form of compute shader and a GPU but you get my point.)

Trying to argue that this isn't "true" or isn't fair because your language of choice happens to be a problem in the situation pointed out... well... *shrugs*

Both C++ and C# give up programmer flexibility in the hopes that the compiler is better at writing the low-level stuff than we are.

C++ compilers have been around long enough that this is generally true. But people still drop down to hand-written assembly for certain tasks that the compiler isn't so great at.

C# is in the same boat: "good enough" for most people, with the option to drop into "lower-level" languages (like C++) when necessary.

The only time your handwritten asm will be faster is if you have more information about what needs to be done than the compiler can deduce. And again, when you go to this level you are working around the language and are no longer writing idiomatic code.

I'm not so sure the gap will continue - at least not in the same manner. Yes, memory is much slower than a CPU, but raw CPU speeds haven't increased in years. The issue now is feeding enough memory to multiple CPUs.

CPU speeds have actually increased again: since we got the Haswells we are now regularly seeing CPUs over 3GHz, which was unheard of 3 years ago, and when the dual and quad cores came out they all dropped the speed to around 2GHz again. Yes, it's been slower to go up again, but the gap really exists and is a problem.

This gap is what prompted the data-oriented design movement in programming, and you can do this after a fashion in C# too, but C++ allows it a lot more easily.

Feeding the multiple cores is actually also a problem, but when people talk about the memory-cycle gap between cache and main memory, that gap shows up even in single-threaded applications, so the additional cores don't matter.

And CPUs try to hide this stuff anyway by prefetching memory, and that effectively only works well if all the memory you are operating over is local to each other. Herb Sutter has a really nice presentation about this where you can see where the performance plateaus are in different access strategies: http://channel9.msdn.com/Events/Build/2014/2-661 - skip to 35 minutes and you can see the graphs for memory access.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion


But as the list size increased vector continued to out perform list all the time. Cache is king.

Of course it is. Because cache is currently among the most important facets of memory management.

For the analysis of those algorithms most people use the really old 1960s models. Under those models memory has uniform performance.

Memory has had non-uniform performance in mainstream computers since the late 1970s. In the 1980s that moved on to desktop computers. Companies like Silicon Graphics relied on a unified memory architecture that worked well for their graphics workstations in the era. A small number of computer vendors in the early 1980s tried a few variations of cache-only memory architecture, but mostly they didn't pan out too well. From time to time that option gets pulled back out by hardware vendors (e.g. Sun's WildFire) but usually only has limited application.

When you start talking "big O", the theory assumes you are not working at a level where cache effects matter. This made a lot of sense back when cache was measured in bytes or kilobytes, but when you've got high end machines with 128 MB of on-die cache, it gets a bit harder to ignore.

(Yes, the upcoming Skylake processor series is expected to have 128MB of L4 CPU cache on the highest server-level processors. That blew my mind the first time I read it.)

If you're applying a caching model that assumes bytes of cache to real-world hardware that uses megabytes of cache, don't expect performance to map very well.

But that boils back down to memory management.

You need to manage your memory in a way that cooperates with the hardware's memory and your problem space.

You cannot assume that all memory has the same performance characteristics, because it does not. That may have meant the magnetic-core versus transistor memory debate of the 1980s, or today's levels of CPU cache versus DRAM chips.

So cache is king because cache is high speed and main memory is slow. Just like in comparison disk is slow and main memory is fast, or spinning platters are slow and SSD is fast. Fast is better, so use the fastest available to you.

Getting back to why "high level languages" are slow....

For some computational problems, the garbage collection solution is wonderful. If your entire problem space is based around small objects constantly being created and destroyed, where allocation and clean up are not bottlenecks or taking place during bottlenecks, then the GC model can be wonderful. It works incredibly well for many business solutions because the problem fits the solution.

I've worked on quite a few games now that use C# through Mono for game scripting. Keep it far away from rendering where it is a terrible fit, but for scripting of game objects it can be an amazingly good fit with a great side benefit of rapid development.

You just can't do stupid things with it. Others mentioned the foreach problem earlier: not only can it cause allocations, it can also be tricky to serialize if you want to do something like save your game. (Yeah, crazy these days when the focus tends to be online-only play.)

Unfortunately to know what qualifies as a "stupid thing" you need to know what the high level language is doing internally. If that language is a scripted language you need to understand the costs of allocations, the costs of virtual function calls, the costs of features. Even if that language is C++, you still need to understand what is happening in a vector or a list or a map, and even the cost associated with the seven letters "virtual".

...hell the only reason I don't reach for F# is because I've not had enough use of it in anger to get a complete handle on the language.


The main reason I don't use F# is actually the must-define-before-use constraint on types and methods. That used to be a mild pain in C and C++ that you could work around with forward declarations. Then I used C# and it doesn't need any ordering or forwarding. Then I tried F# and immediately got stabbed in the face with even worse declaration ordering than C and C++ had, and no apparent way to forward things nicely either (you could mutually define functions, but nobody wants to chain the ENTIRE project together that way).

I hope they've changed this since I last used the language, because it was a pretty damn cool language other than that.

TL;DR: Most of the things that piss me off about languages have very little to do with their memory management, and everything to do with "nuisance" constraints caused by the compilation model.

Why do people get so riled up over these kinds of discussions?
We have many choices in languages for a reason. The ones that don’t fulfill a necessary niche are dead, so by definition if your language-of-choice is widely used then it has a role that it is fulfilling, possibly better than any other language.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

(There was an article, I've lost the link to it now, which was talking about efficiency, big-O, and vector vs list and their performance impacts. The long and the short of it was that for a million items, for sorted insertion followed by iteration, vector was ALWAYS faster than list by a pretty large margin. Preallocation of elements was done. A few other tricks were pulled. But as the list size increased, vector continued to outperform list all the time. Cache is king.)


I'm fairly certain this has less to do with cache than with the inherent performance characteristics of vectors and lists.

First off, iterating over a vector is usually faster and never slower, because for vectors you just need to increment a CPU register to advance to the next item, whereas for a list you need to load a new pointer from RAM. And this is a performance difference that goes unreflected in big-O notation (iterating over both vectors and lists is O(n)).

To do a sorted insert into a list, you MUST iterate over the list to find the insertion point, which is O(n), and then do the actual insertion (which is constant-time). But because vectors have constant-time random access, the insertion point can be found via binary search, which is O(log n), and then the insertion is O(n).

However, the slow part of vector insertion is moving all the elements after the insertion point, which is just a simple memcpy (EDIT: assuming POD types) and is still faster than iterating over the elements of a list to find the insertion point - resizes are a little bit slower (but still O(n)) but infrequent, and never occur if the vector is pre-allocated.

All told, assuming the data in question is a simple integral type, a naive sorted-insert-and-iterate implementation is 3n + 2 memory accesses for lists versus [2 + 1/(2 log n)]n + 1 accesses for vectors, and an optimal one is 2n + 2 accesses for lists and 1.5n + log n + 1 accesses for vectors - vector is inherently faster for this test regardless of any cache-related concerns. Cache coherency issues will only magnify this difference - however, increased object size will mitigate it.
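
For concreteness, the two sorted-insert strategies being compared presumably look something like this sketch:

    #include <algorithm>
    #include <list>
    #include <vector>

    // Vector: binary search (O(log n)) to find the spot, then shift the tail
    // (O(n), but a tight, cache-friendly block move).
    void SortedInsert(std::vector<int>& v, int value) {
        v.insert(std::lower_bound(v.begin(), v.end(), value), value);
    }

    // List: the insert itself is O(1), but finding the spot is a linear,
    // pointer-chasing scan - roughly one cache miss per node visited.
    void SortedInsert(std::list<int>& l, int value) {
        auto it = l.begin();
        while (it != l.end() && *it < value) ++it;
        l.insert(it, value);
    }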

However, the slow part of vector insertion is moving all the elements after the insertion point, which is just a simple memcpy and is still faster than iterating over the elements of a list to find the insertion point - resizes are a little bit slower (but still O(n)) but infrequent, and never occur if the vector is pre-allocated.

I thought with vectors any movement of the elements would result in the copy or move constructors being called, as well?
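
For reference, as I understand it that's essentially right: non-trivial element types are shifted by calling their move or copy constructors, while trivially copyable types may be shifted with a memmove-style bulk copy (which is what the "assuming POD types" caveat above is getting at). A tiny sketch of the distinction:

    #include <type_traits>
    #include <vector>

    struct Pod    { int x, y; };               // trivially copyable
    struct NonPod { std::vector<int> data; };  // real copy/move constructors

    // A trivially copyable type may be shifted with a memmove-style bulk copy;
    // anything else is shifted element by element via its move/copy constructor.
    static_assert(std::is_trivially_copyable<Pod>::value, "can be bulk-copied");
    static_assert(!std::is_trivially_copyable<NonPod>::value, "needs ctor calls");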

