NightCreature83

Why high level languages are slow


I don't recall him saying he hates anything; he was just pointing out why languages like C# and Java, given their design choices, perform the way they do compared to a language like C++, which pushes more control back to the developer.
More to the point, he notes that for many people this either doesn't matter or isn't a problem.


I'm trying to point out that the argument cuts both ways. You can't say "C# is slow by design" and ignore that the de facto "high-performance" language (C++) also has slow features (e.g., virtual dispatch).

And really, the only way you're going to get any kind of definitive answer is to somehow come up with a giant list of language features and then point out "this one is slow" and "this one is fast". And while such a comparison may be useful in some circles, it generally isn't going to matter in most cases.
 

And there is no getting around the fact he is right.

C# with its heap-for-all-the-things and various other choices will cause you cache misses.
GCs can cause horrible problems with cache lines and unpredictable runtime issues.


As I already pointed out - C# does not require the heap for everything, and garbage-collected (or at least loosely-pointed) memory has its own advantages (memory compaction).
 

End of the day, languages have their pros and cons; I see nothing 'wrong' in what he said, nor did he conclude that 'high level languages are bad' - just that you should be aware of things and why they are as they are.

 
Which was kind of my point. He should have titled his post "Garbage collection and pointers can cause performance issues". But instead he decided to make a generalized sweeping statement over a large swath of programming languages, several of which don't even have the features he's deriding them for.
 

First, the elephant in the room: C++ is a high-level language. (Shock! Horror! Let's stop using it!) As is C. As is Pascal. As is COBOL. And the list goes on...

If someone implies that C++ isn't high level, but C# is... then chances are they're using a different definition than you are... so instead of shouting "WRONG!" at them, it's better to assume good faith and adopt their terminology for the purpose of understanding their point.
In school we first learnt that there was a hierarchy: machine language, assembly languages, high level languages, and domain specific languages.
Later we learnt to categorize them further, taking classes in systems programming in C, or application programming in Java. You learn that Java is "higher level" than C...
It's pretty obvious he's comparing C#/Java/Python-esque languages to C/FORTRAN/Pascal-esque languages.


Once you get into the article, sure. And yes, we have bad nomenclature in our field, as a "high level language" is anything that isn't assembly, for the most part.

Which is part of why I take objection to his article. The title is click-baity.
 

He may have part of a point here simply because C# gives you less control than C++ - but that's kind of the point. You're letting the runtime decide more things for you because you think you can afford the potential cost as a tradeoff for development time and fewer bugs.

Just saying that productivity trumps performance is changing the subject, is it not? In a discussion about how language design affects performance, the impacts of your design choices in other areas are interesting, but not the topic at hand.
It's possible to come up with choices that let you have your productivity-cake and eat your manual-cache-performance too, like C#'s StructLayoutAttribute, but that then enters the argument about having to fight the language and use non-idiomatic means to regain said cake eating.
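(For the curious, here's a minimal sketch of that escape hatch; the Particle struct and its fields are invented for illustration:)

```csharp
using System.Runtime.InteropServices;

// Sequential layout with Pack = 1 removes compiler-inserted padding, so an
// array of these structs becomes one tightly-packed block of memory.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Particle
{
    public float X, Y, Z; // 12 bytes of position data
    public byte Flags;    // 1 byte; no trailing padding with Pack = 1
}
```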


Point conceded.

Probably got too riled up over the tone and broad generalizations of the article.
 

Can you write faster code in C++? Yes. If you have a few extra years.
You can also write faster code in assembler, but you don't see people screaming about how inefficient C++ is and telling everyone to use that, now do you?

It's not that simple.
C++ is a less productive (harder) language, so an inexperienced C# team might take longer to produce a product if forced to use C++, and their product may also be slower and buggier than what they would've produced in C#.
Meanwhile, a team of trained C++ veterans might produce a product in C++ much faster than they would in C#, and also win on performance.
 
Just because C++ has lots of performance tools at your disposal, it doesn't mean that people use them correctly. Most people don't. But they're still useful tools for craftsmen to have at their disposal.
 
As for assembler, when doing systems/embedded programming, usually you have a very good idea of what the resulting assembly will look like. You basically ARE writing the program in assembly, using C/C++ as your assembler :P If the compiled code doesn't line up with your expectations, then you tweak your code so that the compiler does produce what you expected of it. It's low-level thinking in a high-level language.
 
Moreover, the compiler is usually better at asm than us, so it takes our intentions and cleverly optimizes them better than we ever could, using its super-human knowledge of instruction scheduling, dependency chain tracking, register juggling abilities, etc...


Both C++ and C# give up programmer flexibility in the hopes that the compiler is better at writing the low-level stuff than we are.

C++ compilers have been around long enough that this is generally true. But people still drop down to hand-written assembly for certain tasks that the compiler isn't so great at.

C# is in the same boat: "good enough" for most people, with the option to drop into "lower-level" languages (like C++) when necessary.
 

Compilers are getting smarter all the time - and a higher level language is generally better at expressing programmer intent which allows the compiler to make optimizations it might otherwise have been unable to do.

My bolded statement above does not apply to the behavior of C#/Java/C++ code when it comes to smart data layout. C/C++ sucks at this too!
None of these languages have the power to automatically fix structural problems... which is kind of the point. There is no magic "fix my cache misses" algorithm.
It comes down to the language having tools to better allow you to describe the structure of the data.
Things like ispc's soa keyword, or the scope-stack allocation pattern, or Jai's "joint allocations", are, unfortunately, new language developments, but are ideas that have been developed manually forever (in languages that allow you to do such things manually, such as C)...
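To make the manual version concrete, here's a hedged C# sketch of the AoS-to-SoA transform that such features automate (the particle fields are invented for the example):

```csharp
// Array-of-structures: each particle's fields are interleaved in memory.
struct ParticleAoS { public float X, Y, VelX, VelY; }

// Structure-of-arrays: each field gets its own contiguous array, so a loop
// touching only positions streams through memory without skipping velocities.
class ParticlesSoA
{
    public float[] X, Y, VelX, VelY;

    public ParticlesSoA(int count)
    {
        X = new float[count];
        Y = new float[count];
        VelX = new float[count];
        VelY = new float[count];
    }

    public void Integrate(float dt)
    {
        for (int i = 0; i < X.Length; i++)
        {
            X[i] += VelX[i] * dt; // sequential, prefetch-friendly reads
            Y[i] += VelY[i] * dt;
        }
    }
}
```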
 
As for memory management, the dichotomy of a C-style manual free-for-all vs a C#-style catch-all GC is false. There's so, so many ways to manage resources, but not that many with language support.
It's also a two-way street. I've shipped lots of games that use GCs, and had to deal with all the show-stopping bugs caused by the GC that delayed shipping dates.
e.g. the GC is consuming too much CPU time per frame, blowing the performance budget. Ok, do a quick fix by capping how long it can run per frame, easy! The memory usage is now growing steadily until we run out and crash... Fix it properly by fighting the language, doing a pass over the ENTIRE code-base, and micro-optimizing everywhere to eliminate temporary objects.
 
A lot of games I've worked on have also successfully used stack-based allocations entirely (no malloc/new/GC)... There's endless options here, 99% of which have no language support. The GC pattern isn't the one true solution to rule them all. The point is that high(er) level languages have already made more choices for you, deciding which tools you're allowed to use, and which of them are running at full voltage.
The simpler a language is, the easier it is to take a different evolutionary path through the tree of data management ideas. New languages will hopefully be able to give us the productivity gains of C#/Java, but the efficiency that we currently can only achieve by doing manual data juggling in C... This advancement will be similar to the advancement from asm-programming to C-programming, but for structure this time, not logic :D


Also very good points.

I still contend that if you're going to use a GC language, then you need to play nice with the GC. Just like if you're using a manual memory managed language, you have to play nice. But it's a different style of "play nice".

I think the real "issue" here is that C++ can have a GC added to it (using a variety of schemes), but you can't take out the GC from C#.

On the flipside - GC has yet to figure out how to manage "non memory" resources like network connections or file handles. C# provides a (clunky) way of handling it via "using" and "IDisposable". Java (AFAIK) has no such options.
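For reference, a minimal sketch of that mechanism (the line-counting helper is made up for illustration):

```csharp
using System.IO;

static int CountLines(string path)
{
    // StreamReader holds an OS file handle, which the GC does not reclaim
    // promptly; "using" guarantees Dispose() runs when the scope exits.
    using (var reader = new StreamReader(path))
    {
        int lines = 0;
        while (reader.ReadLine() != null)
            lines++;
        return lines;
    } // reader.Dispose() called here, even if an exception was thrown
}
```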
 

To use made-up numbers - CPU performance doubles every 2 years, while memory performance grows far more slowly. Say that after 10 years your CPU is 32x faster while your RAM is only 10x faster. Normalize that for CPU power (10/32 ≈ 0.31) and your RAM has effectively gotten ~70% slower!
Every year, instructions-per-memory-fetch just goes up and up. C#/Java were born out of the era of 90's programming when this issue wasn't as much of a big deal. This decade it's almost the defining problem of software performance... and none of our languages are built to deal with it.


I'm not so sure the gap will continue - at least not in the same manner. Yes, memory is much slower than a CPU, but raw CPU speeds haven't increased in years. The issue now is feeding enough data from memory to multiple CPUs.

Fun fact: C++ has garbage collection. Though it goes by a different name - "destructors".

Funnier fact: No, it does not, but one can be optionally plugged into it. By GC, people (i.e. Bjarne Stroustrup [1] or the BoehmGC authors [2], on their respective websites) mean a mechanism that automatically frees memory that is no longer accessible by the running program. C++ does not have that; Java and C# do. You are the first person I've ever seen claiming that destructors == garbage collection.

1. http://www.stroustrup.com/bs_faq.html#garbage-collection
2. http://www.hboehm.info/gc/ (links to http://www.iecc.com/gclist/GC-faq.html#Common%20questions)


Destructors are deterministic garbage collectors in the sense that both automatically clean up resources when the program is done with them in a manner that the programmer doesn't have to directly manage (the compiler inserts calls, not the programmer).


Destructors are deterministic garbage collectors in the sense that both automatically clean up resources when the program is done with them in a manner that the programmer doesn't have to directly manage (the compiler inserts calls, not the programmer).

 

By your definition of a GC ("clean up resources so the programmer doesn't have to"), automatic reference counting used in Swift is also GC because it "just works", "compiler does it, not the programmer", yet you just said yourself that Swift has no GC.


I still contend that if you're going to use a GC language, then you need to play nice with the GC

I'm curious - have you tried doing this on a large scale?
 
I've spent an awful lot of time over the last couple of years refactoring swathes of Java code (or porting it to C++) to reach the 'zero allocations' bar that one absolutely must hit if one wants a fluid and responsive Android application. I'm not kidding about this - you will not be able to reliably hit 60fps if there are any allocations in your rendering or layout loops. And Java doesn't have struct types, so that means no short-lived compound types at all...
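For contrast, a hedged C# sketch of what those missing struct types buy you (the types and helper are invented for illustration). The loop below performs zero heap allocations per call, which Java objects can't express:

```csharp
using System;

struct Vec2 { public float X, Y; } // value type: no heap allocation, no GC

static float SumLengths(Vec2[] points)
{
    float total = 0;
    for (int i = 0; i < points.Length; i++)
    {
        Vec2 p = points[i]; // stack copy; creates no garbage
        total += (float)Math.Sqrt(p.X * p.X + p.Y * p.Y);
    }
    return total; // safe to call every frame in a render loop
}
```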



I'm trying to point out that the argument cuts both ways. You can't say "C# is slow by design" and ignore that the de facto "high-performance" language (C++) also has slow features (e.g., virtual dispatch).

 

I would argue: "Everything on the heap" is at the core of C#. Virtual dispatch is not at the core of C++ in the same way.

 


As I already pointed out - C# does not require the heap for everything, and garbage-collected (or at least loosely-pointed) memory has its own advantages (memory compaction).

 

It doesn't, but then you're fighting the language. Trying to manage an array of value types comes with significant limitations in C#. True though, memory compaction is a potential advantage. And in practice, if you use an array of reference types and pre-allocate all the objects at the same time, they tend to end up sequential in heap memory anyway - so that does mitigate some of the cache performance issues.
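A hedged sketch of both halves of that tradeoff (the Enemy type is invented for illustration):

```csharp
// One contiguous allocation; elements are stored inline, not as references.
struct Enemy { public float Health, X, Y; }

static void DamageAll(Enemy[] enemies, float amount)
{
    // Indexing the array directly mutates each element in place.
    for (int i = 0; i < enemies.Length; i++)
        enemies[i].Health -= amount;

    // But copying an element out and mutating the copy silently changes
    // nothing in the array - one of the "fighting the language" traps:
    // var e = enemies[0]; e.Health -= amount; // modifies the copy only
}
```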


Destructors are deterministic garbage collectors in the sense that both automatically clean up resources when the program is done with them in a manner that the programmer doesn't have to directly manage (the compiler inserts calls, not the programmer).

By your definition of a GC ("clean up resources so the programmer doesn't have to"), automatic reference counting used in Swift is also GC because it "just works", "compiler does it, not the programmer", yet you just said yourself that Swift has no GC.


Ok, yeah, caught red-handed by my own misuse of the word.

Should have said "automatic resource management", which both reference counting and garbage collection are forms of.

So I stand corrected. "Garbage collection" is a secondary process that scans for unused resources and cleans them up. (Not necessarily memory, but I don't know of any GC that cleans up non-memory resources - at least not directly)

From a performance standpoint, both GC and ARC (not a fan of that acronym, but it works) clean up resources. One does it in-line with code, one does it at some time later (hopefully when the CPU isn't busy).

For the record, I've used systems that added a "deferred deleter" to C++ because the cost of deleting everything at once was too much and it was much better to put the unused objects into a queue to be deleted later in small increments. (A very rudimentary version of GC)
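In C# terms (keeping one language across these snippets), that pattern looks roughly like this - a sketch, not the actual system described:

```csharp
using System;
using System.Collections.Generic;

// Retired objects go into a queue, and a budgeted number are cleaned up
// each frame instead of paying the whole deletion cost at once.
class DeferredDeleter
{
    private readonly Queue<IDisposable> _pending = new Queue<IDisposable>();

    public void Retire(IDisposable obj) => _pending.Enqueue(obj);

    // Call once per frame; disposes at most maxPerFrame objects.
    public void Pump(int maxPerFrame)
    {
        for (int i = 0; i < maxPerFrame && _pending.Count > 0; i++)
            _pending.Dequeue().Dispose();
    }
}
```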
 

I still contend that if you're going to use a GC language, then you need to play nice with the GC

I'm curious - have you tried doing this on a large scale?
 
I've spent an awful lot of time over the last couple of years refactoring swathes of Java code (or porting it to C++) to reach the 'zero allocations' bar that one absolutely must hit if one wants a fluid and responsive Android application. I'm not kidding about this - you will not be able to reliably hit 60fps if there are any allocations in your rendering or layout loops. And Java doesn't have struct types, so that means no short-lived compound types at all...


I use C# for small scale tools mostly, where I don't have to worry about the GC too much other than knowing how to handle unmanaged resources via IDisposable and "using". Also I tend to avoid generating excess garbage in the first place.

I have had to do one redesign in a real-time project to pool objects rather than allocating them.
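That redesign looks roughly like this - a minimal sketch, with the Bullet type invented for illustration:

```csharp
using System.Collections.Generic;

// Instances are recycled instead of allocated per use, so the GC never
// sees per-frame garbage from these objects.
class Pool<T> where T : new()
{
    private readonly Stack<T> _free = new Stack<T>();

    public T Rent() => _free.Count > 0 ? _free.Pop() : new T();
    public void Return(T item) => _free.Push(item);
}

// Usage: rent at spawn, return at despawn.
// var bullets = new Pool<Bullet>();
// var b = bullets.Rent(); /* ... */ bullets.Return(b);
```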

So yes, C# leans towards "allocation and pointers". On the flipside, C/C++ leans towards "stack and copying", which has its own problems - most notably, accidentally taking a very large object by value in a function parameter (pass-by-value being the default in C/C++). Though that is fixed with only a few characters, whereas in C# it is more difficult to go the other way.
 

I'm trying to point out that the argument cuts both ways. You can't say "C# is slow by design" and ignore that the de facto "high-performance" language (C++) also has slow features (e.g., virtual dispatch).

 
I would argue: "Everything on the heap" is at the core of C#. Virtual dispatch is not at the core of C++ in the same way.


Virtual dispatch is a pretty core feature of C++'s OOP mechanisms.

And I've seen some pretty ridiculous things written in C++ to avoid the cost (either of the dispatch or of the vtable pointer). For example, an array of small polymorphic objects stored by value, where only the object holding the array knows the type of the objects and therefore has to manually do the pointer arithmetic, as well as look up the correct "vtable" array somewhere else, because the cost of adding a vtable pointer to each array element was deemed too high. (Each type had to register its function pointers with that table on startup.)
 

As I already pointed out - C# does not require the heap for everything, and garbage-collected (or at least loosely-pointed) memory has its own advantages (memory compaction).

 
It doesn't, but then you're fighting the language. Trying to manage an array of value types comes with significant limitations in C#. True though, memory compaction is a potential advantage. And in practice, if you use an array of reference types and pre-allocate all the objects at the same time, they tend to end up sequential in heap memory anyway - so that does mitigate some of the cache performance issues.


Agreed.

I don't pretend C# is free of bad design decisions (or at least "bad" in some sense of the word, as the designers of the language picked one option out of several, and "performance" was not the top driving force, unlike C++).

I simply think that people who try to paint the entire language (or the entire swath of "high level" languages, whatever that means) as "slow" because "it doesn't do this specific thing as fast as this other language" are rather... misinformed. Or at the very least trying to start an argument. But hey, we now have this thread, so I guess they succeeded :) (well, this has been more a discussion than an argument)

Programmers have proven time and time again that they are more than willing to sacrifice raw "speed" simply so they can actually make a product that works and isn't a pain to maintain. Otherwise, again, we'd all be doing hand-tuned assembly.


This discussion is soooo old :)

 

The project I work on at my day job is *very* large scale. Some parts of it have hard real-time requirements and are basically written in C with classes. Some parts are C++ because they are compute-bound. But the majority of the components are written in C#, because performance simply isn't the top priority - productivity is.

 

Use the right tool for the job and be done with it.

 

Most developers I meet every day are entrenched in "their" world, but I love switching between the worlds and the languages. Spend the morning writing "fast" code, go to lunch, and then switch over to writing "nice" code. What really annoys me, though, is the fact that everyone and their dog calls themselves a "C# developer" nowadays after reading a few tutorials. It's sad to see so many programmers completely oblivious to all the nice aspects of the "high level" languages that we willingly paid for by sacrificing performance.


And in practice, if you use an array of reference types and pre-allocate all the objects at the same time, they tend to end up sequential in heap memory anyway - so that does mitigate some of the cache performance issues.


But do they?
Do they ALWAYS?
Do you have any guarantee of this?
What hoops do you have to jump through to make sure? (Pre-allocate and initialise everything. Never replace. Always copy in. Never grow. I'm guessing you need at least those constraints to maybe get this.)

Which is the point of the article/blog: you are already fighting the language and the GC to try and get a certain layout - and even then only maybe.

C++: std::vector<foo>(size) - job done.

Now, for many, many problems that isn't an issue, but it is important to know that it could be an issue, that you have no guarantees, and that even if you do get your layout you could well be hurting anyway, because your structures might be bigger than you think (I couldn't tell you the memory layout of a .NET object, so I couldn't say 100%) and come with more access cost (reference vs. direct) and other things which, in the situation discussed, will make things slow.

(There was an article, I've lost the link to it now, which talked about efficiency, big-O, and vector vs. list performance. The long and short of it was that for a million items, for sorted insertion followed by iteration, vector was ALWAYS faster than list by a pretty large margin. Pre-allocation of elements was done. A few other tricks were pulled. But as the list size increased, vector continued to outperform list every time. Cache is king.)
 
 

I simply think that people who try to paint the entire language (or the entire swath of "high level" languages, whatever that means) as "slow" because "it doesn't do this specific thing as fast as this other language" are rather... misinformed. Or at the very least trying to start an argument. But hey, we now have this thread, so I guess they succeeded :) (well, this has been more a discussion than an argument)


The painting, however, was done with context, in a particular situation: a memory-bound operation. In that situation C#/.NET is slow.

This is a fact. It simply has too much working against it.

And that's ok, because anyone reading it with an ounce of 'best tool for the job' will read that, nod, and then continue to use it if it is the best tool for the job.

It might look like I'm arguing C++'s corner vs C# like some rabid fanboy but I'm not.
I think C# and the .Net family of languages are great. If I have a job I think suits them then I'll reach for them right away; hell the only reason I don't reach for F# is because I've not had enough use of it in anger to get a complete handle on the language.

But if I'm doing high performance memory traffic heavy code then you'd better believe I'm reaching for C++ because it simply gives you better control and in that situation is faster.
(OK, to be fair, if the work can be heavily parallelised then I'm probably reaching for some form of compute shader and a GPU but you get my point.)

Trying to argue that this isn't "true" or isn't fair because your language of choice happens to be a problem in the situation pointed out... well... *shrugs*


 

Both C++ and C# give up programmer flexibility in the hopes that the compiler is better at writing the low-level stuff than we are.

C++ compilers have been around long enough that this is generally true. But people still drop down to hand-written assembly for certain tasks that the compiler isn't so great at.

C# is in the same boat: "good enough" for most people, with the option to drop into "lower-level" languages (like C++) when necessary.
 
The only time your handwritten asm will be faster is if you have more information about what needs to be done than the compiler can deduce. And again, when you go to this level you are working around the language and are no longer writing idiomatic code.

 

I'm not so sure the gap will continue - at least not in the same manner. Yes, memory is much slower than a CPU, but raw CPU speeds haven't increased in years. The issue now is feeding enough data from memory to multiple CPUs.

CPU speeds have actually increased again: since Haswell we are now regularly seeing CPUs over 3GHz, which was unheard of 3 years ago, when the dual and quad cores came out and they all dropped back to around 2GHz. Yes, it's been slower to go up again, but the gap really exists and is a problem.

This gap is what prompted the data-oriented design movement in programming; you can do this after a fashion in C# too, but C++ makes it a lot easier.

Feeding the multiple cores is also a problem, but the cache-to-memory cycle gap people are talking about shows up even in single-threaded applications, so the additional cores don't matter here.

And CPUs try to hide this stuff anyway by prefetching memory, which really only works well if all the memory you are operating over is local to each other. Herb Sutter has a really nice presentation about this where you can see where the performance plateaus are for different access strategies: http://channel9.msdn.com/Events/Build/2014/2-661 - skip to 35 minutes and you can see the graphs for memory access.



But as the list size increased vector continued to out perform list all the time. Cache is king.

Of course it is.  Because cache is currently among the most important facets of memory management.

For the analysis of those algorithms most people use the really old 1960s models. Under those models, memory has uniform performance.

 

Memory has had non-uniform performance in mainstream computers since the late 1970s. In the 1980s that moved on to desktop computers. Companies like Silicon Graphics relied on a unified memory architecture that worked well for their graphics workstations in the era.  A small number of computer vendors in the early 1980s tried a few variations of cache-only memory architecture, but mostly they didn't pan out too well.  From time to time that option gets pulled back out by hardware vendors (e.g. Sun's WildFire) but usually only has limited application.

 

When you start talking "big O", the theory assumes you are not working at a level where cache effects are relevant. This made a lot of sense back when cache was measured in bytes or kilobytes, but when you've got high-end machines with 128 MB of on-die cache, it gets a bit harder to ignore.

 

(Yes, the upcoming Skylake processor series is expected to have 128MB of L4 CPU cache on the highest server-level processors. That blew my mind the first time I read it.)

If you're applying a caching model that assumes bytes of cache to real-world hardware that uses megabytes of cache, don't expect performance to map very well.

But that boils back down to memory management.

 

You need to manage your memory in a way that cooperates with the hardware's memory and your problem space. 

 

You cannot assume that all memory has the same performance characteristics, because it does not. That may mean the mag-core versus transistor memory debate of the 1980s, or today's levels of CPU cache versus DRAM chips.

 

So cache is king because cache is high speed and main memory is slow.  Just like in comparison disk is slow and main memory is fast, or spinning platters are slow and SSD is fast.  Fast is better, so use the fastest available to you.

Getting back to why "high level languages" are slow....

 

For some computational problems, the garbage collection solution is wonderful. If your entire problem space is based around small objects constantly being created and destroyed, where allocation and clean up are not bottlenecks or taking place during bottlenecks, then the GC model can be wonderful. It works incredibly well for many business solutions because the problem fits the solution. 

 

I've worked on quite a few games now that use C# through Mono for game scripting.  Keep it far away from rendering where it is a terrible fit, but for scripting of game objects it can be an amazingly good fit with a great side benefit of rapid development.

 

You just can't do stupid things with it. Others earlier mentioned the foreach problem: not only can it cause allocations, but it can also be tricky to serialize if you want to do something like save your game. (Yeah, crazy these days when the focus tends to be online-only play.)
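A hedged sketch of that foreach trap (the helpers are invented for illustration; the boxing behavior is standard .NET/Mono):

```csharp
using System.Collections.Generic;

static int SumDirect(List<int> xs)
{
    int total = 0;
    foreach (int x in xs) // List<int>.Enumerator is a struct: no allocation
        total += x;
    return total;
}

static int SumViaInterface(IEnumerable<int> xs)
{
    int total = 0;
    foreach (int x in xs) // GetEnumerator() returns IEnumerator<int>, so the
        total += x;       // struct enumerator is boxed: one heap allocation
    return total;         // per call - garbage if this runs every frame
}
```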

Unfortunately to know what qualifies as a "stupid thing" you need to know what the high level language is doing internally. If that language is a scripted language you need to understand the costs of allocations, the costs of virtual function calls, the costs of features. Even if that language is C++, you still need to understand what is happening in a vector or a list or a map, and even the cost associated with the seven letters "virtual".


...hell the only reason I don't reach for F# is because I've not had enough use of it in anger to get a complete handle on the language.


The main reason I don't use F# is actually the must-define-before-use constraint on types and methods. That used to be a mild pain in C and C++ that you could work around with forward declarations. Then I used C# and it doesn't need any ordering or forwarding. Then I tried F# and immediately got stabbed in the face with even worse declaration ordering than C and C++ had, and no apparent way to forward things nicely either (you could mutually define functions, but nobody wants to chain the ENTIRE project together that way).

I hope they've changed this since I last used the language, because it was a pretty damn cool language other than that.

TL;DR: Most of the things that piss me off about languages have very little to do with their memory management, and everything to do with "nuisance" constraints caused by the compilation model.


(There was an article, I've lost the link to it now, which talked about efficiency, big-O, and vector vs. list performance. The long and short of it was that for a million items, for sorted insertion followed by iteration, vector was ALWAYS faster than list by a pretty large margin. Pre-allocation of elements was done. A few other tricks were pulled. But as the list size increased, vector continued to outperform list every time. Cache is king.)


I'm fairly certain this has less to do with cache than with the inherent performance characteristics of vectors and lists.

 

First off, iterating over a vector is usually faster and never slower, because for vectors you just need to increment a CPU register to advance to the next item, whereas for a list you need to load a new pointer from RAM. And this is a performance difference that goes unreflected in big-O notation (iterating over both vectors and lists is O(n)).

 

To do a sorted insert into a list, you MUST iterate over the list to find the insertion point, which is O(n), and then do the actual insertion (which is constant-time). But because vectors have constant-time random access, the insertion point can be found via binary search, which is O(log n), and then the insertion is O(n).

 

However, the slow part of vector insertion is moving all the elements after the insertion point, which is just a simple memcpy (EDIT: assuming POD types) and is still faster than iterating over the elements of a list to find the insertion point - resizes are a little bit slower (but still O(n)) but infrequent, and never occur if the vector is pre-allocated.

 

All told, assuming the data in question is a simple integral type, a naive sorted-insert-and-iterate implementation is 3n+2 memory accesses for lists versus [2+1/(2 log n)]n+1 accesses for vectors, and an optimal one is 2n+2 accesses for lists and 1.5n+log n +1 accesses for vectors - vector is inherently faster for this test regardless of any cache-related concerns. Cache coherency issues will only magnify this difference - however increased object size will mitigate it.
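For anyone who wants to recreate the experiment, here's a rough C# transliteration (List<T> standing in for vector, LinkedList<T> for list; whatever numbers it prints are your machine's, not the lost article's):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static void SortedInsertBenchmark(int n)
{
    var rng = new Random(42);
    var vec = new List<int>(n); // pre-allocated backing array
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < n; i++)
    {
        int v = rng.Next();
        int idx = vec.BinarySearch(v);       // O(log n) search...
        vec.Insert(idx < 0 ? ~idx : idx, v); // ...then O(n) contiguous shift
    }
    Console.WriteLine($"List<int>:       {sw.ElapsedMilliseconds} ms");

    rng = new Random(42); // same value sequence, for a fair comparison
    var list = new LinkedList<int>();
    sw.Restart();
    for (int i = 0; i < n; i++)
    {
        int v = rng.Next();
        var node = list.First; // O(n) pointer-chasing walk to find the spot
        while (node != null && node.Value < v)
            node = node.Next;
        if (node == null) list.AddLast(v);
        else list.AddBefore(node, v);
    }
    Console.WriteLine($"LinkedList<int>: {sw.ElapsedMilliseconds} ms");
}
```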


 

However, the slow part of vector insertion is moving all the elements after the insertion point, which is just a simple memcpy and is still faster than iterating over the elements of a list to find the insertion point - resizes are a little bit slower (but still O(n)) but infrequent, and never occur if the vector is pre-allocated.

I thought with vectors any movement of the elements would result in the copy or move constructors being called, as well?


*ends up skipping most of the thread*

 

Honestly the article seems to be pointing out a single problem that I'm honestly tired of: everything gets allocated on the heap. Depending on the language, even simple integers can end up like this. This alone will bring the CPU to its knees - constantly allocating and deallocating, cache coherency getting completely destroyed, the GC going crazy having to deallocate stuff all the time, etc.

 

Also, wow, that Haswell chart. I just made a quick calculation: a program that's constantly causing cache misses would easily be brought down to the equivalent of sub-100MHz speeds (a cache miss on every access would be ~13MHz on a 3GHz Haswell, but I imagine the code itself has cache coherency =P). No wonder so many modern programs seem slow despite the fact computers should be much faster. This is probably a bullshit calculation, honestly, but it would explain a lot.

 


Moreover, the compiler is usually better at asm than us, so it takes our intentions and cleverly optimizes them better than we ever could, using its super-human knowledge of instruction scheduling, dependency chain tracking, register juggling abilities, etc...

Yes.

 

I once wrote some simple matrix multiplication code in the laziest way possible (a plain nested loop) in order to keep the code as simple as possible. I decided to pass it through GCC with optimizations at maximum to see what it'd do. Cue the entire thing being turned into an unrolled loop full of SIMD instructions (with two matrices crammed entirely into registers). Then I decided to do transformation functions, but just passing the matrices as-is instead of trying to optimize them by hand (since I knew most calculations become redundant, due to lots of 0s and 1s). GCC completely inlined the matrix multiplication function, then optimized the inlined code taking the constant values into account (basically doing the very thing I refused to do in the source code).

 

Now, that was C, but basically: don't underestimate the compiler. Just make sure the code is simple enough that the compiler can catch the idiom. This is the code, if somebody wonders.
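The original link is lost, so for flavor here's a stand-in sketch of the shape being described (in C# rather than the original C; a naive fixed-size 4x4 multiply simple enough for an optimizer to unroll and vectorize):

```csharp
// dst = a * b, all matrices stored row-major in flat 16-element arrays.
static void MatMul4x4(float[] a, float[] b, float[] dst)
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
        {
            float sum = 0;
            for (int k = 0; k < 4; k++)
                sum += a[row * 4 + k] * b[k * 4 + col];
            dst[row * 4 + col] = sum;
        }
}
```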


However, the slow part of vector insertion is moving all the elements after the insertion point, which is just a simple memcpy and is still faster than iterating over the elements of a list to find the insertion point - resizes are a little bit slower (but still O(n)) but infrequent, and never occur if the vector is pre-allocated.

I thought with vectors any movement of the elements would result in the copy or move constructors being called, as well?

Should have mentioned that it's all assuming POD types.


In the end it is not languages that are fast or slow, it is implementations that are fast or slow.

 

It all comes down to algorithmic complexity. I've seen really badly written C++ that brings a system to its knees because the developer did not know the difference between a std::hash_map and a std::list.

 

Similarly, I've seen extremely well-written C# and Java applications for enterprise use which kick the ass of anything written in C or C++, simply because of how quickly they can be iteratively developed by a team - and time is, of course, money.

 

In the end, you need to use the right tool for the job. Sometimes this is an interpreted or bytecode language and other times it might even be assembly language, but the biggest mistake is always selecting the wrong tool for the job because it's the only tool you know how to use. As a wise man once said, "If all you have is a hammer, everything looks like a nail".

But the language informs the implementation - the way the C# language and its associated runtime function requires compromises which, when writing idiomatic code that doesn't fight the language, result in slower performance. These are language design choices directly impacting things.

This is run time performance pure and simple and, unless you are going to start waving Pro-Language Flags around, no reasonable person can argue otherwise because the nature of the language removes the control of layout and placement.

You can argue good vs. bad development all you like; in the context of this discussion it doesn't matter - and it matters even less when the good-vs-bad argument is always defensive and falls back to the trope of "I've seen bad C++ code and good C# code, which contradicts this, so it must be wrong"... because it is not wrong.

More to the point, the continued refrain of "use the best tool for the job" isn't required either; neither the author nor people in this thread have argued otherwise, so the constant repetition of this line feels like a defensive 'must not upset anyone' whine more than anything else.

This thread isn't required.
The discussion here isn't productive.
Any honest user of a language would have looked at this for what it is - a comparison in a specific situation - nodded and got on with their lives.

Instead we have two pages of people trying to defend a language from points which were never made and conclusions never drawn, to... what? Feel good about using it? Feel like 'all languages are equal'? Not upset some precious flower who might feel bad because their language of choice has a flaw which can't be bypassed without some excessive thinking?

Ugh...


I still contend that if you're going to use a GC language, then you need to play nice with the GC

I'm curious - have you tried doing this on a large scale?

I've spent an awful lot of time over the last couple of years refactoring swathes of Java code (or porting it to C++) to reach the 'zero allocations' bar that one absolutely must hit if one wants a fluid and responsive Android application. I'm not kidding about this - you will not be able to reliably hit 60fps if there are any allocations in your rendering or layout loops. And Java doesn't have struct types, so that means no short-lived compound types at all...
Sadly, this is 100% true. I just finished having to optimize both my Android rendering and layout loops for this very reason.

