With a few exceptions, most of the posts in this thread are entirely missing the point, mostly derailing into GC problems, language wars, or compiler optimizations.
That is not what the article is about.
First, compilers work at the instruction level. As Mike Acton showed at CppCon 2014, there can easily be a 10:1 ratio or more between time lost to memory access patterns and time spent on actual code execution.
Compiler optimizers work on the "1" from that ratio, while being completely unable to do anything about the "10": the memory side, where one could easily gain 5x to 10x by restructuring data to take advantage of it.
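To make the "10" side concrete, here is a minimal C++ sketch (function names invented for illustration): two functions with essentially identical instruction streams, where only the memory access pattern differs. Once the matrix outgrows the cache, the strided version touches a new cache line on nearly every access and runs far slower, even though the optimizer sees the same work in both.

```cpp
#include <cstddef>
#include <vector>

// Both functions execute essentially the same instructions;
// only the order in which memory is touched differs.
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t r = 0; r < n; ++r)      // sequential walk: every cache
        for (std::size_t c = 0; c < n; ++c)  // line fetched is fully used
            s += m[r * n + c];
    return s;
}

double sum_col_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t c = 0; c < n; ++c)      // strided by n doubles: a new
        for (std::size_t r = 0; r < n; ++r)  // cache line on nearly every access
            s += m[r * n + c];
    return s;
}
```

Same result, same instruction count to a first approximation; the difference a profiler shows between them is entirely the "10" space the compiler cannot touch.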
This is a problem shared by asm, C, C++, C#, Java, Lua and many other programming languages. The difference is that asm, C & C++ let you do something about it with minimal effort, while in C# & Java it takes significant effort, or you end up fighting the language by ignoring its recommended or intended programming patterns.
Furthermore, those recommended/intended programming patterns encourage cache thrashing, the inability to hide latency, and bandwidth saturation. One simple example is the lack of the const modifier.
C/C++ lets you return const pointers to grant read-only access to a memory region. Sure, you can const_cast that pointer and break the assumption, but const_cast is something that comes with a "DO IT AT YOUR OWN RISK" label: you're breaking a promise.
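A minimal sketch of that read-only-view pattern (the Mesh class is invented for illustration):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example class: hands out read-only access with zero copies.
class Mesh {
    std::vector<float> verts_{0.0f, 1.0f, 2.0f};
public:
    // Const pointer into the actual storage: no clone, the type system
    // enforces the read-only promise.
    const float* vertices() const { return verts_.data(); }
    std::size_t vertex_count() const { return verts_.size(); }
};

// At a call site:
//   const float* v = mesh.vertices();
//   v[0] = 5.0f;                          // compile error: promise kept
//   float* evil = const_cast<float*>(v);  // compiles -- "AT YOUR OWN RISK"
```

The caller reads the original memory directly; nothing is copied, so no extra cache lines or bandwidth are spent.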
Java & C# have no such concept, and encourage returning a cloned copy instead. The advantage is that no one can break the promise, because modifications to the clone will not affect the original memory region. The disadvantage is that the memory copy blows the cache & contributes to bandwidth saturation.

Many language designers assume the general case is infinite RAM (even C!), so they don't care about cloning memory regions (memory exhaustion is rarely a problem on modern systems with >4GB of RAM); but they ignore the fact that bandwidth & cache are very expensive. Memory management has always been a problem (as pointed out by another poster), but historically the problem was that memory was limited, and on a PC you would hit the HDD when you ran out. The current problem is an entirely different beast: hitting the HDD is very rare nowadays, but memory bandwidth is a relatively new bottleneck (it rarely used to be one), and memory latency is often the real cause behind visible performance slowdowns.
I remember reading the explanation of why C# couldn't support const memory regions, and it was pretty compelling. But that doesn't change the fact that the feature isn't there, and its absence degrades performance.
Such programs may not even use 1GB of RAM, yet they may be saturating the bandwidth or thrashing the cache. And suddenly performance goes to hell.
Another problem with storing everything in the heap is that allocation typically has to take a lock every time. There are mitigation strategies, but heap allocation is something C# does compulsively.
Resorting to C/C++ for "compute bound" applications while doing everything else in C# also misses the point. Again, this is not a problem of being compute bound: an extremely good optimizing compiler can produce efficient code out of C# & Java, with compute performance that may even surpass a not-so-good C++ compiler. The problem is about memory.
Code running in C# typically profiles as what veterans call "the cluster bomb". When you launch the program in a profiler to find out why it's slow, you discover that it doesn't spend a high percentage of its time in any particular routine; instead the inefficiencies are distributed across the entire codebase, adding up wasted time incrementally via branch mispredictions, hidden memory allocations, unnecessary memcpys, virtual function calls, the CPU waiting for data to arrive, etc. The worst-case scenario ever, and a PITA to solve.
C & C++ aren't perfect either. Most virtual functions in a C++ project could be resolved at compile time, but the compiler can't do anything about it, both because of the guarantees the language demands (what if an external application imports from or hooks into the EXE/DLL and overrides the class? Bam, you can't take away the vtable) and because the language lacks tools to help the compiler remove the virtual call. There is no easy way to tell the compiler: "clone this vector<Base> N times, one for each derived class; when we add a pointer, add it to the type-specific vector, so that iteration walks everything in order without ever resolving a virtual table."
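That "one vector per derived class" idea can be hand-rolled today, at the cost of writing the dispatch yourself. A minimal sketch (type names invented for illustration): concrete types stored by value, so iteration is contiguous in memory and every call resolves statically.

```cpp
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual int cost() const = 0;
};
struct Soldier : Base { int cost() const override { return 1; } };
struct Tank    : Base { int cost() const override { return 5; } };

// Hand-rolled version of the "clone the vector per derived class" idea:
// instead of one std::vector<Base*>, keep one vector of each concrete
// type. Iteration is contiguous and the compiler can devirtualize
// every call -- no vtable lookups in the hot loop.
struct Units {
    std::vector<Soldier> soldiers;
    std::vector<Tank> tanks;

    int total_cost() const {
        int t = 0;
        for (const Soldier& s : soldiers) t += s.cost(); // static call
        for (const Tank& k : tanks)       t += k.cost(); // static call
        return t;
    }
};
```

The downside is exactly why language help would be welcome: every new derived class means touching Units by hand, which is what the hypothetical "clone this vector N times" directive would automate.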
Also, DOD (Data Oriented Design) feels like fighting OOP (Object Oriented Programming), but it shouldn't be this way: OOP is about the relationships between constructs called "objects", while DOD is about how those objects lay out their data in memory and how the code operates on it.
However, most languages I know of (except for HLSL & GLSL, and perhaps ISPC?) intrinsically tie the code to the memory layout and execution flow, which makes DOD & OOP almost antagonistic.
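The usual illustration of that tension is the array-of-structs vs struct-of-arrays split. A minimal C++ sketch (names invented for illustration): the OOP-natural layout interleaves all of an object's fields, while the DOD layout groups each field so a pass over one field wastes no cache lines.

```cpp
#include <cstddef>
#include <vector>

// OOP-natural array-of-structs: a position-update pass drags vx, vy and
// health through the cache even though it never reads health.
struct ParticleAoS { float x, y, vx, vy, health; };

// DOD struct-of-arrays: the same pass touches only position & velocity
// arrays, so every fetched cache line is fully useful.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy, health;

    void integrate(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};
```

Note what got lost in the translation: there is no "particle object" anymore in the SoA version, which is exactly the sense in which the two styles end up antagonistic in today's languages.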
I'm still waiting for a language that takes my beautiful OOP structured code and spits out code that is DOD friendly. I'm convinced this is possible.
However, the thing with most high level languages is that writing anything other than the intended way is difficult. That is a good thing: it makes coding easier, homogenizes code across developers, leads to fewer bugs, and even a monkey could use it. The problem is that these "intended ways" currently take no account of cache, bandwidth, or lock contention in the heap.
The difference between a veteran & a rookie developer in high level languages often boils down to big O notation (the chosen algorithm). But once you take away the algorithmic differences, a veteran has a hard time optimizing the code further, because the language won't let them, or makes it painfully difficult.