
# Why does JIT have to do everything? (offshoot - JIT speed)


## Recommended Posts

(That thread was getting off-topic, and this doesn't really apply to the OP's original question.) I have come across the claim several times in this forum that JIT will be able to produce code that executes faster than Optimized and Compiled Native Code (OCNC). This post refutes that claim using Microsoft's own documentation. In the thread I am breaking this off of, there is:
quote:
Original post by Etnu
quote:
As for the CLR team, we continue to work to provide a platform that is substantially more productive than native code and yet is faster than native code. Expect things to get better and better. Stay tuned.
That is their goal (as it should be), and they are working towards it. They have not achieved it - yet. I believe that they will, eventually, but not for at least several more years.
The article referred to (read here) states several reasons that OCNC will always be faster overall. The only point they made, as far as I saw, that could make a JITed program faster is this:
quote:
The garbage collector even takes into account your machine's cache size to try to keep the gen 0 objects in the fast sweet spot of the cache/memory hierarchy.
If this is the optimization that makes your entire program fast under JIT, your program probably doesn't have a great number of objects in use in the first place, so an OCNC program will probably still be _just as fast_, if not faster. The article opens with a claim that tells you a lot about why the article exists:
quote:
When it comes to the rich and convenient .NET Framework, it's like we're kids in the candy store. "Wow, I don't have to do all that tedious strncpy stuff, I can just '+' strings together! Wow, I can load a megabyte of XML in a couple of lines of code! Whoo-hoo!"
So these are professional programmers at MS who don't even use std::string in their C++? They can't even find a good XML wrapper? The tone of that line alone makes me think that .NET is nothing more than a marketing gimmick. (And/or MS code is written in a very error-prone way.) Going through the article, we come across the following passages, which point to JITed code _always_ being slower than OCNC code:
quote:
If your application follows the recommendations in these other articles, the overall cost of garbage collection can be insignificant, a few percent of execution time, competitive with or superior to traditional C++ object new and delete. The amortized cost of creating and later automatically reclaiming an object is sufficiently low that you can create many tens of millions of small objects per second.
Notice the 'can be' and 'a few percent': weasel words that hide the fact that GC has been shown to always be slower than properly written allocation/deallocation techniques (although GC can relieve the programmer of some headaches).
quote:
one of the sublime benefits of shipping your components and applications as assemblies of CIL is that your program can automatically get faster every second, and get faster every year—"faster every second" because the runtime can (in theory) retune the JIT compiled code as your program runs; and ... So if a few of these timings seem less than optimal in .NET 1.1, take heart that they should improve in subsequent releases of the product.
Here is the biggie. If the JIT compiler is profiling and 'retuning' your code as it executes, that code can never be executing as fast as OCNC, because of the extra operations the JIT compiler has to perform while your code runs. Your code is being interrupted, and therefore executes more slowly.
quote:
Because the total heap size may be hundreds of MB, while the gen 0 heap size might only be 256 KB, limiting the extent of the GC's object graph tracing to the gen 0 heap is an optimization essential to achieving the CLR's very brief collection pause times.
Hmmmm. So as soon as your program creates more objects than the gen 0 heap size can handle, the GC times nose-dive. Hmmmmm.
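To make the mechanism behind that number concrete: gen 0 is essentially a bump-pointer nursery with a fixed budget, and exhausting the budget is what triggers a collection. Here is a toy sketch of that behavior, my own Java analogy rather than anything from the article; the `Nursery` class and its names are hypothetical:

```java
// Toy model of a gen 0 nursery: allocation is a cheap counter bump
// until the budget is exhausted, which forces a collection.
public class Nursery {
    final int budget;
    int used = 0;
    int collections = 0;

    Nursery(int budget) { this.budget = budget; }

    void allocate(int size) {
        if (used + size > budget) {   // nursery full: collect and reset
            collections++;
            used = 0;
        }
        used += size;
    }

    public static void main(String[] args) {
        Nursery gen0 = new Nursery(256 * 1024);  // 256 KB, as in the article
        for (int i = 0; i < 100_000; i++) {
            gen0.allocate(64);                   // lots of small objects
        }
        System.out.println("collections triggered: " + gen0.collections);
    }
}
```

The point of the toy: allocation stays dirt cheap only while you fit in the nursery; allocate faster than gen 0 can hold and the collection count (and pause total) climbs.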
quote:
However, it is possible to store a reference to a gen 0 object in an object reference field of a gen 1 or gen 2 object. Since we don't scan gen 1 or gen 2 objects during a gen 0 collection, if that is the only reference to the given gen 0 object, that object could be erroneously reclaimed by GC. We can't let that happen!
Danger, Will Robinson, Danger.
quote:
Instead, all stores to all object reference fields in the heap incur a write barrier. This is bookkeeping code that efficiently notes stores of new generation object references into fields of older generation objects. Such old object reference fields are added to the GC root set of subsequent GC(s). The per-object-reference-field-store write barrier overhead is comparable to the cost of a simple method call (Table 7). It is a new expense that is not present in native C/C++ code, but it is usually a small price to pay for super fast object allocation and GC, and the many productivity benefits of automatic memory management.
Oh. So their GC scheme will always cause the entire program to execute more slowly than OCNC code. Another biggie.
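For reference, the "bookkeeping code" the article describes is typically a card-marking scheme: every reference store into the heap also marks a small card covering that address, and the collector later scans only the dirty cards for old-to-young references. A minimal sketch in Java; this is my own illustration (the real barrier is native code emitted by the JIT, and these names are hypothetical):

```java
// Sketch of card-marking write-barrier bookkeeping.
public class CardTable {
    static final int CARD_SHIFT = 9;           // each card covers 512 bytes of heap
    final byte[] cards;

    CardTable(int heapSize) { cards = new byte[heapSize >> CARD_SHIFT]; }

    // Every store of an object reference into a heap field pays for this
    // extra marking on top of the store itself -- the cost the article
    // compares to a simple method call.
    void writeBarrier(int fieldAddress) {
        cards[fieldAddress >> CARD_SHIFT] = 1; // mark the card dirty
    }

    // At gen 0 collection time, only dirty cards are scanned for
    // old-generation fields that might point into gen 0.
    int dirtyCount() {
        int n = 0;
        for (byte c : cards) if (c != 0) n++;
        return n;
    }

    public static void main(String[] args) {
        CardTable ct = new CardTable(1 << 20); // 1 MB toy heap
        ct.writeBarrier(0);
        ct.writeBarrier(600);                  // lands on a different 512-byte card
        System.out.println("dirty cards: " + ct.dirtyCount()); // prints 2
    }
}
```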
quote:
Write barriers can be costly in tight inner loops. But in years to come we can look forward to advanced compilation techniques that reduce the number of write barriers taken and total amortized cost. You might think write barriers are only necessary on stores to object reference fields of reference types. However, within a value type method, stores to its object reference fields (if any) are also protected by write barriers. This is necessary because the value type itself may sometimes be embedded within a reference type residing in the heap.
So you can be bitten by the cost of 'write barriers' in places where you may not expect it. Hmmmm.
quote:
Recalling previous sections, we can expect object array element stores to be considerably more expensive. To store an object reference into an array of object references, the runtime must: 1. check array index is in bounds; 2. check object is an instance of the array element type; 3. perform a write barrier (noting any intergenerational object reference from the array to the object).
More slowdown.
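The element-type check in step 2 exists because object arrays are covariant, and Java pays exactly the same per-store cost, so you can watch the check fire there. A small demonstration, with Java standing in for the CLR behavior the article describes:

```java
public class ArrayStoreDemo {
    public static void main(String[] args) {
        Object[] arr = new String[2];      // arrays are covariant: legal assignment
        arr[0] = "ok";                     // bounds check + element-type check pass
        boolean caught = false;
        try {
            arr[1] = Integer.valueOf(1);   // fails the runtime element-type check
        } catch (ArrayStoreException e) {
            caught = true;
        }
        System.out.println("element-type check enforced: " + caught); // prints true
    }
}
```

Every object-reference store into an array pays for that runtime check, whether or not it ever fails.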
quote:
A partnership between .NET compilers and the CLR enables value types, including primitive types like int (System.Int32), to participate as if they were reference types—to be addressed as object references. This affordance—this syntactic sugar—allows value types to be passed to methods as objects, stored in collections as objects, etc. To "box" a value type is to create a reference type object that holds a copy of its value type. This is conceptually the same as creating a class with an unnamed instance field of the same type as the value type. To "unbox" a boxed value type is to copy the value, from the object, into a new instance of the value type. As Table 9 shows (in comparison with Table 4), the amortized time needed to box a int, and later to garbage collect it, is comparable to the time needed to instantiate a small class with one int field.
So primitive types like int are (sometimes or always) slower to use than C/C++ primitives.
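A quick Java illustration of what boxing actually allocates; Java's autoboxing is directly analogous to the CLR mechanism quoted above, and each box is a real heap object holding a copy of the value:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        int n = 42;
        Object boxed = n;                // box: allocates a heap object copying n
        int unboxed = (Integer) boxed;   // unbox: copies the value back out

        Integer a = 1000, b = 1000;      // two separate boxes (values this large
                                         // fall outside Java's small-integer cache)
        System.out.println(unboxed + " " + (a == b) + " " + a.equals(b));
        // prints "42 false true": equal values, but two distinct heap objects
    }
}
```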
quote:
Delegates are roughly similar to function pointers in C++; however, delegates are type-safe and secure. ... Comparing Table 10 and Table 3, note that delegate invoke is approximately eight times slower than a method call.
Aren't C++ function pointers 'type-safe'? How are delegates 'secure'? (You are the only one who can modify the code if it is wrong?) This sounds like a BS statement made purely for marketing purposes. But note the speed hit you take for them. (Though I am not sure that C++ member-function pointers aren't similarly slow to call.)
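On the speed hit: a delegate call goes through an extra object and an indirect dispatch that a direct call avoids. A sketch of the same shape of indirection in Java, my own analogy with functional interfaces playing the delegate role:

```java
import java.util.function.IntUnaryOperator;

public class DelegateDemo {
    static int addOne(int x) { return x + 1; }

    public static void main(String[] args) {
        int direct = addOne(41);                    // direct static call: cheap

        IntUnaryOperator f = DelegateDemo::addOne;  // delegate-like: a heap object
                                                    // wrapping the target method
        int indirect = f.applyAsInt(41);            // call through the indirection

        System.out.println(direct + " " + indirect); // prints "42 42"
    }
}
```

Same result either way; the difference the tables measure is purely the cost of the extra hop.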
quote:
Due to space and time restrictions, we did not cover locking, exception handling, or the code access security system. Consider it an exercise for the reader.
More speed reducers compared to C/C++/assembly?
quote:
We have seen that jitted managed code can be as "pedal to the metal" as native code.
Yeah, right. What a load of cr*p.
quote:
As for the CLR team, we continue to work to provide a platform that is substantially more productive than native code and yet is faster than native code.
I won't argue with 'more productive.' But I believe we have seen, from their own article, that it will never be faster than properly written native code. Note also that JITed code will always pay the additional cost of compiling at runtime, on top of all of the above.

David

##### Share on other sites
Uhm, you're looking at it all wrong.

JIT can be faster than natively compiled code because it can take advantage of the ability to recompile at runtime based on system configuration (using ops that require more memory but less processing power, or less memory and more processing power, for example). It can be faster because a VM update will automatically improve the speed of existing applications.

MY original point was that the benefits gained from these improvements HAVE NOT YET CAUGHT UP WITH the overhead associated with the VMs themselves, nor do I believe they will for at least another 3-5 years.

All of that is mostly irrelevant though, because the speed difference is only significant in performance-critical applications, of which there are not all that many. Most applications are best designed with .Net / Java / something similar. There's no point in writing a program in C++ to gain a 50% boost in speed when your application only uses a million clocks anyway.

Oh, and with regards to things like not using std::string, my guess would be that, since they're trying to cater to developers of many different languages, it's easier to use a function than to discuss the nuances of the STL.

##### Share on other sites
Oh, and if you're worried about 100% optimal performance, you're still not doing that with C/C++. Hope you understand ASM very well.

It's a question of "how much speed is lost" vs. "how much productivity is gained".

I, personally, believe there's currently too much of a tradeoff for it to be a good design scheme for performance-critical programs (databases, games, simulations, etc.), but it's a fine tradeoff for the so-called "productivity" applications.

##### Share on other sites
Cliff notes: C# is slower than C++.

Yeah, that was quite the revelation, wasn't it? I'm sure it took everyone by surprise.

Edit:
Regarding its applicability to games, most hobbyist engines are not even going to touch the bounds of managed code performance, but everyone seems to follow the crowd for no apparent reason.


##### Share on other sites
quote:
JIT can be faster than natively compiled code because it can take advantage of the fact the ability to recompile at runtime based on system configuration (using ops that require more memory but less processing power, or less memory & more processing power, for example).

From the MSDN article:
quote:
After twenty more years of Moore's Law, circa 2003, processors are fast (issuing up to three operations per cycle at 3 GHz), RAM is relatively very slow (~100 ns access times on 512 MB of DRAM), and disks are glacially slow and enormous (~10 ms access times on 100 GB disks)...
...
Now our fastest PCs can issue up to ~9000 operations per microsecond, but in that same microsecond, only load or store to DRAM ~10 cache lines. In computer architecture circles this is known as hitting the memory wall. Caches hide the memory latency, but only to a point. If code or data does not fit in cache, and/or exhibits poor locality of reference, our 9000 operation-per-microsecond supersonic jet degenerates to a 10 load-per-microsecond tricycle.

So swapping to ops that require more memory but less processing power is a bad example, and on modern machines I cannot think of a good one. (Maybe using SSE or other abilities a particular computer has, but code compiled specifically for that system will still be at least as fast.)
quote:
...It can be faster because a VM update will automatically improve the speed of existing applications.

I would like to see MS put THAT in writing!
quote:
MY original point was that the benefits gained from these improvements HAVE NOT YET CAUGHT UP WITH the overhead associated with the VM's themselves, nor do I believe they will for at least another 3-5 years.

And my point is that they never will catch up with the overhead of the VM. Ever.
quote:
Oh, and with regards to things like not using std::string, my guess would be that, since they're trying to cater to developers of many different languages, it's easier to use a function than to discuss the nuances of the STL.

Well, the next paragraph of the article states "...In C or C++ it was so painful you'd think twice..." regarding the XML thingy, so it looked to me like they were kinda targeting the C++ crowd, and making grandiose claims in the process.
quote:
Yeah, that was quite the revelation, wasn't it? I'm sure it took everyone by surprise.

My purpose was simply to state it in black and white. This is not meant as a surprise, but rather as a thread you can refer a programmer to in the future when they state ".NET is faster than XXX." I haven't seen a thread of that nature (but I haven't looked too hard, either - I was just sick and tired of seeing people repeat that managed code could beat compiled code). .NET may be easier to code in, make cross-platform development a breeze, and help you be more productive (kinda like JAVA!), but faster executing? No. (And I seem to remember that MS put a clause in their .NET usage EULA stating that it was illegal for users to post benchmarks of .NET code. Hmmmm.)

David


##### Share on other sites
I used to think that the MSIL/JIT thing wasn't so bad. That was before I used it. I hand-coded some MSIL:

```
.assembly SpeedTest1 {}

.method static void Test1()
{
    .entrypoint
    .locals init ( [0] int32 i )

    ldstr "Starting..."
    call void [mscorlib]System.Console::WriteLine(class System.String)
    ldc.i4.0
    stloc i
Looper:
    ldloc i
    ldc.i4 1
    sub
    dup
    stloc i
    brtrue Looper
    ldstr "Finished."
    call void [mscorlib]System.Console::WriteLine(class System.String)
    ret
}
```

Because I'm a MSIL noob I didn't bother figuring out clock functions under the .NET framework (and I don't have a .NET compiler to do this in an HLL), so I used a stopwatch. I assembled with the Microsoft ILASM compiler. My system is an AMD XP 2000+, and this sucker takes 11 seconds from the "Starting..." to the "Finished." It's an empty loop that executes 4294967296 times. Contrast this with the equivalent VB6 loop compiled with all optimizations turned on:

```
Public Sub TimeTest()
  Dim i As Long
  Do
    i = i - 1
  Loop While i
End Sub
```

This sucker takes only 5 seconds to execute. So I'm thinking maybe I wrote the MSIL poorly, so I tried an entirely "stack-based" approach (removed the temporary i variable, which I assume was just held in a register to begin with) and got the same 11 seconds between "Starting..." and "Finished." So now I'm thinking... do I have the same JIT as everyone else? My JIT seems rather horrible! Twice as slow as VB6? Gimme a break.

- Rockoon
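For what it's worth, the same countdown can be written in Java; this is my own sketch, not from the thread. The loop relies on the same wrap-around trick: decrementing from 0 cycles through all 2^32 int values back to 0.

```java
public class SpeedTest {
    public static void main(String[] args) {
        long start = System.nanoTime();
        int i = 0;
        do {
            i = i - 1;         // wraps through all 2^32 int values back to 0
        } while (i != 0);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("i = " + i + " after 2^32 iterations in " + ms + " ms");
    }
}
```

As the next reply points out, though, an empty loop mostly measures how aggressively the JIT handles trivially dead work, not real application speed.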

##### Share on other sites
I'm not entirely sure that decrementing a counter in an otherwise empty loop is a useful performance test.

Try something that actually does useful work.

In my experience, things which are typically quite fast:

- Integer / branching things
- Calls
- Structure operations etc., provided you don't do anything bad

Notably a bit slow under M$ .NET at the moment:

- Intense floating point work

Can be slow depending on how they're implemented:

- String handling, if the programmer doesn't fully understand how strings work in C# at runtime

Mark
