krum

[.net] Very sad IL optimization. Am I missing something?


I've been contemplating porting my game engine from C++ over to C# and the .NET platform, and one thing that has concerned me is the gratuitous use of pass-by-value. With large value types, like a 4x4 matrix, pass-by-value can really slow things down if you do it a lot. One of the structs I looked at in particular was Microsoft.DirectX.Matrix. Apparently there are no methods that support pass-by-reference, and as far as I can tell the language simply has no support for pass-by-reference with operators.

What disturbed me the most is how the JIT compiles a simple Matrix.Multiply into machine code. As you can see from the example below, which I took from the DBGCLR tool on a Release-compiled source file, the JIT spams a bunch of movq instructions to copy the Matrix value. In a game engine you might not do so many matrix multiplies that it makes a huge difference, but it seems to me that if we're going to move to .NET we still need to squeeze as much as we can out of the runtime to hit our framerate targets under load.

My question is, am I missing something? Is there a better way? Should I write my own math library with methods that pass by reference? After all the time I've spent tweaking to get the most out of my code, sometimes even hand-writing SIMD code in assembler, all those movq instructions are bugging the heck out of me.
            Matrix m = Matrix.Identity;
00000035  lea         ecx,[ebp+FFFFFF34h] 
0000003b  call        dword ptr ds:[00ED5914h] 
00000041  lea         edi,[ebp-4Ch] 
00000044  lea         esi,[ebp+FFFFFF34h] 
0000004a  mov         ecx,10h 
0000004f  rep movs    dword ptr es:[edi],dword ptr [esi] 
            Matrix n = Matrix.Identity;
00000051  lea         ecx,[ebp+FFFFFEF4h] 
00000057  call        dword ptr ds:[00ED5914h] 
0000005d  lea         edi,[ebp+FFFFFF74h] 
00000063  lea         esi,[ebp+FFFFFEF4h] 
00000069  mov         ecx,10h 
0000006e  rep movs    dword ptr es:[edi],dword ptr [esi] 
            n = Matrix.Multiply( n,m);
00000070  lea         eax,[ebp+FFFFFF74h] 
00000076  sub         esp,40h 
00000079  movq        xmm0,mmword ptr [eax] 
0000007d  movq        mmword ptr [esp],xmm0 
00000082  movq        xmm0,mmword ptr [eax+8] 
00000087  movq        mmword ptr [esp+8],xmm0 
0000008d  movq        xmm0,mmword ptr [eax+10h] 
00000092  movq        mmword ptr [esp+10h],xmm0 
00000098  movq        xmm0,mmword ptr [eax+18h] 
0000009d  movq        mmword ptr [esp+18h],xmm0 
000000a3  movq        xmm0,mmword ptr [eax+20h] 
000000a8  movq        mmword ptr [esp+20h],xmm0 
000000ae  movq        xmm0,mmword ptr [eax+28h] 
000000b3  movq        mmword ptr [esp+28h],xmm0 
000000b9  movq        xmm0,mmword ptr [eax+30h] 
000000be  movq        mmword ptr [esp+30h],xmm0 
000000c4  movq        xmm0,mmword ptr [eax+38h] 
000000c9  movq        mmword ptr [esp+38h],xmm0 
000000cf  lea         eax,[ebp-4Ch] 
000000d2  sub         esp,40h 
000000d5  movq        xmm0,mmword ptr [eax] 
000000d9  movq        mmword ptr [esp],xmm0 
000000de  movq        xmm0,mmword ptr [eax+8] 
000000e3  movq        mmword ptr [esp+8],xmm0 
000000e9  movq        xmm0,mmword ptr [eax+10h] 
000000ee  movq        mmword ptr [esp+10h],xmm0 
000000f4  movq        xmm0,mmword ptr [eax+18h] 
000000f9  movq        mmword ptr [esp+18h],xmm0 
000000ff  movq        xmm0,mmword ptr [eax+20h] 
00000104  movq        mmword ptr [esp+20h],xmm0 
0000010a  movq        xmm0,mmword ptr [eax+28h] 
0000010f  movq        mmword ptr [esp+28h],xmm0 
00000115  movq        xmm0,mmword ptr [eax+30h] 
0000011a  movq        mmword ptr [esp+30h],xmm0 
00000120  movq        xmm0,mmword ptr [eax+38h] 
00000125  movq        mmword ptr [esp+38h],xmm0 
0000012b  lea         ecx,[ebp+FFFFFEB4h] 
00000131  call        dword ptr ds:[00ED590Ch] 
00000137  lea         edi,[ebp+FFFFFF74h] 
0000013d  lea         esi,[ebp+FFFFFEB4h] 
00000143  mov         ecx,10h 
00000148  rep movs    dword ptr es:[edi],dword ptr [esi] 

Guest Anonymous Poster
Quote:
Original post by nsto119
You are passing by reference. Or rather, you're passing the reference by value. Classes in C# are reference types.


But Microsoft.DirectX.Matrix isn't a class, it's a struct.

Quote:
Original post by Guru2012
You can force it to pass by reference using the "ref" keyword, I believe.
I know you can force it. I don't remember whether it was "ref" or "out" though.


It seems that this only works if the method definition uses this keyword.

Quote:
Original post by Guru2012
Then put it in the method definition...


But I can't change the method definition for types in the Microsoft.DirectX namespace.
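
For a math library you do control, a by-reference multiply might look like the sketch below. `Matrix4` and its `Multiply` overload are hypothetical names made up for illustration, not part of Microsoft.DirectX. Note that `ref` and `out` have to appear both in the method definition and at the call site.

```csharp
using System;

// Hypothetical 4x4 matrix for illustration only; not Microsoft.DirectX.Matrix.
public struct Matrix4
{
    public float M11, M12, M13, M14;
    public float M21, M22, M23, M24;
    public float M31, M32, M33, M34;
    public float M41, M42, M43, M44;

    public static Matrix4 Identity
    {
        get
        {
            Matrix4 m = new Matrix4();
            m.M11 = m.M22 = m.M33 = m.M44 = 1.0f;
            return m;
        }
    }

    // Both inputs arrive as addresses, so no 64-byte argument copies are made.
    // The product is built in a local first so 'result' may alias 'a' or 'b'.
    public static void Multiply(ref Matrix4 a, ref Matrix4 b, out Matrix4 result)
    {
        Matrix4 r;
        r.M11 = a.M11 * b.M11 + a.M12 * b.M21 + a.M13 * b.M31 + a.M14 * b.M41;
        r.M12 = a.M11 * b.M12 + a.M12 * b.M22 + a.M13 * b.M32 + a.M14 * b.M42;
        r.M13 = a.M11 * b.M13 + a.M12 * b.M23 + a.M13 * b.M33 + a.M14 * b.M43;
        r.M14 = a.M11 * b.M14 + a.M12 * b.M24 + a.M13 * b.M34 + a.M14 * b.M44;
        r.M21 = a.M21 * b.M11 + a.M22 * b.M21 + a.M23 * b.M31 + a.M24 * b.M41;
        r.M22 = a.M21 * b.M12 + a.M22 * b.M22 + a.M23 * b.M32 + a.M24 * b.M42;
        r.M23 = a.M21 * b.M13 + a.M22 * b.M23 + a.M23 * b.M33 + a.M24 * b.M43;
        r.M24 = a.M21 * b.M14 + a.M22 * b.M24 + a.M23 * b.M34 + a.M24 * b.M44;
        r.M31 = a.M31 * b.M11 + a.M32 * b.M21 + a.M33 * b.M31 + a.M34 * b.M41;
        r.M32 = a.M31 * b.M12 + a.M32 * b.M22 + a.M33 * b.M32 + a.M34 * b.M42;
        r.M33 = a.M31 * b.M13 + a.M32 * b.M23 + a.M33 * b.M33 + a.M34 * b.M43;
        r.M34 = a.M31 * b.M14 + a.M32 * b.M24 + a.M33 * b.M34 + a.M34 * b.M44;
        r.M41 = a.M41 * b.M11 + a.M42 * b.M21 + a.M43 * b.M31 + a.M44 * b.M41;
        r.M42 = a.M41 * b.M12 + a.M42 * b.M22 + a.M43 * b.M32 + a.M44 * b.M42;
        r.M43 = a.M41 * b.M13 + a.M42 * b.M23 + a.M43 * b.M33 + a.M44 * b.M43;
        r.M44 = a.M41 * b.M14 + a.M42 * b.M24 + a.M43 * b.M34 + a.M44 * b.M44;
        result = r;
    }
}

public static class RefMultiplyDemo
{
    public static void Main()
    {
        Matrix4 m = Matrix4.Identity;
        Matrix4 n = Matrix4.Identity;
        n.M14 = 5.0f;

        Matrix4 product;
        // 'ref' and 'out' are required at the call site as well as in the definition.
        Matrix4.Multiply(ref n, ref m, out product);
        Console.WriteLine(product.M14);
    }
}
```

The trade-off is that this form cannot be an overloaded operator, so `a * b` style expressions would still go through a by-value path.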

Clearly. The method needs to know what calling convention to use.

You can also define the method to take an object, which will box the value type (Matrix being the value).
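
As a rough illustration of what boxing does, the sketch below uses a hypothetical `Vec4` struct and made-up method names. Boxing copies the value into a new heap object, so it does not avoid the copy, and changes made through the unboxed value never reach the caller's variable; `ref` does neither.

```csharp
using System;

// Hypothetical value type for illustration.
public struct Vec4
{
    public float X, Y, Z, W;
}

public static class BoxingDemo
{
    // Taking 'object' makes the caller box: the struct is copied into a heap object.
    public static void SetXBoxed(object boxed, float x)
    {
        Vec4 v = (Vec4)boxed; // unboxing copies the value out again
        v.X = x;              // changes only this local copy
    }

    // Taking 'ref' passes the address of the caller's variable: no copy, no allocation.
    public static void SetXByRef(ref Vec4 v, float x)
    {
        v.X = x;
    }

    public static void Main()
    {
        Vec4 v = new Vec4();

        SetXBoxed(v, 1.0f);      // v is boxed here
        Console.WriteLine(v.X);  // prints 0: the caller's value is untouched

        SetXByRef(ref v, 2.0f);
        Console.WriteLine(v.X);  // prints 2
    }
}
```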

However -- are you sure that this copying is actually a performance problem? I mean, how much of your program's time is spent multiplying matrices, anyway? Should be less than a percent, unless you do a hundred instances of a hundred-bone animation...

That code isn't remotely optimized. Look at the sub esp, 0x40 instructions. An optimized version would use a single sub esp, 0x80 instead of one before each matrix copy.

I really doubt that Microsoft would ship .Net without knowing how to optimize something as simple as that.

It's possible that it's not showing you the final optimized native assembly instructions...

I'm especially suspicious because it's telling you that the function address starts down at 00000000. Nothing usually sits down that far.

[Edited by - Nypyren on June 6, 2006 11:37:05 PM]

Quote:
Original post by hplus0603
are you sure that this copying is actually a performance problem? I mean, how much of your program's time is spent multiplying matrices, anyway? Should be less than a percent, unless you do a hundred instances of a hundred-bone animation...


This may be true... can't really tell until I've got some rudimentary code ported over and have started running a scene. Really, I find it interesting... and frankly as a C++ programmer it's disturbing to me... that the API passes them by value. :D
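
One rough way to get a number before porting anything is a small timing loop like the sketch below. `Big16` and both `Trace` methods are hypothetical stand-ins for a 64-byte matrix, and the results only mean something in a Release build run without a debugger attached. A micro-benchmark like this is indicative at best, since the JIT may inline or otherwise transform such small methods.

```csharp
using System;
using System.Diagnostics;

// Hypothetical 64-byte value type standing in for a 4x4 float matrix.
public struct Big16
{
    public float F0, F1, F2, F3, F4, F5, F6, F7;
    public float F8, F9, F10, F11, F12, F13, F14, F15;
}

public static class CopyCostBenchmark
{
    // The whole struct is copied for each call.
    public static float TraceByValue(Big16 m)
    {
        return m.F0 + m.F5 + m.F10 + m.F15;
    }

    // Only the address of the caller's struct is passed.
    public static float TraceByRef(ref Big16 m)
    {
        return m.F0 + m.F5 + m.F10 + m.F15;
    }

    public static void Main()
    {
        if (Debugger.IsAttached)
        {
            Console.WriteLine("Debugger attached: JIT optimizations may be suppressed.");
        }

        Big16 m = new Big16();
        m.F0 = m.F5 = m.F10 = m.F15 = 1.0f;

        const int Iterations = 10000000;
        double sink = 0.0; // accumulated so the calls cannot be discarded as dead code

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sink += TraceByValue(m);
        }
        sw.Stop();
        Console.WriteLine("by value: {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sink += TraceByRef(ref m);
        }
        sw.Stop();
        Console.WriteLine("by ref:   {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sink);
    }
}
```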



Indeed, the JIT probably isn't optimizing at all. When running a debug session (even with a release build) the JIT will turn off most, if not all, optimizations so that the code is A) readable and B) can be related back to the original source in a meaningful way. I have heard of there being a way to force optimizations, but I can't for the life of me remember what it is.

Have you tried ngen.exe yet?
MSIL code can be compiled in two ways:

- On the fly (JIT) when the application is executed, meaning the compiler needs to be quick for the program to appear on the screen without delay.

- Manually using the "native image generator", which you can find as C:\Windows\Microsoft.NET\Framework\v2.0.50727\ngen.exe. This compiler will compile the MSIL code to machine code ahead of time, and it *should* perform all the optimization you're expecting from a good C++ compiler.

-Markus-

From http://msdn2.microsoft.com/en-us/library/ms241594.aspx:

When you debug a managed application, Visual Studio suppresses optimization of just-in-time (JIT) code by default. Suppressing JIT optimization means you are debugging non-optimized code. The code runs a bit slower because it is not optimized, but your debugging experience is much more thorough. Debugging optimized code is harder and recommended only if you encounter a bug that occurs in optimized code but cannot be reproduced in the non-optimized version.

JIT optimization is controlled in Visual Studio by the Suppress JIT optimization on module load option. You can find this option on the General page under the Debugging node in the Options dialog box.



This guy has some good stuff about the JIT: http://blogs.msdn.com/davidnotario/default.aspx

