lunarss

[.net] SIMD Support for the .Net Platform

I'm trying to get MS to add native support for SIMD extensions to the .Net platform. This would be pretty nice for graphics programmers. I know the CLR can use SIMD, but it really sucks ass at it. The only other way to do it is to call into a C++ DLL, but that is neither a clean nor an efficient solution. Go here and vote for my suggestion, please. I'm surprised that nobody else has asked for something like this, since .Net is such a nice platform to develop for.

---

There are a couple of issues with SIMD (SSE in particular) that make it hard to standardize.

First of all, it's only supported from the Pentium III and Athlon XP onward. There are still quite a lot of people with a perfectly good 1.4 GHz Athlon that can't run SSE. For game development it's probably acceptable to ship two executables, or even to not run at all on non-SSE processors. For every other market where C# is used (about 99%, I guess), that would be completely unacceptable.

Another problem is memory alignment. SSE performance depends heavily on 16-byte aligned data, and arithmetic instructions with a memory operand will even raise an exception on unaligned data. On most platforms, however, the standard alignment of stack and heap data is 4 bytes. And even if 16-byte alignment were added with little overhead, the memory layout would still quite likely be far from optimal.
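To make the alignment requirement concrete, here is a minimal C++ sketch (C++17 assumed; `Float4`, `is_aligned16` and `make_float4_buffer` are illustrative names, not part of any library). `alignas(16)` asks the compiler to put every instance on a 16-byte boundary, and in C++17 `new` and `std::vector` honor that for heap storage as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative four-float vector (not a library type). alignas(16) makes the
// compiler place every instance on a 16-byte boundary: on the stack, in
// statics, and, since C++17, in heap storage from new or std::allocator.
struct alignas(16) Float4 {
    float x, y, z, w;
};

static_assert(sizeof(Float4) == 16, "four floats, no padding");
static_assert(alignof(Float4) == 16, "16-byte alignment for aligned SSE access");

// True if p sits on a 16-byte boundary, as required by aligned SSE
// loads/stores (movaps) and by legacy SSE arithmetic with a memory operand.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Allocates n zero-initialized vectors and checks that each one is aligned.
std::vector<Float4> make_float4_buffer(std::size_t n) {
    std::vector<Float4> buf(n);
    for (const Float4& f : buf) {
        assert(is_aligned16(&f));
    }
    return buf;
}
```

The point of the type-level approach is that the programmer states the requirement once and every allocation path follows it, rather than each call site having to remember to align its buffer.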

It really takes an assembly programmer to write efficient SIMD code. No compiler can produce code that comes close to what a trained assembly programmer would consider optimal. SSE can be about two times faster, but that's really a relatively small margin, and it's easy to mess things up.

So, in my opinion, it's best to simply make it easier to use assembly or C++ in separate modules, rather than adding SIMD to C# itself.

---

I meant SIMD as a paradigm rather than processor specific extensions.

It would be up to the compiler to decide whether or not to use processor extensions. My idea would only hint to the compiler that SIMD extensions should be used if they are supported. If a user's processor does not support SIMD in any way, the compiler could fall back to four standard float multiplies instead.
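A minimal C++ sketch of that idea, assuming the choice is made at build time: the hypothetical `Float4` type and `mul` function look the same to the caller, and the preprocessor picks either one SSE multiply (`_mm_mul_ps`) or four ordinary float multiplies. A JIT could make the same decision per machine at load time instead.

```cpp
#include <cassert>

// Use SSE when the compiler targets it (GCC/Clang define __SSE__; MSVC defines
// _M_X64 on x64 and _M_IX86_FP >= 1 for /arch:SSE or higher on x86).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

// Illustrative float4 type (not an existing .NET or C++ library type).
struct alignas(16) Float4 {
    float v[4];
};

// Component-wise multiply. Callers write the same code either way; the build
// decides whether it becomes one SSE multiply or four scalar multiplies.
inline Float4 mul(const Float4& a, const Float4& b) {
    Float4 r;
#ifdef HAVE_SSE
    // Aligned load/store are safe because Float4 is declared alignas(16).
    _mm_store_ps(r.v, _mm_mul_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    // Fallback for targets without SSE: four standard float multiplies.
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] * b.v[i];
    }
#endif
    return r;
}
```

For example, `mul(Float4{{1, 2, 3, 4}}, Float4{{5, 6, 7, 8}})` gives `{5, 12, 21, 32}` on either path.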

The .Net framework already uses this type of design. You could port the .Net framework (leaving aside the MS-specific pieces) to any processor architecture you wanted. The compiler would just need to know how to crunch the MSIL down into instructions the platform understands (x86, etc.).

I agree with the memory alignment issue you brought up. It is solvable at the compiler level, though, and would be relatively simple with the GC model and the simple structure I suggested.

Also, the performance benefits of a SIMD architecture increase with the size of your data set, so science, graphics, and engineering users stand to benefit.

---

Indeed it should stay platform independent, which can be handled with a processor feature check, but that creates new issues. The check takes extra time, and the code size increases (practically doubles for floating-point code). Both can make the benefits of SIMD go up in smoke.

The simple truth is that C# was never intended as a truly best-performance language. We have native languages for that, like C++. So unless C# becomes native and loses its platform independence, there's really no point in supporting SIMD.

Until that day, if it ever comes, if you want the best performance on a specific platform and that last 10% matters to you, use assembly (pure, inline, or intrinsics, whichever works best for you). That is the only way to ensure you get the optimized code you intended.

About the alignment issue, I don't think it's that simple. Memory can be allocated in one module and processed with SIMD in another. This can certainly be detected, but aligning everything can be a serious waste of memory. The only optimal solution is to let the programmer decide. But then you might just as well use assembly...
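A quick worked example of that memory cost, as a C++ sketch (the type names are illustrative): a three-float vector occupies 12 bytes, but forcing it onto 16-byte boundaries pads every element to 16 bytes. For a million vectors that is 4,000,000 extra bytes, a third more memory and a third more cache traffic for the same data.

```cpp
#include <cassert>
#include <cstddef>

// Plain three-float vector: 12 bytes, 4-byte alignment.
struct Vec3 {
    float x, y, z;
};

// Same data with 16-byte alignment forced on it, as an "align everything for
// SIMD" policy would do. sizeof is rounded up to the alignment, so each
// element carries 4 bytes of padding.
struct alignas(16) Vec3A {
    float x, y, z;
};

static_assert(sizeof(Vec3) == 12, "three floats, no padding");
static_assert(sizeof(Vec3A) == 16, "padded up to the 16-byte alignment");

// Extra bytes spent on padding when storing n aligned vectors instead of n plain ones.
constexpr std::size_t padding_overhead(std::size_t n) {
    return n * (sizeof(Vec3A) - sizeof(Vec3));
}

static_assert(padding_overhead(1000000) == 4000000, "12,000,000 bytes become 16,000,000");
```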

See what I'm getting at? If you want to use the most advanced processor features you're best off going down to that level, or it's just not worth the trouble. And for most 'science, graphics and engineering' uses there exist excellent libraries you can call from C#.

---

What the... I don't get it. Two executables, huge codebase... WTF?

This is C#! It's compiled to MSIL! All that's needed is a change to the runtime and every C# exe and dll ever compiled will be able to take advantage.

I guess the reason it is under-utilized at the moment is that the MS compiler folks have had more important and beneficial optimizations to do (as was said: "SSE can be about two times faster but that's really a relatively small margin"). Not much one can do, though. Writing to MS helps, of course. They do actually want feedback.

However, anything like using separate types like in the original poster's suggestion is not good. That does indeed cause all sorts of problems and compatibility issues. But the basic idea of this is perfectly doable in the runtime.

---

Quote:
Original post by C0D1F1ED
The check takes extra time, and the code size increases (practically doubles for floating-point code). Both can make the benefits of SIMD go up in smoke.

Firstly, the check for SIMD support only needs to happen once when the program is first loaded, surely you don't think this is a significant delay? I'm not overly familiar with SIMD instructions, but why do you think code size would 'practically double'?
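One common way to pay for the check only once is to test the CPU on first use and cache a function pointer. A minimal sketch, assuming GCC or Clang on x86 (`__builtin_cpu_supports` and `__attribute__((target("sse")))` are compiler extensions; other compilers and architectures simply get the scalar path). The names `scale`, `scale_sse`, `scale_scalar` and `pick_scale` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Portable path: works on any CPU.
static void scale_scalar(float* data, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= k;
    }
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <xmmintrin.h>
#define HAVE_SSE_PATH 1

// SSE path. The target attribute lets this one function use SSE even if the
// rest of the program is built for CPUs without it.
__attribute__((target("sse")))
static void scale_sse(float* data, std::size_t n, float k) {
    const __m128 factor = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load/store, so callers need not align their buffers.
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), factor));
    }
    scale_scalar(data + i, n - i, k);  // remaining 0-3 elements
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Runs the CPU feature check; called only once, from scale().
static ScaleFn pick_scale() {
#ifdef HAVE_SSE_PATH
    if (__builtin_cpu_supports("sse")) {
        return scale_sse;
    }
#endif
    return scale_scalar;
}

// Multiplies n floats by k. The function-local static is initialized on the
// first call only (thread-safe since C++11); later calls skip the check.
void scale(float* data, std::size_t n, float k) {
    static const ScaleFn impl = pick_scale();
    assert(data != nullptr || n == 0);
    impl(data, n, k);
}
```

After the first call, the remaining runtime cost is one indirect call per invocation. The code-size cost does not go away, though: every dispatched routine exists in both a scalar and an SSE version.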

Quote:
Original post by C0D1F1ED
The simple truth is that C# was never intended as a truly best-performance language. We have native languages for that, like C++. So unless C# becomes native and loses its platform independence, there's really no point in supporting SIMD.


No, C# isn't a 'to the metal' language, but to suggest that this makes it incapable of taking advantage of SIMD instructions is ridiculous. C# has the potential to see as much benefit from SIMD as any vectorizing C++ compiler. In fact, C# can take BETTER advantage of it because of the JIT compilation of the MSIL to machine-dependent code.

Quote:
Original post by C0D1F1ED
About the alignment issue, I don't think it's that simple. Memory can be allocated in one module and processed with SIMD in another. This can certainly be detected, but aligning everything can be a serious waste of memory. The only optimal solution is to let the programmer decide. But then you might just as well use assembly...


The CLR's garbage collector already handles memory alignment for performance on the host machine, so adjusting the alignment to 16 bytes for SIMD types is very much a non-issue.

Quote:
Original post by C0D1F1ED
See what I'm getting at? If you want to use the most advanced processor features you're best off going down to that level, or it's just not worth the trouble.


I couldn't disagree more. We're already at the point now with compiler technology that 9 times out of 10 the compiler will generate code significantly faster than the hand-crafted assembly, and for most of the remainder the compiler will match it for speed. To say that '<language X> is high-level so it couldn't possibly take advantage of <CPU feature X>, so why bother' is the sort of thinking that will take compiler technology....nowhere.

---

Quote:
Original post by joanusdmentia
Firstly, the check for SIMD support only needs to happen once when the program is first loaded, surely you don't think this is a significant delay? I'm not overly familiar with SIMD instructions, but why do you think code size would 'practically double'?

Okay, that's a bit exaggerated, but there will be overhead. And it sounds pretty strange to me that you want to benefit from SIMD while allowing an overhead that could nullify it. Sure, you might still gain something in many cases, but really, if you want the best performance, why choose C# in the first place? Personally, I write all my SSE code in C++ inline assembly, and no compiler, native or managed, could beat it.
Quote:
No, C# isn't a 'to the metal' language, but to suggest that this makes it incapable of taking advantage of SIMD instructions is ridiculous. C# has the potential to see as much benefit from SIMD as any vectorizing C++ compiler. In fact, C# can take BETTER advantage of it because of the JIT compilation of the MSIL to machine-dependent code.

Vectorizing C++ compilers are still a joke. Unless you're really processing everything in packets of four, they are incapable of producing good SSE code with shuffle operations. They also can't reorder data structures to avoid unnecessary shuffling. I store some of my vectors/colors in wxyz/argb format, which makes it significantly faster to operate on just w/a. Compilers can't make that kind of decision.
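A small C++ sketch of the wxyz point (the function names are hypothetical; SSE intrinsics are used when the compiler targets SSE, with a plain scalar fallback otherwise). Scalar SSE instructions such as `mulss` work on lane 0 only, so with w stored first it is already where they need it, while with xyzw order w has to be shuffled down from lane 3 first. In real code the vector would usually already be in a register from earlier packed math; the loads here only keep the example self-contained.

```cpp
#include <cassert>

// Use SSE intrinsics when the compiler targets SSE; otherwise plain C++.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

// Returns w * k for a vector stored as x, y, z, w.
// After the load, w sits in lane 3, but scalar SSE ops (mulss, and reading a
// float out of the register) work on lane 0, so w has to be shuffled down first.
inline float scaled_w_xyzw(const float v[4], float k) {
#ifdef HAVE_SSE
    __m128 r = _mm_loadu_ps(v);
    __m128 w = _mm_shuffle_ps(r, r, _MM_SHUFFLE(3, 3, 3, 3));  // the extra shuffle
    return _mm_cvtss_f32(_mm_mul_ss(w, _mm_set_ss(k)));
#else
    return v[3] * k;
#endif
}

// Returns w * k for a vector stored as w, x, y, z.
// w is already in lane 0, so no shuffle is needed.
inline float scaled_w_wxyz(const float v[4], float k) {
#ifdef HAVE_SSE
    __m128 r = _mm_loadu_ps(v);
    return _mm_cvtss_f32(_mm_mul_ss(r, _mm_set_ss(k)));
#else
    return v[0] * k;
#endif
}
```

One shuffle per access looks cheap in isolation, but in a tight loop that touches w repeatedly it is exactly the kind of overhead a data-layout decision can remove and a vectorizer cannot.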

I do understand that for C# it's a bit different. In the first place because it's an actively growing language. And purely technically it is a good idea to add a float4 type which translates to SIMD operations, because it's a strong hint of the programmer's intentions and this will certainly help the vectorization back-end. No doubt about that. Whether or not there is a real gain is a different matter.

But the real question is, do we really want to clutter C# with this? To put this into perspective: a couple of years ago I thought Java was one of the nicest languages and well suited for beginners. Now it's a mess, and 'beginners' have to master practically everything to write a basic application. I love C# and I'd hate to see it go down the same road. I don't care about that extra bit of performance; it's a managed language, for god's sake. When I do care about performance I instantly choose C++ with inline assembly.

There are good reasons why assembly, C, C++ and so many other languages still exist and will certainly outlive C#. Each language has its qualities. Sun seems to have tried very hard to turn Java into the ultimate one-and-only programming language, but failed badly. C#'s quality is that it's simple, clean, and doesn't perform badly. If you try to bring it to C++ performance you lose a lot of simplicity and cleanliness. And in my opinion that's absolutely not worth it.

Sure, SIMD is only a small step, but once people start using C# for high performance applications there will be more adjustments that step by step move C# into the Java junkyard. So please completely forget about changing the language for performance purposes. If the current vectorizer is no good then that problem should be solved in the vectorizer itself.

You just have to live with the fact that C# doesn't make any guarantees about performance. Even if better SIMD performance is added you still don't have any guarantee. The only language that offers that is assembly (and it's still tricky), and to some extent C++ because it's native.
Quote:
The CLR's garbage collector already handles memory alignment for performance on the host machine, so adjusting the alignment to 16 bytes for SIMD types is very much a non-issue.

Sure, but then you'll have to live with extra memory overhead in situations where that's best avoided. And cache misses can easily be worse than the performance gain from SIMD. No compiler can do an optimal job there and like I said before it's a small margin that is easily lost.
Quote:
I couldn't disagree more. We're already at the point now with compiler technology that 9 times out of 10 the compiler will generate code significantly faster than the hand-crafted assembly, and for most of the remainder the compiler will match it for speed. To say that '<language X> is high-level so it couldn't possibly take advantage of <CPU feature X>, so why bother' is the sort of thinking that will take compiler technology....nowhere.

9 out of 10 times for non-SIMD code. When it comes to writing SSE code an average assembly programmer can easily beat any vectorizing compiler.

And why would it be so bad if compiler technology didn't go much further? There is absolutely nothing wrong with hand-crafted assembly code if top performance has the highest priority. In the end a compiler is nothing more than a tool that assists in writing assembly code. The tool can be damn powerful, so you don't have to see the assembly code any more, but if you do care about what that assembly code looks like, there is no better way than to go down to that level instead of changing the tool and losing other qualities.
