SSE in .NET
I know that C# doesn't natively support SSE SIMD instructions. C++, however, does. What if someone implemented a wrapper for SSE instructions in managed C++? Then you'd be able to use SSE instructions from .NET. I see two options:
Wrap SSE instructions:
Create an object that represents an SSE data type. This object implements the basic SSE instructions, such as addps.
The problem I see with this is that the compiler doesn't know how to optimize the SSE instructions if we chain them together in C#, or in any other .NET language for that matter (or does it?). So we need a way to optimize these instructions at an earlier stage.
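A minimal native C++ sketch of the first option, assuming a compiler that ships the SSE intrinsics in <xmmintrin.h> (GCC, Clang and MSVC all do). The Vec4 type is hypothetical: one object wraps one __m128 register, and each operator maps to exactly one SSE instruction. Compiled natively, the compiler can inline a chain like a + b * a and keep every intermediate in registers; if each operator were instead a separate call from C# into a wrapper, the JIT would only see opaque calls, which is the concern raised above.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_setr_ps, _mm_add_ps, _mm_mul_ps, _mm_storeu_ps

// Hypothetical thin wrapper: one object per SSE register, one operator per instruction.
struct Vec4 {
    __m128 v;

    // _mm_setr_ps stores x in the lowest lane, so memory order is x, y, z, w.
    Vec4(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}
    explicit Vec4(__m128 r) : v(r) {}

    Vec4 operator+(const Vec4& o) const { return Vec4(_mm_add_ps(v, o.v)); }  // addps
    Vec4 operator*(const Vec4& o) const { return Vec4(_mm_mul_ps(v, o.v)); }  // mulps

    // Copy the four lanes out to ordinary (possibly unaligned) memory.
    void store(float out[4]) const { _mm_storeu_ps(out, v); }
};
```

For example, with a = (1, 2, 3, 4) and b = (10, 20, 30, 40), (a + b * a).store(r) issues a mulps followed by an addps and leaves (11, 42, 93, 164) in r.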
Create specialized objects:
Another way is to implement specialized data types in managed C++ that handle SSE data types internally and operate on them: matrices, vectors, batches of vectors, you name it. This lets the C++ compiler optimize the SSE code for us, right?
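A sketch of the second option, where a whole operation rather than a single instruction lives in native code. It assumes column-major storage, and Mat4 and transform are hypothetical names. The matrix-vector product is written as a sum of the matrix columns, each scaled by one broadcast component of the vector, which avoids horizontal adds entirely.

```cpp
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_set1_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical specialized type: a 4x4 matrix held as four SSE columns.
struct Mat4 {
    __m128 col[4];  // column-major: col[j] holds column j

    // Build from 16 floats given in column-major order.
    explicit Mat4(const float m[16]) {
        for (int j = 0; j < 4; ++j) col[j] = _mm_loadu_ps(m + 4 * j);
    }

    // out = M * v, computed as v[0]*col0 + v[1]*col1 + v[2]*col2 + v[3]*col3.
    void transform(const float v[4], float out[4]) const {
        __m128 r = _mm_mul_ps(col[0], _mm_set1_ps(v[0]));
        r = _mm_add_ps(r, _mm_mul_ps(col[1], _mm_set1_ps(v[1])));
        r = _mm_add_ps(r, _mm_mul_ps(col[2], _mm_set1_ps(v[2])));
        r = _mm_add_ps(r, _mm_mul_ps(col[3], _mm_set1_ps(v[3])));
        _mm_storeu_ps(out, r);
    }
};
```

Because the four multiply-add steps sit in one function, the compiler is free to schedule them together; a caller only pays for one call per transformed vector.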
I could find very little information on the matter on the web. What do you think about the subject?
Regards,
Bas
There are two main problems with this:
1) By invoking unmanaged code, you have removed from C# all processor and platform independence it may have had.
2) Crossing the managed/unmanaged boundary is very costly in terms of performance. Doing it for single SSE instructions would probably end up slower than not using SSE at all. If you are going to do it, you need to do enough work on the other side of the boundary to make it worth your while.
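To make the "enough work on the other side" point concrete, here is a sketch of the kind of flat native entry point one might export to a managed caller (for instance through P/Invoke or a C++/CLI wrapper). The function name and signature are illustrative assumptions, not taken from any library. The boundary is crossed once per batch of count vectors instead of once per instruction.

```cpp
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical export: out[i] = a[i] * s + b[i] for `count` four-float vectors.
// One call does all the work, so the transition cost is paid once per batch.
extern "C" void scale_add_batch(const float* a, const float* b, float s,
                                float* out, std::size_t count) {
    const __m128 vs = _mm_set1_ps(s);  // broadcast the scalar to all four lanes
    for (std::size_t i = 0; i < count; ++i) {
        __m128 va = _mm_loadu_ps(a + 4 * i);
        __m128 vb = _mm_loadu_ps(b + 4 * i);
        // one mulps and one addps per vector
        _mm_storeu_ps(out + 4 * i, _mm_add_ps(_mm_mul_ps(va, vs), vb));
    }
}
```

The larger count is, the smaller the share of time spent on the transition itself.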
With all of that said, here's a link you might find interesting: NetAsm
For C# to properly support SSE, the just-in-time (JIT) compiler would need to be modified to support the extensions. The JIT team has indicated that SSE is used for some operations (such as conversions between double and int) but not for general code generation. (This was some time ago; I'd be interested to know whether .NET 3.0 or later supports it.)
Any attempt to use an object to support these instructions would be inefficient at best.
Quote: Original post by Moe
What exactly are you trying to do that requires so much optimization?
I also wonder whether a managed language is the right tool if you need that kind of access to hardware features...
My goal is to write a fairly complicated raytracer. C# will do just as well as C++, although C++ has the advantage of native SSE support, which speeds up matrix and vector math.
I was just looking into methods to speed up the math routines. NetAsm (thanks, Mike.Popoloski) really does the trick, though. Math routines using SSE can definitely outperform the default JIT-compiled implementation. It is also possible to skip the injection entirely if the machine doesn't support SSE, so you still have a platform-independent solution.
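NetAsm itself injects native code at JIT time; the sketch below is not its API, only a native C++ analogue of the same fallback idea: ask the CPU at runtime and use a plain scalar loop when SSE is unavailable. It assumes GCC or Clang on x86 (__builtin_cpu_supports and the target attribute are compiler extensions); any other compiler or platform simply gets the scalar path.

```cpp
#include <cstddef>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <xmmintrin.h>
#define HAVE_SSE_PATH 1
#else
#define HAVE_SSE_PATH 0
#endif

// Portable scalar fallback: always compiled, works on any CPU.
static void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if HAVE_SSE_PATH
// SSE path: four floats per addps, then a scalar tail for the leftovers.
// target("sse") lets GCC/Clang emit SSE here even if the file is built without -msse.
__attribute__((target("sse")))
static void add_sse(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

// Choose the implementation at runtime based on what the CPU reports.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
#if HAVE_SSE_PATH
    if (__builtin_cpu_supports("sse")) { add_sse(a, b, out, n); return; }
#endif
    add_scalar(a, b, out, n);
}
```

Both paths compute the same element-wise sums, so callers never need to know which one ran.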
I'm not trying to write a real-time raytracer of some kind. ;]
Bas
C++ has no support for SSE; some C++ compilers have support for SSE intrinsics. Important clarification.
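To illustrate the distinction: the intrinsics are a compiler extension, and each compiler announces them through its own predefined macros. A sketch assuming GCC/Clang (__SSE__) and MSVC (_M_X64, _M_IX86_FP); sum4 is a hypothetical example that falls back to plain C++ when neither is present.

```cpp
// SSE availability is a property of the compiler and its target flags,
// not of the C++ language, so it has to be detected per compiler.
#if defined(__SSE__)                                              // GCC, Clang
#  define SSE_INTRINSICS_AVAILABLE 1
#elif defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)  // MSVC
#  define SSE_INTRINSICS_AVAILABLE 1
#else
#  define SSE_INTRINSICS_AVAILABLE 0
#endif

#if SSE_INTRINSICS_AVAILABLE
#include <xmmintrin.h>
#endif

// Sum of four floats, using SSE intrinsics only when the compiler offers them.
float sum4(const float v[4]) {
#if SSE_INTRINSICS_AVAILABLE
    __m128 x = _mm_loadu_ps(v);
    __m128 hi = _mm_movehl_ps(x, x);                           // (v2, v3, v2, v3)
    __m128 s = _mm_add_ps(x, hi);                              // (v0+v2, v1+v3, ...)
    __m128 t = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));  // broadcast lane 1
    return _mm_cvtss_f32(_mm_add_ss(s, t));                    // (v0+v2) + (v1+v3)
#else
    return (v[0] + v[2]) + (v[1] + v[3]);  // same association as the SSE path
#endif
}
```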
Also be aware that SSE has a very long initialization (up to 150 clock ticks on the P4 and Core1/2 Uno/Duo/Quad variants, IIRC?), so you have to issue a fairly large batch of SSE instructions (40+) in a row to make it worthwhile.