[.net] byte [] to an IntPtr

Started by
23 comments, last by Fiddler 15 years, 7 months ago
Quote:Original post by gharen2
I can confirm that math code that's optimized with unsafe code is much faster.

Really? That's funny because my results pulled straight from the JIT tend to disprove that.

Unsafe code, especially with regard to the fixed keyword, does add a relatively large amount of overhead to managed code.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Quote:Original post by krum
Quote:Original post by capn_midnight
The performance hit from using unsafe code is HUGE.


I'm pretty sure that this isn't correct. I've spent a substantial amount of time researching this subject, and I've found that there is no performance penalty for unsafe code per se in C#.

There is a very slight penalty to using the 'fixed' keyword which is often associated with unsafe because the runtime pins the array, but even that usually isn't bad unless used in an inner loop (normally you'd use 'fixed' outside of a loop).

There can be a huge hit from unsafe code, as I have demonstrated in my journal.
Quote:
Just to prove my point, take the following code:

*** Source Snippet Removed ***

Now, let's take a look at the machine code that test1() generates

*** Source Snippet Removed ***

and test2()

*** Source Snippet Removed ***

Clearly, using the unsafe pointer is faster, because there is no null reference check - which is obvious from the first three instructions of test2().

There is a penalty to using the 'fixed' keyword, which includes a null reference check and pinning the pointer, but clearly, if you were going to iterate over all of the elements in the array, using the unsafe pointer would be substantially faster.

...
Wrong
class Program {
    static void Main(string[] args) {
        int[] numbers = new int[1024];
        Random r = new Random();
        for (int i = 0; i < numbers.Length; ++i) {
            numbers[i] = r.Next();
        }
        AddOneSafe(numbers);
        unsafe {
            fixed (int* p = numbers) {
                AddOneUnsafe(p, numbers.Length);
            }
        }
        Console.ReadKey();
    }
    static void AddOneSafe(int[] array) {
        for (int i = 0; i < array.Length; ++i) {
            ++array[i];
        }
    }
    static unsafe void AddOneUnsafe(int* array, int length) {
        for (int i = 0; i < length; ++i) {
            ++array[i];
        }
    }
}

produces the following two pieces of assembly:
!u 00152ff0
Normal JIT generated code
ConsoleApplication1.Program.AddOneSafe(Int32[])
Begin 00220128, size 1a
00220128 56               push        esi
00220129 33D2             xor         edx,edx
0022012B 8B7104           mov         esi,dword ptr [ecx+4]
0022012E 85F6             test        esi,esi
00220130 7E0E             jle         00220140
00220132 8D449108         lea         eax,[ecx+edx*4+8]
00220136 830001           add         dword ptr [eax],1
00220139 83C201           add         edx,1
0022013C 3BF2             cmp         esi,edx
0022013E 7FF2             jg          00220132
00220140 5E               pop         esi
00220141 C3               ret

!u 00152ff8
Normal JIT generated code
ConsoleApplication1.Program.AddOneUnsafe(Int32*, Int32)
Begin 00220158, size 1c
00220158 57               push        edi
00220159 56               push        esi
0022015A 8BF9             mov         edi,ecx
0022015C 8BF2             mov         esi,edx
0022015E 33C9             xor         ecx,ecx
00220160 85F6             test        esi,esi
00220162 7E0D             jle         00220171
00220164 8D148F           lea         edx,[edi+ecx*4]
00220167 830201           add         dword ptr [edx],1
0022016A 83C101           add         ecx,1
0022016D 3BCE             cmp         ecx,esi
0022016F 7CF3             jl          00220164
00220171 5E               pop         esi
00220172 5F               pop         edi
00220173 C3               ret

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Quote:Original post by Washu

Wrong
*** Source Snippet Removed ***
produces the following two pieces of assembly:
*** Source Snippet Removed ***


With all due respect, Washu, this has nothing to do with unsafe code - rather, this is a good demonstration of how poor the JIT's optimizer is. Either way, I think we'll find that the safe code may take at least one additional clock cycle per loop iteration due to the additional add in:

lea         eax,[ecx+edx*4+8]


vs

lea         edx,[edi+ecx*4]


I believe the fact that the unsafe code has a few more instructions to set the function up is largely insignificant. Even if the CPU can do both LEA instructions in the same number of clock cycles, I think to suggest that in this case unsafe code produces a substantial performance penalty is going overboard.

The more extreme case would be if you had an array of 4x4 transformation matrices, say from an animation, that you needed to multiply with a parent transform. Using safe code could be tremendously slower because it would need to copy each matrix at least twice for each operation, whereas using pointers could reduce the memory bandwidth requirements.

My point is that using unsafe code per se is not slower. What you do with it is - like anything - a whole different story.





Quote:Original post by krum

With all due respect, Washu,


So, I thought I'd test my theory and added some timing code to your example:

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Security;

namespace washu
{
    class Program
    {
        [DllImport("kernel32.dll"), SuppressUnmanagedCodeSecurity]
        extern static void QueryPerformanceCounter(out long clock);

        static void Main(string[] args)
        {
            int[] numbers = new int[1000000];
            Random r = new Random();
            for (int i = 0; i < numbers.Length; ++i)
            {
                numbers[i] = r.Next();
            }
            long t0, t1, t2;
            for (int i = 0; i < 100; ++i)
            {
                QueryPerformanceCounter(out t0);
                AddOneSafe(numbers);
                QueryPerformanceCounter(out t1);
                unsafe
                {
                    fixed (int* p = numbers)
                    {
                        AddOneUnsafe(p, numbers.Length);
                    }
                }
                QueryPerformanceCounter(out t2);
                long time1 = t1 - t0;
                Console.WriteLine(String.Format("safe={0} unsafe={1}", (t1 - t0) / numbers.Length, (t2 - t1) / numbers.Length));
            }
            Console.ReadKey();
        }

        static void AddOneSafe(int[] array)
        {
            for (int i = 0; i < array.Length; ++i)
            {
                ++array[i];
            }
        }

        static unsafe void AddOneUnsafe(int* array, int length)
        {
            for (int i = 0; i < length; ++i)
            {
                ++array[i];
            }
        }
    }
}


What we find is that the safe code does in fact, on my computer, take on average one additional clock cycle per element. I think you'll find my testing methodology to be accurate enough. But again, this isn't about how well the JIT optimizes, but about the performance penalty of the unsafe code block. Unsafe by itself does not generate additional machine code. Using pointers might, and 'fixed' is an issue - I do not dispute that.

On my computer QueryPerformanceCounter() returns CPU clocks. YMMV.

safe=15 unsafe=13
safe=15 unsafe=12
safe=15 unsafe=14
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=12
safe=14 unsafe=17
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=12
safe=13 unsafe=12
safe=13 unsafe=12
safe=14 unsafe=13
safe=14 unsafe=12
safe=15 unsafe=14
safe=14 unsafe=13
safe=15 unsafe=14
safe=14 unsafe=12
safe=13 unsafe=13
safe=14 unsafe=12
safe=14 unsafe=13
safe=14 unsafe=13
safe=15 unsafe=14
safe=14 unsafe=13
safe=14 unsafe=12
safe=15 unsafe=12
safe=15 unsafe=14
safe=14 unsafe=14
safe=14 unsafe=12
safe=15 unsafe=13
safe=15 unsafe=13
safe=14 unsafe=12
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=13
safe=13 unsafe=12
safe=14 unsafe=12
safe=14 unsafe=14
safe=14 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=13
safe=15 unsafe=13
safe=14 unsafe=13
safe=14 unsafe=13
safe=13 unsafe=12
safe=13 unsafe=12
safe=14 unsafe=13
safe=14 unsafe=12
safe=14 unsafe=13
safe=14 unsafe=12
safe=14 unsafe=13
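
As a side note, the same measurement could be taken with System.Diagnostics.Stopwatch, which wraps the high-resolution counter and avoids the manual P/Invoke. A minimal sketch (the Timing helper name is made up for illustration; AddOneSafe and the numbers array are the ones from the program above):

using System;
using System.Diagnostics;

static class Timing
{
    // Runs an action and returns the elapsed high-resolution timer ticks.
    // Stopwatch uses QueryPerformanceCounter internally where it is available.
    public static long Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedTicks;
    }
}

// Example usage against the methods above:
// long safeTicks = Timing.Measure(() => AddOneSafe(numbers));
// Stopwatch.Frequency gives the number of ticks per second if you need wall-clock time.
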
wouldn't something totally cryptic like:

static unsafe void AddOneUnsafe(int* array, int length)
{
    int* a = array;
    for (int i = 0; i < length; ++i)   // a raw pointer has no Length property; use the length parameter
        ++*(a++);                      // increment the element, then advance the pointer
}


get better performance if you were using pointers? Wouldn't that get rid of the multiplication? Given a large enough dataset, shouldn't the performance gain offset the fixed penalty?

Then again, I'm just a n00b, I have no idea what I'm talking about. I am honestly curious.
Quote:Original post by RipTorn
Quote:Original post by Fiddler
For large arrays you should not copy elements around. In this case, if you do not want to resort to unsafe code, use the following snippet:
GCHandle h0 = GCHandle.Alloc(array, GCHandleType.Pinned);
try
{
    // h0.AddrOfPinnedObject() will get you your IntPtr
}
finally
{
    h0.Free();
}


Awesome. Where/how did you find/figure this out? [smile]

This is by far the best way, and it's safe provided you don't overrun the buffer, I guess.


I think I first saw that snippet in the msdn documentation for manual marshalling, but I can't remember for sure. Really, I must have read all available articles on P/Invoke, marshalling, garbage collection, unsafe code and reflection about 3 times each, when I was writing the new Tao.OpenGL bindings :)

I haven't done any real world tests on this code, but it should be on par with 'fixed' regarding performance; what I'm worried about is that the try-finally block may be adding a lot of overhead, but I can't see a way to safely remove it. Moreover, according to msdn, pinning should not be used very frequently or for small objects, since this reduces GC performance - in this case, it might be better to just copy the data. On the other hand, this technique is better for large objects, especially since there is a chance that the GC will not move them even if not pinned.
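
For completeness, a usage sketch of the GCHandle approach against an unmanaged call (the native DLL and entry point are placeholders for whatever function you actually need to hand the IntPtr to):

using System;
using System.Runtime.InteropServices;

class PinnedCallExample
{
    // Hypothetical unmanaged function that consumes a raw buffer.
    [DllImport("somenative.dll")]
    static extern void ProcessBuffer(IntPtr buffer, int length);

    static void Send(byte[] data)
    {
        GCHandle h0 = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // AddrOfPinnedObject() returns the address of the first element while the array is pinned.
            ProcessBuffer(h0.AddrOfPinnedObject(), data.Length);
        }
        finally
        {
            h0.Free();   // always unpin, even if the call throws
        }
    }
}
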

[OpenTK: C# OpenGL 4.4, OpenGL ES 3.0 and OpenAL 1.1. Now with Linux/KMS support!]

Quote:Original post by krum
With all due respect, Washu, this has nothing to do with unsafe code - rather this is a good demonstration of how poor the JIT's optimizer is. Either way, I think we'll find that the safe code may take at least one additional clock cycle per loop due to the additional add

The JIT optimizer has an extremely short amount of time to run. Unlike C++ compilation, where the compiler can run for hours without anyone caring, .NET jitting has to happen within microseconds. Frankly, there just aren't a hell of a lot of optimizations you can do in that short a span of time. I should note that the x64 code produced is much better than the x86 code, though (simply because of certain instruction set guarantees).
Quote:
lea         eax,[ecx+edx*4+8]
vs
lea         edx,[edi+ecx*4]


I believe the fact that the unsafe code has a few more instructions to set the function up is largely insignificant. Even if the CPU can do both LEA instructions in the same number of clock cycles, I think to suggest that in this case unsafe code produces a substantial performance penalty is going overboard.

The setup code is expensive; however, a loop like that does not introduce a significant hit for either one. It did, however, kill your claim that unsafe code in a loop would be significantly faster than safe code. The reality is that an LEA of [r + r*c + c] is about 1 to 1.5 cycles slower than [r + r*c]. Pipelining and cache misses will more than make up for that amount of time.
Quote:The more extreme case would be if you had an array of 4x4 transformation matrices, say from an animation, that you needed to multiply with a parent transform. Using safe code could be tremendously slower because it would need to copy each matrix at least twice for each operation, whereas using pointers could reduce the memory bandwidth requirements.

My point is that using unsafe code per se is not slower. What you do with it is - like anything - a whole different story.

Eh, careful there. There are plenty of managed operations you can do that will eliminate the copies, ref parameters being one of the big ones. Dropping straight down to unsafe code just because it "appears" faster is a mistake. Furthermore, the setup code you so diligently tossed away adds an invisible overhead that will hit you when you least expect it. First and foremost, any time a collection runs with pinned objects, those objects don't get moved, and that fragments the heap. That fragmentation slows down allocations, and also slows down further collections when they happen (the GC ends up having to do more work). Now, the chance of a collection happening during a short pinned duration is fairly small. The cost of locking and unlocking the critical section in order to pin the object is not so small.

Furthermore, matrix multiplication typically requires per-element access, and as I've shown in my journal with the simplest of operations (calculating the magnitude of a vector), the code generated for pinned pointers is NOT the best in the world. The fact is, the JIT spends very little time optimizing unsafe code; it spends more time optimizing managed code. There are areas where "unsafe" code might be faster, for instance ones that deal with memory copies, although you would be hard pressed to beat the built-in memory copy (as it uses what was, in 2005, considered the most optimal form of memory copy for the P4/AMD chipsets). Even MDX doesn't do that, instead delegating its copy operations to a P/Invoke of memcpy.
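
To make the ref point concrete, here is a minimal sketch of a copy-free multiply for a plain value-type matrix (Matrix4 is a stand-in struct for the example, not a real library type; only the first row of the product is written out):

// Hypothetical 4x4 matrix struct used only for this sketch.
struct Matrix4
{
    public float M11, M12, M13, M14;
    public float M21, M22, M23, M24;
    public float M31, M32, M33, M34;
    public float M41, M42, M43, M44;
}

static class MatrixOps
{
    // 'ref'/'out' parameters let the callee work with the caller's storage directly,
    // so neither 48-plus-byte input struct is copied onto the callee's stack.
    // Note: assumes 'result' does not alias 'left' or 'right'.
    public static void Multiply(ref Matrix4 left, ref Matrix4 right, out Matrix4 result)
    {
        result = default(Matrix4);
        result.M11 = left.M11 * right.M11 + left.M12 * right.M21 + left.M13 * right.M31 + left.M14 * right.M41;
        result.M12 = left.M11 * right.M12 + left.M12 * right.M22 + left.M13 * right.M32 + left.M14 * right.M42;
        result.M13 = left.M11 * right.M13 + left.M12 * right.M23 + left.M13 * right.M33 + left.M14 * right.M43;
        result.M14 = left.M11 * right.M14 + left.M12 * right.M24 + left.M13 * right.M34 + left.M14 * right.M44;
        // ...remaining three rows follow the same pattern.
    }
}
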

Frankly, unsafe code is unsafe for many reasons, not just because it can't be verified, but because you are taking great risks in attempting to outguess the JIT and "optimize" your code. If you want to optimize your code, then using in-assembly pre-compiled machine code would be recommended. This way you can hand optimize it using the latest instruction sets, vectorization, and other operations.

Now, there are certainly cases where unsafe code can beat managed code. More often than not though, it falls under the 80/20 rule and such optimizations aren't optimizations, just wastes of time. You're better off investing in better algorithms first.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Another one for Washu, I guess, as per my earlier post on the method I use - mainly because I have to expose a byte* to an unmanaged DLL (the only time I use pointers, and maybe that's what the OP wanted it for), for the unsafe bandwagoners :D. The question being, I currently do it like so:

fixed (byte *_bp = byteArray)
{
somethingorother(_bp);
}

Can this be changed to an attribute as part of the invoked prototype so it can auto-marshal the array to a pointer? I was reading http://blogs.msdn.com/ericgu/archive/2004/07/13/181897.aspx and they mentioned some possible change, although that was just an idea for Whidbey, so it's not much use for me. As mentioned in that thread, when you're dealing with multiple parameters it can yield some fugly code (doing it as I do now).

Cheers.
Quote:Original post by Niksan2
Another one for Washu, I guess, as per my earlier post on the method I use - mainly because I have to expose a byte* to an unmanaged DLL (the only time I use pointers, and maybe that's what the OP wanted it for), for the unsafe bandwagoners :D. The question being, I currently do it like so:

fixed (byte *_bp = byteArray)
{
somethingorother(_bp);
}

Can this be changed to an attribute as part of the invoked prototype so it can auto-marshal the array to a pointer? I was reading http://blogs.msdn.com/ericgu/archive/2004/07/13/181897.aspx and they mentioned some possible change, although that was just an idea for Whidbey, so it's not much use for me. As mentioned in that thread, when you're dealing with multiple parameters it can yield some fugly code (doing it as I do now).

Cheers.

Take a look at the MarshalAsAttribute, especially with UnmanagedType.LPArray.
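
For example, a declaration along these lines should let the marshaller hand the array to the native side without a fixed block at the call site (the DLL name is a placeholder; a blittable byte[] is typically pinned for the duration of the call rather than copied):

using System.Runtime.InteropServices;

static class NativeMethods
{
    // Hypothetical import; the byte[] is passed as a pointer to its first element.
    // The explicit [MarshalAs] is often optional for a byte[], but it documents the intent.
    [DllImport("somenative.dll")]
    public static extern void somethingorother(
        [MarshalAs(UnmanagedType.LPArray)] byte[] buffer);
}

// Call site - no unsafe block, no fixed statement:
// NativeMethods.somethingorother(byteArray);
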

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Ah, great stuff. Don't suppose you know offhand if the marshalling would be any slower? In most cases I'm doing a single call, but in certain places I have to do several thousand; if it's negligible it doesn't really matter.

Thanks again.

This topic is closed to new replies.
