Well there's a whole lot more to the assembler optimizations you could do that the compiler has no chance of figuring out (I sort of lied/simplified :-))
Check out: http://www.d6.com/users/checker/misctech.htm
Particularly the last one, and the extra article on floating point optimizations.
Needless to say, there's a whole slew of extra performance to be gained by using clever low-level optimizations which I have not done at all so far. A ~3x performance difference in that light really is quite surprising.
[.net] memset?
I should reiterate that the 3x number really is just a fifteen-second ballpark benchmark. It may be 2x, it may be 5x, but it's not 10x (which is about what I would have expected).
For zeroing memory, there are methods in Marshal to do that (well, from memory there are), and also Array.Clear(...), etc.
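For illustration, here is a minimal sketch of the Array.Clear route mentioned above. The class and method names are made up for this example; only Array.Clear itself comes from the framework.

```csharp
using System;

public static class ZeroingSketch
{
    // Hypothetical helper: resets every element of the array to 0
    // using the managed Array.Clear(array, index, length) call.
    public static void Zero(int[] data)
    {
        Array.Clear(data, 0, data.Length);
    }
}
```

Calling ZeroingSketch.Zero(x) on an int[] leaves every element at zero, with no P/Invoke involved.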
I'm having trouble importing memset... I do exactly what is written in the code above:
importing:
    using System;
    using System.Runtime.InteropServices;

    class ExtrnCalls
    {
        [DllImport("msvcrt")]
        public static unsafe extern void memset(int[] dest, int c, int count);
    }
and using memset:
    Random rnd = new Random(1);
    int[] x = new int[4];
    for (int i = 0; i < x.Length; i++)
    {
        x[i] = rnd.Next();
    }
    ExtrnCalls.memset(x, 0, x.Length);
    foreach (int i in x)
        Console.WriteLine(i);
and I get this result:
0
237820880
1002897798
1657007234
instead of four zeroes.
What is wrong?
The count parameter is wrong: it should be the number of bytes (not array elements) you want to change. Since an int is four bytes, a count of 4 only sets the first element of the array.
Correct is:
ExtrnCalls.memset(x, 0, x.Length * Marshal.SizeOf(typeof(int)));
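As a small sketch of the byte-count arithmetic (the class and method names are hypothetical), the size can be computed either by hand or with Buffer.ByteLength, which reports the total size in bytes of an array of primitives:

```csharp
using System;

public static class ByteCountSketch
{
    // Bytes needed to cover every element of an int[]: elements * 4.
    public static int ByElementSize(int[] data)
    {
        return data.Length * sizeof(int);
    }

    // Same figure obtained from the framework for a primitive array.
    public static int ByBufferLength(int[] data)
    {
        return Buffer.ByteLength(data);
    }
}
```

For the four-element array in the question both methods give 16, which is the count memset needs in order to reach all four ints.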
Quote: Original post by unbird
The count parameter is wrong, it should be the amount of bytes (not array elements) you want to change. Since int is four bytes, you only set the first element in the array.
Correct is: ExtrnCalls.memset(x, 0, x.Length * Marshal.SizeOf(typeof(int)));
Ok. But if I change "character to set" from zero to 254 I get this:
    int[] x = new int[4];
    ExtrnCalls.memset(x, 254, x.Length * 4);
Out:
-16843010
-16843010
-16843010
-16843010
for 1 I get this:
16843009 (=1000000010000000100000001)
16843009 (=1000000010000000100000001)
16843009 (=1000000010000000100000001)
16843009 (=1000000010000000100000001)
As the results show, memset works only with bytes, but I remember that in C it was possible to set ints.
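The numbers above are what you get when the same byte is written into all four bytes of each int. A minimal sketch of that arithmetic, with a made-up helper name:

```csharp
public static class ByteFillSketch
{
    // Hypothetical helper: the int value produced when every one of its
    // four bytes is set to b, i.e. b repeated as 0xbbbbbbbb.
    public static int ReplicateByte(byte b)
    {
        // Multiplying by 0x01010101 copies the byte into each position;
        // the cast reinterprets the 32-bit pattern as a signed int.
        return unchecked((int)(b * 0x01010101u));
    }
}
```

ReplicateByte(1) is 16843009 (0x01010101) and ReplicateByte(254) is -16843010 (0xFEFEFEFE read as a signed int), matching the output in the post.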
Quote:...memset is working only with bytes...
That is indeed the case, and I can't find anything in the Windows SDK documentation that suits your purpose.
If you want to support any type, a generic method might be useful.
    public static void SetAll<T>(T[] array, T value)
    {
        for (int i = 0; i < array.Length; i++)
            array[i] = value;
    }
As already stated, I also think you gain little speed, if at all. You really need big arrays before you experience a gain.
I don't think your dllimport is correct. AFAIK, size_t translates to IntPtr, not int (otherwise you'll corrupt your stack on 64bit runtimes). Try this:
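    // C definition: void * memset ( void * ptr, int value, size_t num );
    // memset in the C runtime uses the cdecl calling convention, so state it explicitly.
    [DllImport("msvcrt", CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr memset(int[] dest, int c, IntPtr count);

    // or this
    [DllImport("msvcrt", CallingConvention = CallingConvention.Cdecl)]
    public static unsafe extern void* memset(void* dest, int c, IntPtr count);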
Usage:
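    int[] x = new int[4];
    // count is in bytes, so multiply the element count by sizeof(int)
    memset(x, 0, new IntPtr(x.Length * sizeof(int))); // first version
    unsafe
    {
        fixed (int* x_ptr = x)
        {
            memset((void*)x_ptr, 0, new IntPtr(x.Length * sizeof(int))); // second version
        }
    }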
This can be made cross-platform using a dllmap to map msvcrt to the correct shared object on Linux, OSX or any other platform.
Quote: Original post by unbird
Quote:...memset is working only with bytes...
It is in fact so, and according to the Windows SDK doc I don't find anything for your purpose.
If you want to support any type, a generic method might be useful.
*** Source Snippet Removed ***
As already stated, I also think you gain little speed, if at all. You really need big arrays before you experience a gain.
Actually, the speed gain is far from trivial. Try benchmarking your loop as-is and then unrolled at steps of 16. The latter will have noticeably better performance than the former.
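To make the suggestion concrete, unrolling the fill loop in steps of 16 could look roughly like this. It is only a sketch: the class and method names are invented for illustration, and whether it actually beats the plain loop depends on the runtime and JIT, so benchmark it on your own setup.

```csharp
public static class UnrolledFillSketch
{
    // Hypothetical helper: sets every element of array to value,
    // handling 16 elements per loop iteration where possible.
    public static void Fill(int[] array, int value)
    {
        int i = 0;
        // Largest multiple of 16 that fits in the array.
        int limit = array.Length - (array.Length % 16);

        for (; i < limit; i += 16)
        {
            array[i] = value;      array[i + 1] = value;
            array[i + 2] = value;  array[i + 3] = value;
            array[i + 4] = value;  array[i + 5] = value;
            array[i + 6] = value;  array[i + 7] = value;
            array[i + 8] = value;  array[i + 9] = value;
            array[i + 10] = value; array[i + 11] = value;
            array[i + 12] = value; array[i + 13] = value;
            array[i + 14] = value; array[i + 15] = value;
        }

        // Remaining 0 to 15 elements that did not fill a whole step.
        for (; i < array.Length; i++)
        {
            array[i] = value;
        }
    }
}
```

The second loop is what keeps the method correct for array lengths that are not a multiple of 16.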