# [.net] memset?

## 20 posts in this topic

Hey! Is there a way to replicate the functionality of "memset" in C#? I'm doing some unsafe hacking and need to zero out a block of memory very quickly. Thanks!

***
Not going to happen.

People don't use C# for its runtime performance.

***
Well, there are pointers etc. to get runtime performance where it's needed, so I figured that if I'm allowed to work with raw buffers, maybe there's a fast way to zero a buffer.
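
For comparison, newer runtimes (assuming .NET 5 or later) expose a memset-style primitive directly, with no P/Invoke: `Unsafe.InitBlockUnaligned` in `System.Runtime.CompilerServices` fills a run of bytes starting at a managed reference. A minimal sketch; `ManagedMemset` and `Zero` are hypothetical names, not framework APIs:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class ManagedMemset
{
    // Hypothetical helper: zero every element of an array of unmanaged
    // values (int, float, structs of primitives) as one block of raw bytes.
    public static void Zero<T>(T[] array) where T : unmanaged
    {
        if (array.Length == 0)
            return; // nothing to clear

        // Reinterpret the array's storage as bytes; this is a view, not a copy.
        Span<byte> bytes = MemoryMarshal.AsBytes(array.AsSpan());

        // Equivalent of memset(ptr, 0, byteCount): write 0 into every byte.
        Unsafe.InitBlockUnaligned(ref MemoryMarshal.GetReference(bytes), 0, (uint)bytes.Length);
    }
}
```

Because the helper works on bytes, it has exactly memset's semantics: it is only a "clear" for value 0, not a general fill.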

***
Maybe you can do it with P/Invoke; see the end of this discussion:
http://www.gamedev.net/community/forums/topic.asp?whichpage=1&pagesize=25&topic_id=389036

("People don't use C# for its runtime performance." Hmm. C#'s killer feature isn't speed, but that doesn't mean we shouldn't look for ways to optimize.)

***
Unless there's a specific .NET Framework function to do it, you could just P/Invoke the C runtime's memset.

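    using System.Runtime.InteropServices; // at the top of the file

    // msvcrt's memset uses the cdecl calling convention; no unsafe code is
    // needed because the int[] is marshaled as a pointer to its first element.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern void memset(int[] dest, int c, int count);

    private void button1_Click(object sender, EventArgs e)
    {
        int[] x = new int[10];
        for (int i = 0; i < x.Length; ++i) x[i] = 10;
        memset(x, 0, 4); // count is in bytes: 4 bytes clears only x[0]
        label1.Text = x[0].ToString();
    }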

***
Thanks, I'll use the standard memset for now. Does that DLL come with Windows or .NET, or both? It kinda sucks that it won't be portable, though.

Regarding the speed of C#, I must say that so far I'm quite impressed. I'm writing a software renderer, and although it's not a completely fair comparison, I think it fares pretty well against Quake. Most of the time is spent in the rasterizer for both engines (>90%), so when testing by just looking at a wall that fills the screen, I find that Quake is about 3x faster than my version. But Quake gets at least a factor of two speedup by using an optimized assembler version of the inner rasterizer loop (basically exploiting the fact that Pentium and later processors can do floating-point adds in parallel if you don't fetch the results right away), and Quake also uses subdivided affine perspective correction (only computing the perspective correction every 8 pixels, IIRC), whereas my version does completely accurate perspective correction. Also, I use 32-bit textures and a 32-bit frame buffer, so that might work in Quake's favour as well. Overall I think C# is very close to C in speed for this application (which is, admittedly, a rather extreme case of the 80/20 rule); most of the speed difference is due to me, not C#.
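
The subdivided affine trick can be sketched for one texture coordinate along a horizontal span. This is a minimal illustration of the general technique, not Quake's actual code: it assumes the rasterizer already interpolates u/z and 1/z linearly in screen space, and the `SpanTexcoords` names and default 8-pixel step are illustrative. The exact version divides at every pixel; the subdivided one divides only at run boundaries and interpolates linearly in between:

```csharp
using System;

static class SpanTexcoords
{
    // Exact perspective-correct u for every pixel: u = (u/z) / (1/z).
    public static float[] Exact(float uOverZ, float oneOverZ,
                                float dUOverZ, float dOneOverZ, int count)
    {
        var u = new float[count];
        for (int x = 0; x < count; x++)
            u[x] = (uOverZ + x * dUOverZ) / (oneOverZ + x * dOneOverZ);
        return u;
    }

    // Subdivided affine: one divide per `step` pixels, linear steps in between.
    public static float[] Subdivided(float uOverZ, float oneOverZ,
                                     float dUOverZ, float dOneOverZ, int count, int step = 8)
    {
        var u = new float[count];
        for (int start = 0; start < count; start += step)
        {
            int end = Math.Min(start + step, count); // exclusive end of this run
            // Exact values at both ends of the run.
            float u0 = (uOverZ + start * dUOverZ) / (oneOverZ + start * dOneOverZ);
            float u1 = (uOverZ + end * dUOverZ) / (oneOverZ + end * dOneOverZ);
            float du = (u1 - u0) / (end - start);
            for (int x = start; x < end; x++)
                u[x] = u0 + (x - start) * du; // affine inside the run
        }
        return u;
    }
}
```

At each run start the two versions agree; in between, the subdivided one is only approximate unless depth is constant across the span.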

I wrote it in C# to test out the concepts for fun, never really thinking that it would be this fast, but I've been quite impressed so far! Everything is written in the most naive and obvious way and it's still very fast compared to one of the fastest rasterizers around.

***
I'd expect the overhead of the P/Invoke call to negate any speed increase you might see (although I'm really not sure you would see any speed-up at all). This is especially true if you don't tag the declaration with:

    [System.Security.SuppressUnmanagedCodeSecurity]

If you don't, .NET performs a code access security stack walk on every call to check that all callers have permission to call unmanaged code; that check is what this attribute suppresses. Needless to say, it'd be *much* slower without it in your case.
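
A declaration with the attribute applied might look like the sketch below. It assumes Windows (msvcrt.dll); the stack-walk suppression matters on the .NET Framework, where code access security is enforced. `CallingConvention.Cdecl` is there because the C runtime's memset is a cdecl function (this matters on 32-bit x86), and the byte count is a `UIntPtr` to match `size_t`. `NativeMethods` and `Clear` are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

static class NativeMethods
{
    // void* memset(void* dest, int c, size_t count);
    // SuppressUnmanagedCodeSecurity skips the per-call security stack walk
    // on the .NET Framework.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    [SuppressUnmanagedCodeSecurity]
    public static extern IntPtr memset(int[] dest, int c, UIntPtr count);

    // Clears the whole array; memset's count is in bytes, not elements.
    public static void Clear(int[] array)
    {
        memset(array, 0, (UIntPtr)(uint)(array.Length * sizeof(int)));
    }
}
```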

The speed you claim for your ray tracer is impressive. I've written a couple of little ray tracers in C# and C++ side by side (for comparison), and it turned out that C# handled the complex ray tracer better, while C++ handled the linear-time one better (as expected). It was still interesting, however. Surprisingly, the linear ray tracer built with the C++ floating-point accuracy optimisations off ran at almost exactly the same speed as the C# one. I can't remember whether this was .NET 1.1 or 2.0, though.

***
Well, it wasn't a ray tracer; it was a rasterizer.
I've been pleasantly surprised by many things in C#, speed-wise. For instance, I was sure that garbage collection was slower than manual memory management, but if you benchmark it you'll see that heap allocation is about twenty times faster in a garbage-collected language. This really shouldn't be surprising if you know how garbage collection works (allocation in C# is a pointer increment with a check, roughly five instructions, whereas in C++ it's a huge mess of best-fit/first-fit algorithms, roughly a hundred instructions), but somehow I had bought into the myth of the slowness of garbage collection without benchmarking it (bad programmer, BAAAD programmer!!!).
At any rate, C++ still gains a lot from fewer runtime checks (bounds checks, overflow checks, etc., which you can turn off!) and from the fact that C++ programs typically allocate on the stack more often, but it's surprising that C# programs fare so incredibly well in comparison!
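
The "pointer increment with a check" fast path can be illustrated with a toy bump allocator over a pre-reserved buffer. This is a simplified sketch of the idea, not the CLR's actual allocator: a real GC also writes object headers, follows per-type alignment rules, and triggers a collection when the region runs out. `BumpAllocator` is a hypothetical name:

```csharp
using System;

// Toy bump-pointer allocator: the "heap" is one pre-reserved region,
// and allocating just advances an offset.
sealed class BumpAllocator
{
    private readonly byte[] _region;
    private int _next; // offset of the first free byte

    public BumpAllocator(int capacity) => _region = new byte[capacity];

    // Returns the offset of a block of `size` bytes, or -1 if the region is full.
    public int Allocate(int size)
    {
        int aligned = (size + 7) & ~7;          // round up to 8-byte alignment
        if (aligned > _region.Length - _next)   // the "check"
            return -1;                          // a real GC would collect here
        int result = _next;
        _next += aligned;                       // the "pointer increment"
        return result;
    }

    public int BytesUsed => _next;

    // Frees everything at once by rewinding the offset.
    public void Reset() => _next = 0;
}
```

The contrast with a general-purpose `malloc` is that nothing here searches free lists; the cost moves to the collector, which must later find live objects.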

About the memset, it turns out that using an unchecked loop and setting the memory one word at a time (rather than one byte at a time as in memset?) is a little bit faster than using P/Invoke.
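
A safe-code version of the word-at-a-time idea, assuming .NET Core 2.1 or later for `Span<T>` and `MemoryMarshal.Cast`: the bulk of the buffer is reinterpreted as 8-byte words and cleared one `ulong` per iteration, and the leftover tail bytes are cleared individually. `WordClear` is a hypothetical name, and whether it beats memset depends on the runtime and buffer size, so it is worth benchmarking:

```csharp
using System;
using System.Runtime.InteropServices;

static class WordClear
{
    // Zero a byte buffer 8 bytes (one ulong) at a time, then finish the tail.
    public static void Clear(Span<byte> buffer)
    {
        // View the largest prefix whose length is a multiple of 8 as ulongs.
        Span<ulong> words = MemoryMarshal.Cast<byte, ulong>(buffer);
        for (int i = 0; i < words.Length; i++)
            words[i] = 0UL;

        // Clear the remaining 0..7 bytes one at a time.
        for (int i = words.Length * sizeof(ulong); i < buffer.Length; i++)
            buffer[i] = 0;
    }
}
```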

***
Fantastic.
It will be interesting to see what you have created. *hint hint* ;)

Looking back, I should have proofread my previous post :-) *sigh*

***
Quote:
Original post by sebastiansylvan
But Quake gets at least a factor of two speedup by using an optimized assembler version of the inner rasterizer loop (basically utilizing that pentium processors and later can do floating point adds in parallel if you don't fetch the results right away)

Assembly isn't required for that. The compiler (including the JIT compiler that turns C#'s IL into machine code) and the CPU will both attempt to schedule the instructions to make this possible.
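
One common way to give the JIT and CPU independent floating-point adds to overlap, without writing assembly, is to split a reduction across several accumulators so that consecutive adds don't depend on each other. A minimal sketch with a hypothetical `Sums` class; note that it changes the order of the additions, so results can differ from the single-accumulator loop in the last bits:

```csharp
using System;

static class Sums
{
    // One accumulator: every add must wait for the previous one to finish.
    public static double Serial(double[] values)
    {
        double sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }

    // Four independent accumulators: four dependency chains that the
    // floating-point pipeline can work on at the same time.
    public static double FourWay(double[] values)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        for (; i <= values.Length - 4; i += 4)
        {
            s0 += values[i];
            s1 += values[i + 1];
            s2 += values[i + 2];
            s3 += values[i + 3];
        }
        for (; i < values.Length; i++) // leftover 0..3 elements
            s0 += values[i];
        return (s0 + s1) + (s2 + s3);
    }
}
```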

***
Well there's a whole lot more to the assembler optimizations you could do that the compiler has no chance of figuring out (I sort of lied/simplified :-))

Check out: http://www.d6.com/users/checker/misctech.htm

Particularly the last one, and the extra article on floating point optimizations.

Needless to say, there's a whole slew of extra performance to be gained by using clever low-level optimizations which I have not done at all so far. A ~3x performance difference in that light really is quite surprising.

***
I should reiterate that the 3x number really is just a fifteen-second ballpark benchmark. It may be 2x or it may be 5x, but it's not 10x (which is about what I had expected).

***
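You could use ZeroMemory from kernel32. ZeroMemory itself is a macro in the Windows headers, so the function you actually P/Invoke is the exported RtlZeroMemory.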
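
A sketch of that import, assuming Windows: `EntryPoint` names the exported `RtlZeroMemory`, and the array is pinned with `GCHandle` so the native function gets a stable raw address. `Kernel32` and `Zero` are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;

static class Kernel32
{
    // VOID RtlZeroMemory(PVOID Destination, SIZE_T Length);
    [DllImport("kernel32.dll", EntryPoint = "RtlZeroMemory")]
    private static extern void RtlZeroMemory(IntPtr destination, UIntPtr length);

    // Zero a whole int[] by pinning it and passing its address and byte count.
    public static void Zero(int[] array)
    {
        GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            RtlZeroMemory(address, (UIntPtr)(uint)(array.Length * sizeof(int)));
        }
        finally
        {
            handle.Free(); // always unpin, even if the call throws
        }
    }
}
```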

***
For zeroing memory there are methods in Marshal (well, from memory there are), and also Array.Clear(...), etc.
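
The managed route needs no interop at all. A short sketch with a hypothetical `ClearDemo` class: `Array.Clear` has been in the framework since .NET 1.0, and `Span<T>.Clear` (assuming .NET Core 2.1 or later) can clear just a slice, such as one scanline of a frame buffer:

```csharp
using System;

static class ClearDemo
{
    public static void ClearAll(int[] pixels)
    {
        // Sets every element to its default value (0 for int).
        Array.Clear(pixels, 0, pixels.Length);
    }

    public static void ClearRow(int[] pixels, int width, int row)
    {
        // Clears one scanline of a width-by-height frame buffer.
        pixels.AsSpan(row * width, width).Clear();
    }
}
```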

***
I'm having trouble importing memset. I'm doing exactly what the code above does.

Importing:

    class ExtrnCalls
    {
        [DllImport("msvcrt")]
        public static unsafe extern void memset(int[] dest, int c, int count);
    }

And using memset:

    Random rnd = new Random(1);
    int[] x = new int[4];
    for (int i = 0; i < x.Length; i++)
    {
        x[i] = rnd.Next();
    }
    ExtrnCalls.memset(x, 0, x.Length);
    foreach (int i in x)
        Console.WriteLine(i);

And I get this result:

    0
    237820880
    1002897798
    1657007234

What is wrong?

***
The count parameter is wrong: it should be the number of bytes (not array elements) you want to change. Since an int is four bytes, you only set the first element in the array.

The correct call is:

    ExtrnCalls.memset(x, 0, x.Length * Marshal.SizeOf(typeof(int)));
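
To see why the element count cleared only the first int, here is a managed stand-in for memset's byte semantics (a hypothetical `MemsetEmulation` class, assuming .NET Core 2.1 or later for `MemoryMarshal.AsBytes`). It treats `count` as a byte count exactly like the C function, so passing `x.Length` (4) touches only 4 bytes, i.e. one int:

```csharp
using System;
using System.Runtime.InteropServices;

static class MemsetEmulation
{
    // Same contract as C's memset: write (byte)c into the first `count` BYTES of dest.
    public static void Memset(int[] dest, int c, int count)
    {
        Span<byte> bytes = MemoryMarshal.AsBytes(dest.AsSpan());
        bytes.Slice(0, count).Fill((byte)c);
    }

    // Byte count for a whole array; Buffer.ByteLength works for any primitive array.
    public static int ByteCount(int[] array) => Buffer.ByteLength(array);
}
```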

***
Quote:
Original post by unbird
The count parameter is wrong, it should be the amount of bytes (not array elements) you want to change. Since int is four bytes, you only set the first element in the array.

Correct is:

    ExtrnCalls.memset(x, 0, x.Length * Marshal.SizeOf(typeof(int)));

Ok. But if I change the "character to set" from zero to 254 I get this:

    int[] x = new int[4];
    ExtrnCalls.memset(x, 254, x.Length * 4);

Output:

    -16843010
    -16843010
    -16843010
    -16843010

For 1 I get this:

    16843009 (=1000000010000000100000001)
    16843009 (=1000000010000000100000001)
    16843009 (=1000000010000000100000001)
    16843009 (=1000000010000000100000001)

As the results show, memset works only with bytes, but I remember that in C it was possible to set ints.
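
The numbers above follow directly from byte-wise filling: all four bytes of each int get the same value b, so (regardless of endianness) the int is b * 0x01010101. For b = 1 that is 0x01010101 = 16843009; for b = 254 it is 0xFEFEFEFE, which read as a signed 32-bit int is -16843010. A small check of that arithmetic, with a hypothetical `ByteFill` class and assuming .NET Core 2.1 or later for `MemoryMarshal.AsBytes`:

```csharp
using System;
using System.Runtime.InteropServices;

static class ByteFill
{
    // What an int looks like after memset writes byte b into all four of its bytes.
    public static int Predicted(byte b) => unchecked((int)(b * 0x01010101u));

    // Actually fill an int[] byte-wise (memset semantics) and return it.
    public static int[] FillBytes(int length, byte b)
    {
        var ints = new int[length];
        MemoryMarshal.AsBytes(ints.AsSpan()).Fill(b);
        return ints;
    }
}
```

Setting each int to an arbitrary 32-bit value therefore needs an element-wise fill rather than memset.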

***
Quote:
 ...memset is working only with bytes...

That is indeed the case, and I can't find anything in the Windows SDK documentation that does what you want.

If you want to support any type, a generic method might be useful.

    public static void SetAll<T>(T[] array, T value)
    {
        for (int i = 0; i < array.Length; i++)
            array[i] = value;
    }

As already stated, I also think you gain little speed, if any. You need really big arrays before you see a gain.
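
Later framework versions ship this generic loop built in. A short sketch with a hypothetical `FillDemo` class, assuming .NET Core 2.1 or later: `Array.Fill<T>` and `Span<T>.Fill` set every element to a value of any type, so ints can be set to arbitrary values, unlike with memset:

```csharp
using System;

static class FillDemo
{
    // Same effect as the SetAll<T> loop, via the framework.
    public static void SetAll<T>(T[] array, T value) => Array.Fill(array, value);

    // Set only part of an array, e.g. one scanline of a frame buffer.
    public static void SetRange(int[] array, int start, int length, int value)
        => array.AsSpan(start, length).Fill(value);
}
```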

***
I don't think your DllImport is correct. AFAIK, size_t translates to IntPtr (strictly UIntPtr, since size_t is unsigned), not int; otherwise the size argument has the wrong width on 64-bit runtimes. Try this:

    // C definition: void * memset ( void * ptr, int value, size_t num );
    [DllImport("msvcrt", CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr memset(int[] dest, int c, IntPtr count);

    // or this
    [DllImport("msvcrt", CallingConvention = CallingConvention.Cdecl)]
    public static unsafe extern void* memset(void* dest, int c, IntPtr count);

Usage (count is in bytes, so multiply the element count by sizeof(int)):

    int[] x = new int[4];
    memset(x, 0, new IntPtr(x.Length * sizeof(int))); // first version
    unsafe
    {
        fixed (int* x_ptr = x)
        {
            memset((void*)x_ptr, 0, new IntPtr(x.Length * sizeof(int))); // second version
        }
    }

This can be made cross-platform by using a Mono dllmap to map msvcrt to the correct shared library on Linux, OS X or any other platform.
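
On .NET 5 or later, a similar mapping can be done in code with `NativeLibrary.SetDllImportResolver` instead of a Mono config file. A sketch under stated assumptions: glibc's `libc.so.6` on Linux and `/usr/lib/libSystem.dylib` on macOS; `PortableMemset` is a hypothetical class, and the resolver must be registered (only once per assembly) before the first call into "msvcrt":

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class PortableMemset
{
    [DllImport("msvcrt", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr memset(int[] dest, int c, UIntPtr count);

    private static bool _registered; // not thread-safe; fine for a sketch

    // SetDllImportResolver throws if called twice for the same assembly.
    public static void Register()
    {
        if (_registered) return;
        NativeLibrary.SetDllImportResolver(typeof(PortableMemset).Assembly, Resolve);
        _registered = true;
    }

    // Redirect "msvcrt" to the platform's C library when not on Windows.
    private static IntPtr Resolve(string name, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (name != "msvcrt" || OperatingSystem.IsWindows())
            return IntPtr.Zero; // IntPtr.Zero means "use the default lookup"
        string libc = OperatingSystem.IsMacOS() ? "/usr/lib/libSystem.dylib" : "libc.so.6";
        return NativeLibrary.Load(libc);
    }

    // Clears the whole array; the count is in bytes.
    public static void Clear(int[] array)
    {
        Register();
        memset(array, 0, (UIntPtr)(uint)(array.Length * sizeof(int)));
    }
}
```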

***
Quote:
Original post by unbird
Quote:
 ...memset is working only with bytes...

It is in fact so, and according to the Windows SDK doc I don't find anything for your purpose.

If you want to support any type, a generic method might be useful.

*** Source Snippet Removed ***

As already stated, I also think you gain little speed, if at all. You really need big arrays before you experience a gain.

Actually, the speed gain is far from trivial. Try benchmarking your loop as-is and then unrolled in steps of 16. The latter will perform noticeably better than the former.
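
For reference, a 16-way unrolled variant of the SetAll<T> loop might look like the sketch below (`Unrolled.SetAll16` is a hypothetical name). How much it gains depends on the JIT, the element type and the array size, so it is worth measuring on your own runtime:

```csharp
using System;

static class Unrolled
{
    // SetAll<T> with the loop body unrolled 16 times, plus a remainder loop.
    public static void SetAll16<T>(T[] array, T value)
    {
        int i = 0;
        int lastBlockStart = array.Length - 16; // last index where 16 elements still fit
        for (; i <= lastBlockStart; i += 16)
        {
            array[i] = value;      array[i + 1] = value;  array[i + 2] = value;  array[i + 3] = value;
            array[i + 4] = value;  array[i + 5] = value;  array[i + 6] = value;  array[i + 7] = value;
            array[i + 8] = value;  array[i + 9] = value;  array[i + 10] = value; array[i + 11] = value;
            array[i + 12] = value; array[i + 13] = value; array[i + 14] = value; array[i + 15] = value;
        }
        for (; i < array.Length; i++) // remaining 0..15 elements
            array[i] = value;
    }
}
```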

***
I've written a function in C++ that fills memory 4 bytes at a time and imported it, but performance still leaves much to be desired.