While this is still here...

posted in NO
Published August 29, 2008
While my journal is still active (I've cancelled my gamedev subscription; please don't ask why), I thought I'd post this for you optimization experts, namely those of you doing .NET optimization (if anyone actually does that?):

Here's what's going on: I've got a little memory buffer object and I want to make it as fast as I can manage in .NET.

I've thrown together this little benchmark program that you see in the picture above. Basically it runs 3 tests; each test writes 500,000 vertices at 56 bytes per vertex to the memory buffer, and each write is repeated 1000 times. The 3 tests are as follows:
1.  Write using GCHandle.Alloc to pin the source vertex array and the destination byte array on every iteration of the 1000 iteration loop, and use unsafe code to copy the data.
2.  Pin the arrays outside of the 1000 iteration loop, but otherwise do essentially the same thing as test 1.
3.  Prepare the vertex data as an array of 28,000,000 bytes (56 bytes * 500,000) and use Buffer.BlockCopy.
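For anyone who wants to reproduce something like test 3, here is a minimal sketch of a timing harness. It is not the benchmark program from the picture: the class name, the `Time` helper and the random fill are all assumptions made for illustration, and only the vertex size and count come from the post.

```csharp
using System;
using System.Diagnostics;

public static class BlockCopyBenchmark
{
    public const int VertexSize = 56;       // bytes per vertex, from the post
    public const int VertexCount = 500000;  // vertices per write, from the post

    // Hypothetical helper: runs `body` `iterations` times and returns the total elapsed time.
    public static TimeSpan Time(int iterations, Action body)
    {
        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            body();
        }
        timer.Stop();
        return timer.Elapsed;
    }

    // Test 3: the vertex data is already flattened to bytes and copied with Buffer.BlockCopy.
    public static TimeSpan RunBlockCopyTest(int vertexCount, int iterations)
    {
        byte[] source = new byte[VertexSize * vertexCount];
        byte[] dest = new byte[source.Length];
        new Random(1234).NextBytes(source);  // arbitrary fill so the copy moves real data

        return Time(iterations, () => Buffer.BlockCopy(source, 0, dest, 0, source.Length));
    }
}
```

Calling `BlockCopyBenchmark.RunBlockCopyTest(BlockCopyBenchmark.VertexCount, 1000)` copies 28,000,000 bytes per pass, which matches the numbers above. Timings will vary by machine and runtime, so treat any result as relative rather than absolute.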

As you can see, it's not exactly swift, and I expect these numbers are probably not as good as they could be, especially compared to C++. You'll note that pinning the arrays on every iteration of the loop has very little impact on my time compared to pinning once. What surprised me was the difference in time between Buffer.BlockCopy and my own code: 4 seconds seems like a lot. I'm not sure whether, given the amount of data and the number of iterations, these numbers are good or bad, or whether the difference between the 1st and 3rd tests is really that big of a deal. Thus I want some expert opinions on this.
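One way to judge whether timings like these are good or bad is to turn them into throughput. The helper below is a small sketch for that; the class and method names are made up for illustration and are not part of the benchmark.

```csharp
using System;

public static class Throughput
{
    // Hypothetical helper: converts a total copy time into megabytes per second,
    // where 1 MB is taken as 1024 * 1024 bytes.
    public static double MegabytesPerSecond(long bytesPerPass, int passes, TimeSpan elapsed)
    {
        double totalMegabytes = (double)bytesPerPass * passes / (1024.0 * 1024.0);
        return totalMegabytes / elapsed.TotalSeconds;
    }
}
```

For this benchmark, `bytesPerPass` would be 28,000,000 and `passes` would be 1000, so each test moves 28,000,000,000 bytes (about 26 GiB) in total. Dividing by the elapsed time gives a figure that can be compared against the machine's memory bandwidth.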

I've never been good with optimization. In fact I'm horrible at it, so there's a very high probability that I'm not doing something that I should be to improve performance.

Here's the Writing code:
public override void Write(T[] data, int startIndex, int count)
{
	int dataSize = Marshal.SizeOf(typeof(T)) * count;
	GCHandle srcArrayHandle = GCHandle.Alloc(data, GCHandleType.Pinned);
	try
	{
		Write(Marshal.UnsafeAddrOfPinnedArrayElement(data, startIndex), dataSize);
	}
	finally
	{
		srcArrayHandle.Free();
	}
}

// Note: _lockPointer is an IntPtr to a pinned
// destination byte array and is set up when the
// Lock() function is called
public override void Write(IntPtr pointer, int count)
{
	unsafe
	{
		byte* src = (byte*)pointer.ToPointer();
		byte* dest = (byte*)_lockPointer.ToPointer();
		MemCopy(src, dest, count);
	}
}

and here's the memcopy code:
private unsafe void MemCopy(byte* src, byte* dest, int count)
{
	// Copy 16 bytes per pass while at least 16 bytes remain.
	if (count >= 16)
	{
		do
		{
			if (IntPtr.Size == 4)
			{
				// Pointer indexing is in elements, not bytes:
				// intDest[1] is 4 bytes past intDest.
				int* intSrc = (int*)src;
				int* intDest = (int*)dest;
				intDest[0] = intSrc[0];
				intDest[1] = intSrc[1];
				intDest[2] = intSrc[2];
				intDest[3] = intSrc[3];
			}
			else
			{
				// longDest[1] is 8 bytes past longDest.
				long* longSrc = (long*)src;
				long* longDest = (long*)dest;
				longDest[0] = longSrc[0];
				longDest[1] = longSrc[1];
			}
			src += 16;
			dest += 16;
			count -= 16;
		} while (count >= 16);
	}

	// Fewer than 16 bytes remain; each bit of count selects one tail copy.
	if ((count & 8) != 0)
	{
		if (IntPtr.Size == 4)
		{
			int* intSrc = (int*)src;
			int* intDest = (int*)dest;
			intDest[0] = intSrc[0];
			intDest[1] = intSrc[1];
		}
		else
		{
			long* longSrc = (long*)src;
			long* longDest = (long*)dest;
			*longDest = *longSrc;
		}
		src += 8;
		dest += 8;
	}
	if ((count & 4) != 0)
	{
		int* intSrc = (int*)src;
		int* intDest = (int*)dest;
		*intDest = *intSrc;
		src += 4;
		dest += 4;
	}
	if ((count & 2) != 0)
	{
		short* shortSrc = (short*)src;
		short* shortDest = (short*)dest;
		*shortDest = *shortSrc;
		src += 2;
		dest += 2;
	}
	// Test the low bit rather than count == 1, since count still holds
	// the full remainder (for example 3, 5 or 7) at this point.
	if ((count & 1) != 0)
	{
		*dest = *src;
		src++;
		dest++;
	}
}

Note that here I try to force it to use long when I'm on a 64 bit system (which I am). I don't know if this would be a performance boost or not; it didn't really seem to make much of a difference when I took it out, so I left it in.

Feel free to chime in with any advice on how to make this faster or if you want to use this code, then by all means have at it.

Comments

benryves
Using Marshal.Copy in place of MemCopy boosts performance fractionally here (00:00:25.9380000 for MemCopy, 00:00:23.9790000 for Marshal.Copy on 1000 copies of a 32MB array). Not much to write home about, but it at least removes the need for unsafe code!
August 29, 2008 05:36 AM
Tape_Worm
Quote: Original post by benryves
Using Marshal.Copy in place of MemCopy boosts performance fractionally here (00:00:25.9380000 for MemCopy, 00:00:23.9790000 for Marshal.Copy on 1000 copies of a 32MB array). Not much to write home about, but it at least removes the need for unsafe code!


Thanks, I hadn't thought about using Marshal.Copy.
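A minimal sketch of what a Marshal.Copy based write might look like. This is not the actual buffer class: the `MarshalCopyWriter` name and the explicit `dest` and `destOffset` parameters are assumptions made so the example stands alone, and it only applies to blittable element types whose marshaled size equals their managed size (float, int, or structs made of such fields).

```csharp
using System;
using System.Runtime.InteropServices;

public static class MarshalCopyWriter
{
    // Copies `count` elements of `data`, starting at `startIndex`, into `dest`
    // at byte offset `destOffset`. Returns the number of bytes written.
    public static int Write<T>(T[] data, int startIndex, int count, byte[] dest, int destOffset)
        where T : struct
    {
        int byteCount = Marshal.SizeOf(typeof(T)) * count;

        // Pin the source so the GC cannot move it while its address is in use.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr src = Marshal.UnsafeAddrOfPinnedArrayElement(data, startIndex);

            // Marshal.Copy(IntPtr, byte[], int, int) replaces the hand-written MemCopy,
            // so no unsafe block is needed.
            Marshal.Copy(src, dest, destOffset, byteCount);
        }
        finally
        {
            handle.Free();
        }
        return byteCount;
    }
}
```

Because the destination here is a managed byte array, it does not need to be pinned by the caller; Marshal.Copy handles that side itself.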

I updated the benchmark to use Marshal.Copy and my results were slightly more than a fractional increase [grin]:

That's pretty awesome.
August 29, 2008 09:22 AM