mr_jrt

Fastest Way to clear surface

mr_jrt    134
Just doing some profiling comparisons between filling a surface with a full-surface colourfill blit and using memset. I'm getting approx 10-18ms for this code (Note: My machine's an old Cyrix 6x86 150+ with a 4MB S3 PCI card)
    
// Profile avg. of 10-20 ms
int Draw::FillSurface( COLORREF Colour, LPDIRECTDRAWSURFACE4 Surface)
{
	// Default to the back buffer when no surface is given
	if (!Surface) Surface = DDSBack;

	HRESULT			hResult;
	DDSURFACEDESC2		SurfaceDescription;
	ZeroMemory(&SurfaceDescription, sizeof(DDSURFACEDESC2));
	SurfaceDescription.dwSize = sizeof(DDSURFACEDESC2);

	//Lock the surface so we can fiddle with the memory directly

	hResult = Surface->Lock(NULL, &SurfaceDescription, DDLOCK_WAIT, NULL);
	ERRORCHECK("FillSurface - Lock", hResult);

	// memset() writes a single byte value, so this version always clears
	// to black (0) and ignores Colour
	memset(
		SurfaceDescription.lpSurface,
		0,
		SurfaceDescription.dwHeight * SurfaceDescription.lPitch
		);

	hResult = Surface->Unlock(NULL);
	ERRORCHECK("FillSurface - Unlock", hResult);

	return 0;
}

// Profile avg. of 7-10 ms
int Draw::FillSurface( COLORREF Colour, LPDIRECTDRAWSURFACE4 Surface)
{
	// Default to the back buffer when no surface is given
	if (!Surface) Surface = DDSBack;

	HRESULT			hResult;
	DDBLTFX		BltFX;

	// clear out the structure and set the size field 

	ZeroMemory(&BltFX, sizeof(DDBLTFX));
	BltFX.dwSize = sizeof(DDBLTFX);

	// set the dwFillColor field to the desired colour
	// (the value must already be in the surface's pixel format,
	// so a raw COLORREF needs converting first)

	BltFX.dwFillColor = Colour; 

	// ready to blt to surface

	hResult = Surface->Blt(NULL, NULL, NULL, DDBLT_COLORFILL | DDBLT_WAIT, &BltFX);
	ERRORCHECK("FillSurface Blt", hResult);

	return 0;
}

    
I'm guessing the increased speed using DDSurface->Blt() is from hardware-accelerated blitting, but still, surely I could get this even faster using ASM (e.g. with MMX etc). Using ASM isn't a major problem, as I've used it before. It's just more work. Ideas on getting this any faster?

Waassaap!!
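
For what it's worth, the memset() version above can only write one repeated byte, so on a 16-bit surface it cannot produce most colours, and dwFillColor in the Blt() version expects a value already in the surface's pixel format rather than a COLORREF. Below is a minimal sketch of both steps, assuming a 16-bit RGB565 surface. ToRgb565 and FillRows16 are hypothetical helper names, not DirectDraw calls, and a plain byte buffer stands in for the locked lpSurface. Real code would probably read the channel masks from the surface's pixel format instead of hard-coding 5-6-5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a Win32-style COLORREF (0x00BBGGRR) to RGB565.
// ASSUMPTION: the target surface is 16-bit 5-6-5; other formats need other shifts.
std::uint16_t ToRgb565(std::uint32_t colourRef)
{
    const std::uint32_t r = colourRef & 0xFF;          // red is the low byte
    const std::uint32_t g = (colourRef >> 8) & 0xFF;   // green is the middle byte
    const std::uint32_t b = (colourRef >> 16) & 0xFF;  // blue is the high byte
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical helper: fill a 16-bit surface one row at a time.
// `base` stands in for lpSurface and `pitchBytes` for lPitch, which may be
// larger than width * 2, so each row starts pitchBytes after the previous one.
void FillRows16(void* base, std::size_t width, std::size_t height,
                std::size_t pitchBytes, std::uint16_t pixel)
{
    assert(base != nullptr);
    assert(pitchBytes >= width * sizeof(std::uint16_t));

    unsigned char* row = static_cast<unsigned char*>(base);
    for (std::size_t y = 0; y < height; ++y, row += pitchBytes) {
        for (std::size_t x = 0; x < width; ++x) {
            // memcpy keeps the store well-defined whatever the buffer's type
            std::memcpy(row + x * sizeof pixel, &pixel, sizeof pixel);
        }
    }
}
```

With the surface locked, the call would look something like FillRows16(SurfaceDescription.lpSurface, SurfaceDescription.dwWidth, SurfaceDescription.dwHeight, SurfaceDescription.lPitch, ToRgb565(Colour)). Only the visible pixels of each row are written, so any padding bytes between width and pitch are left alone.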

neocron    122

I think it depends on where your surface is. If the surface is in video-card memory Blt() will be faster than memset(). If it's in system memory I would think they'd be similar.

An MMX/floating point memcpy/memset would get you faster system-to-system and possibly system-to-video transfers if the destination memory is not currently in the cache.
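
A rough illustration of the wide-store idea, in portable C++ rather than MMX assembly: each 64-bit store below plays the role an MMX movq would, writing four 16-bit pixels at once. Fill16Wide is a hypothetical name, the 16-bit pixel size is an assumption, and whether this beats memset() or Blt() on a given machine would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fill `count` 16-bit pixels starting at `dst`,
// writing eight bytes (four pixels) per store where possible.
void Fill16Wide(std::uint16_t* dst, std::size_t count, std::uint16_t pixel)
{
    assert(dst != nullptr || count == 0);

    // Replicate the pixel into all four 16-bit lanes of a 64-bit pattern.
    // Every lane holds the same value, so byte order does not matter.
    std::uint64_t pattern = pixel;
    pattern |= pattern << 16;
    pattern |= pattern << 32;

    unsigned char* out = reinterpret_cast<unsigned char*>(dst);
    std::size_t i = 0;

    // Main loop: one 8-byte copy per four pixels (compilers typically
    // turn this fixed-size memcpy into a single 64-bit store).
    for (; i + 4 <= count; i += 4) {
        std::memcpy(out + i * sizeof pixel, &pattern, sizeof pattern);
    }

    // Tail: finish the last 0-3 pixels one at a time.
    for (; i < count; ++i) {
        dst[i] = pixel;
    }
}
```

On a locked surface this would be called once per row, with the row start advanced by the pitch each time, so that padding between rows is not touched.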

mr_jrt    134
Although I use it as a general function, the profiling runs are measuring the code when used on the primary/backbuffer surfaces.

It may just be that little else is happening in the main code other than clearing the screen (using this function) and re-rendering some text, which would explain why the function takes up such a huge percentage of the running time (usually between 45-60%).

Alternatively, it may be that I'm the greatest coder around and my other code is so fast it puts the compiler libs to shame.

Then again, that's almost certainly not the case.

-Jamie



Waassaap!!

Edited by - mr_jrt on October 3, 2000 8:19:11 AM
