hondo

slow present(), what am I doing wrong?


hondo    122
The user interface I'm developing needs to be able to run either fullscreen or in a window. It's still early days on this development, so it's not actually doing much yet, but I'm finding the UI does not always animate smoothly when running in a window (though sometimes it is perfectly smooth). By adding a few debug statements, I can see that the Present() call takes either 16ms or 33ms; animation looks smooth at 16ms and rough at 33ms. I realise this most likely means that it can't manage to present during the current vsync, so it has to wait for the second one. The thing I find strange is that it's not really doing much. At startup I'm creating the device and half a dozen textures (about 200x100):
	HRESULT hr;
	m_pD3DDev = NULL;
	D3DDISPLAYMODE dm;

	CHECK_HR(m_pD3D->GetAdapterDisplayMode(D3DADAPTER_DEFAULT, &dm));

	D3DPRESENT_PARAMETERS pp;
	ZeroMemory(&pp, sizeof(pp));
	pp.BackBufferFormat = dm.Format;
	pp.hDeviceWindow = m_hWnd;
	if(!m_bFullScreen)
	{
		pp.BackBufferWidth = dm.Width;
		pp.BackBufferHeight = dm.Height;	
		pp.BackBufferCount = 1;
		pp.Windowed = TRUE;
		pp.SwapEffect = D3DSWAPEFFECT_COPY;
		pp.FullScreen_RefreshRateInHz = D3DPRESENT_RATE_DEFAULT;				
	}
	else
	{
		pp.Windowed = FALSE;
		pp.BackBufferCount = 2;
		pp.EnableAutoDepthStencil = FALSE;    
		pp.SwapEffect = D3DSWAPEFFECT_FLIP;
		pp.BackBufferWidth = dm.Width;
		pp.BackBufferHeight = dm.Height;	
		pp.FullScreen_RefreshRateInHz = D3DPRESENT_RATE_DEFAULT;		
		pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;		
	}

	DWORD dwBehaviour = D3DCREATE_SOFTWARE_VERTEXPROCESSING | D3DCREATE_MULTITHREADED;
	CHECK_HR(m_pD3D->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, m_hWnd, dwBehaviour, &pp, &m_pD3DDev));

	m_pRenderTarget = 0;
	CHECK_HR(m_pD3DDev->GetRenderTarget(0, &m_pRenderTarget));


	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_CULLMODE, D3DCULL_NONE));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_LIGHTING, FALSE));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_ALPHAREF, 0x10));
	CHECK_HR(m_pD3DDev->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATER));

	CHECK_HR(m_pD3DDev->SetSamplerState(0, D3DSAMP_ADDRESSU, D3DTADDRESS_CLAMP));
	CHECK_HR(m_pD3DDev->SetSamplerState(0, D3DSAMP_ADDRESSV, D3DTADDRESS_CLAMP));
	CHECK_HR(m_pD3DDev->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR));
	CHECK_HR(m_pD3DDev->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_LINEAR));
	CHECK_HR(m_pD3DDev->SetSamplerState(0, D3DSAMP_MIPFILTER, D3DTEXF_LINEAR));

	CHECK_HR(m_pD3DDev->CreateVertexBuffer( sizeof(m_vertices), D3DUSAGE_WRITEONLY, D3DFVF_CUSTOMVERTEX,D3DPOOL_MANAGED, &m_pVertexBuf.p, NULL));




Each time I draw a frame, I do approximately the following.
	CHECK_HR(m_pD3DDev->BeginScene());

	CHECK_HR(m_pD3DDev->SetTextureStageState(0, D3DTSS_ALPHAOP, D3DTOP_MODULATE));
	CHECK_HR(m_pD3DDev->SetTextureStageState(0, D3DTSS_ALPHAARG1, D3DTA_TEXTURE));
	CHECK_HR(m_pD3DDev->SetTextureStageState(0, D3DTSS_ALPHAARG2, D3DTA_DIFFUSE));
	CHECK_HR(m_pD3DDev->SetTextureStageState(0, D3DTSS_COLORARG1, D3DTA_TEXTURE));

	// draw render list
	for (int i=0; i<renderList->Count; i++)
	{		
		D3DLOCKED_RECT rcLock;

		UiElement *uiElement = (UiElement *)renderList->get_Item(i);
		if (uiElement->texture != NULL)
		{			
			m_vertices[0].position = CUSTOMVERTEX::Position(tempRect.left, tempRect.top,  0.0f); // top left
			m_vertices[1].position = CUSTOMVERTEX::Position(tempRect.left, tempRect.bottom, 0.0f); // bottom left
			m_vertices[2].position = CUSTOMVERTEX::Position(tempRect.right,  tempRect.top, 0.0f); // top right
			m_vertices[3].position = CUSTOMVERTEX::Position(tempRect.right, tempRect.bottom, 0.0f); // bottom right


			CHECK_HR(m_pVertexBuf->Lock(0, sizeof(m_vertices), &pData, 0)); // lock the whole vertex array; sizeof(pData) is only the size of a pointer
			memcpy(pData, m_vertices, sizeof(m_vertices));                            
			CHECK_HR(m_pVertexBuf->Unlock());  

			CHECK_HR(m_pD3DDev->SetTexture(0, (IDirect3DBaseTexture9 *)uiElement->texture.ToPointer()));			
			CHECK_HR(m_pD3DDev->SetStreamSource(0, m_pVertexBuf, 0, sizeof(CAllocatorPresenter::CUSTOMVERTEX)));            
			CHECK_HR(m_pD3DDev->SetFVF(D3DFVF_CUSTOMVERTEX));
			CHECK_HR(m_pD3DDev->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2));
		}
	}

	// end of scene
	CHECK_HR(m_pD3DDev->EndScene());

	Logger::Verbose("presenting@1");
	hr = m_pD3DDev->Present(0, 0, 0, 0);
	Logger::Verbose("presenting@2");




So effectively: BeginScene(), Clear(), draw five textures, EndScene(), then Present(). Am I missing something obvious? Surely I should be able to draw five simple textures to the screen at 60fps? In case it's relevant, I'm doing this on Vista using DirectX 9. Sometimes I start the app and it's perfectly smooth; other times it does this slow/jerky thing. Sometimes I can switch another window to the foreground, then switch back, and it starts running smoothly.
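
The 16ms versus 33ms pattern described above is what you would expect if Present() is waiting on vsync: the observed frame time gets rounded up to a whole number of refresh intervals. A minimal sketch of that arithmetic, assuming a 60 Hz display (the thread does not state the refresh rate) and a hypothetical helper name `vsyncedFrameTimeMs` that is not part of Direct3D:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a Direct3D call): with vsync enabled, a frame is
// shown on the first vertical blank after its work completes, so the observed
// frame time is the work time rounded up to whole refresh intervals.
double vsyncedFrameTimeMs(double workMs, double refreshHz)
{
    assert(refreshHz > 0.0);
    const double intervalMs = 1000.0 / refreshHz;
    // At least one interval always elapses, even if the work is near zero.
    const double intervals = std::fmax(1.0, std::ceil(workMs / intervalMs));
    return intervals * intervalMs;
}
```

Under these assumptions, 16 ms of work still fits in one interval (about 16.7 ms per frame), while 17 ms of work misses the blank and costs two intervals (about 33.3 ms). This only models the rounding; it does not say where the extra time is being spent.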

jollyjeffers    1570
Have you read through the 'Accurately Profiling Direct3D API Calls' article in the SDK documentation?

Gathering performance statistics like you have is not as straightforward as you might initially think. The GPU is effectively a parallel co-processor, so timing the application side is simply timing how long it takes to set things up rather than how long it takes the GPU to actually do the work.

If the command queue fills up then the application has to be blocked before it can issue more commands. Blocking on Present() is a common symptom of this - you're getting too far ahead and the API is basically saying "hold up a sec, let me catch up!"

Having said all that, you will be causing a nasty lock-step situation with your lock-modify-unlock-render use of m_pVertexBuf so it's quite possible you're not really getting any parallelism...

Have you tried different presentation intervals or checking that the driver isn't overriding this setting?

hth
Jack

Locking a static VB over and over is bad (The D3D debug runtimes will spew warnings about this if you turn them on). Switch to a dynamic VB.
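
One common way to use a dynamic VB is to append new data with D3DLOCK_NOOVERWRITE until the buffer is full, then lock with D3DLOCK_DISCARD and start again at offset 0. The sketch below shows only that offset bookkeeping in plain C++, so it runs without a device. `DynamicBufferCursor`, `LockRequest` and `LockMode` are hypothetical names for illustration, and it assumes the buffer was created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY in D3DPOOL_DEFAULT:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two lock flags a dynamic buffer would use:
// NoOverwrite maps to D3DLOCK_NOOVERWRITE, Discard maps to D3DLOCK_DISCARD.
enum class LockMode { NoOverwrite, Discard };

struct LockRequest {
    std::size_t offsetBytes; // value to pass as OffsetToLock
    LockMode mode;           // which lock flag to use
};

// Tracks the next free byte in a dynamic vertex buffer (illustrative only).
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacityBytes)
        : capacity_(capacityBytes), next_(0) {}

    LockRequest reserve(std::size_t sizeBytes)
    {
        assert(sizeBytes <= capacity_);
        if (next_ + sizeBytes > capacity_) {
            // Not enough room left: ask for fresh memory and restart at 0,
            // so the GPU can keep reading the old contents undisturbed.
            next_ = sizeBytes;
            return { 0, LockMode::Discard };
        }
        // Room left: append after data the GPU may still be reading.
        const LockRequest request{ next_, LockMode::NoOverwrite };
        next_ += sizeBytes;
        return request;
    }

private:
    std::size_t capacity_;
    std::size_t next_;
};
```

The returned offset and mode would then feed the Lock() call and the start vertex of the following draw. Because neither mode makes the CPU wait for the GPU to finish with earlier data, this avoids the stall that repeated locking of a static buffer can cause.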

Drawing each button one at a time is slow. Batch together many items, copy them all into the VB at once, and draw them in one draw call.
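
As a rough sketch of the batching idea: build the vertices for every quad on the CPU first, then do one copy into the VB and one draw call. Separate quads cannot share a single triangle strip without degenerate triangles, so this version emits a triangle list (6 vertices per quad). `QuadVertex`, `QuadRect`, `appendQuad` and `buildBatch` are hypothetical; the real CUSTOMVERTEX layout is not shown in the thread, and the texture coordinates here are an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position plus one set of texture coordinates.
struct QuadVertex { float x, y, z; float u, v; };

// Screen-space rectangle for one UI element.
struct QuadRect { float left, top, right, bottom; };

// Appends one quad as two triangles (6 vertices) of a triangle list.
void appendQuad(std::vector<QuadVertex>& out, const QuadRect& r)
{
    const QuadVertex tl{ r.left,  r.top,    0.0f, 0.0f, 0.0f };
    const QuadVertex bl{ r.left,  r.bottom, 0.0f, 0.0f, 1.0f };
    const QuadVertex tr{ r.right, r.top,    0.0f, 1.0f, 0.0f };
    const QuadVertex br{ r.right, r.bottom, 0.0f, 1.0f, 1.0f };
    out.insert(out.end(), { tl, bl, tr, tr, bl, br });
}

// Builds the vertex data for a whole batch of quads in one array.
std::vector<QuadVertex> buildBatch(const std::vector<QuadRect>& rects)
{
    std::vector<QuadVertex> vertices;
    vertices.reserve(rects.size() * 6);
    for (const QuadRect& r : rects)
        appendQuad(vertices, r);
    return vertices;
}
```

The resulting array would be copied into the vertex buffer with a single Lock/memcpy/Unlock and drawn with one DrawPrimitive(D3DPT_TRIANGLELIST, 0, vertices.size() / 3). A single draw call can only use one texture per stage, so this presumes the batched elements share a texture (for example an atlas), or that one batch is built per texture.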

The "D3DLOCKED_RECT rcLock" may just be left-over fluff, or perhaps you're not showing us the part where you're locking a surface. This will slow things to a crawl too.

Enabling alphatest all the time will disable many Z culling optimizations.
Enabling alphablend all the time will incur a small (say 10%) speed penalty on draws where it's not needed.

hondo    122
Thanks guys - this is exactly the sort of feedback I was looking for. First stop will be looking into dynamic vertex buffers... (I'm a bit of a bunny - still new to Direct3D).
