

David_pb

Member Since 13 Nov 2008
Offline Last Active Mar 27 2014 06:52 AM

#5075523 DirectX11 performance problems

Posted by David_pb on 05 July 2013 - 01:20 PM

Hi,

 

I'm currently working on a DirectX 11 port of an old DX9 renderer and facing the problem that DX11 seems really slow in comparison to the old code. I've already checked all the 'best practice' slides that are available around the internet (avoiding resource creation at runtime, update frequency of constant buffers, etc.), but nothing there seems to be a real problem. Other engines I checked are much more careless in most of these cases but don't seem to have similar problems.

 

Profiling shows that the code is highly CPU bound and the GPU is starving. GPUView confirms this: the CPU queue is empty most of the time and only occasionally has a packet pushed onto it. The weird thing is that the main thread isn't stalling but is active nearly the whole time. VTune shows that most samples land in DirectX API calls, which take far too much time (the main bottlenecks seem to be DrawIndexed/DrawIndexedInstanced, Map and IASetVertexBuffers).

 

The next thing I thought about was sync points. The only source I can imagine is the update of the constant buffers, of which there are quite a few per frame. What I'm essentially doing is caching the shader constants in a CPU-side buffer and pushing the whole memory chunk into my constant buffers. The buffers are all dynamic and are mapped with 'discard'. I also tried creating 'default' buffers and updating them with UpdateSubresource, as well as a mix of both ('per frame' buffers dynamic, the rest default), but both seemed to result in equal performance.
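For reference, the caching scheme described above can be sketched like this. All names here are hypothetical, made up for illustration; in the real renderer the upload callback would be ID3D11DeviceContext::Map with D3D11_MAP_WRITE_DISCARD, a memcpy, and Unmap:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: shader constants are cached in a CPU-side shadow
// buffer and the whole chunk is pushed into the constant buffer at most
// once per draw.
class ConstantCache
{
public:
    explicit ConstantCache(size_t size) : shadow_(size, 0), dirty_(false) {}

    // Write into the shadow copy; only mark dirty when the bytes actually
    // change, so redundant Set calls don't trigger an upload.
    void Set(size_t offset, const void* data, size_t size)
    {
        assert(offset + size <= shadow_.size());
        if (std::memcmp(shadow_.data() + offset, data, size) != 0)
        {
            std::memcpy(shadow_.data() + offset, data, size);
            dirty_ = true;
        }
    }

    // Push the whole chunk if anything changed; returns true if an upload
    // happened. 'upload' stands in for the Map/memcpy/Unmap sequence.
    template <typename UploadFn>
    bool Flush(UploadFn upload)
    {
        if (!dirty_)
            return false; // nothing changed: no Map call at all
        upload(shadow_.data(), shadow_.size());
        dirty_ = false;
        return true;
    }

private:
    std::vector<std::uint8_t> shadow_;
    bool dirty_;
};
```

The dirty flag matters here: skipping the Map entirely when nothing changed is the only update that is guaranteed not to cost anything, whereas even a DISCARD map still has per-call driver overhead.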

 

The weird thing is that the old DX9 renderer produces much better results with the same render code. Maybe somebody has experienced similar behaviour and can give me a hint.

 

Cheers

David




#5057489 InputLayouts

Posted by David_pb on 28 April 2013 - 11:11 AM

Splitting vertex data into multiple streams can be very handy if you don't always want to provide the full set of vertex attributes. Say you want to do a z-prepass: you don't need attributes like normal/tangent, color, etc. The proposed solution, however, can easily result in a large number of simultaneously bound vertex streams and thus can badly impact input assembler (IA) fetch performance.
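A two-stream split along these lines could look as follows. This is a self-contained sketch: the types below are minimal stand-ins with the same field names and enum values as the real declarations in <d3d11.h>/<dxgi.h>, so the layout can be checked without the Windows SDK:

```cpp
#include <cassert>

// Minimal stand-ins for the D3D11 declarations used below; in real code
// these come from <d3d11.h> and <dxgi.h>.
typedef unsigned int UINT;

enum DXGI_FORMAT
{
    DXGI_FORMAT_R32G32B32_FLOAT = 6,
    DXGI_FORMAT_R32G32_FLOAT    = 16,
};

enum D3D11_INPUT_CLASSIFICATION
{
    D3D11_INPUT_PER_VERTEX_DATA = 0,
};

struct D3D11_INPUT_ELEMENT_DESC
{
    const char*                SemanticName;
    UINT                       SemanticIndex;
    DXGI_FORMAT                Format;
    UINT                       InputSlot;
    UINT                       AlignedByteOffset;
    D3D11_INPUT_CLASSIFICATION InputSlotClass;
    UINT                       InstanceDataStepRate;
};

// Slot 0 carries only position (enough for a z-prepass); slot 1 carries
// the remaining attributes. During the prepass you bind only the position
// stream with a position-only layout; the main pass binds both streams,
// keeping the number of simultaneously bound streams at just two.
static const D3D11_INPUT_ELEMENT_DESC splitLayout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 1,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    1, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
```

Note that the AlignedByteOffset values restart at 0 within each slot, since each stream has its own stride.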




#5056439 InputLayouts

Posted by David_pb on 24 April 2013 - 02:05 PM

So, where does your shader input signature require POSITION? Are you sure you have the right shader bound to the pipeline when submitting the draw call?




#5043625 D3D10CreateDeviceAndSwapChain fails in debug mode

Posted by David_pb on 16 March 2013 - 01:06 AM

Does your code still work in release builds? If it only fails in debug, this might be related to this issue: http://www.gamedev.net/topic/639532-d3d11createdevice-failed/




#5042693 Can I use an array in my vertex input layout?

Posted by David_pb on 13 March 2013 - 07:19 AM

Yes, this is possible: declare one layout element per array entry and distinguish the entries via the SemanticIndex field.
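For illustration, here is the common case of a per-instance float4x4 declared as four elements that share one semantic name and count up the semantic index. A self-contained sketch with minimal stand-in types (same field names and enum values as the real declarations in <d3d11.h>/<dxgi.h>):

```cpp
#include <cassert>

// Minimal stand-ins for the D3D11 declarations used below; in real code
// these come from <d3d11.h> and <dxgi.h>.
typedef unsigned int UINT;

enum DXGI_FORMAT
{
    DXGI_FORMAT_R32G32B32A32_FLOAT = 2,
};

enum D3D11_INPUT_CLASSIFICATION
{
    D3D11_INPUT_PER_VERTEX_DATA   = 0,
    D3D11_INPUT_PER_INSTANCE_DATA = 1,
};

struct D3D11_INPUT_ELEMENT_DESC
{
    const char*                SemanticName;
    UINT                       SemanticIndex;
    DXGI_FORMAT                Format;
    UINT                       InputSlot;
    UINT                       AlignedByteOffset;
    D3D11_INPUT_CLASSIFICATION InputSlotClass;
    UINT                       InstanceDataStepRate;
};

// A per-instance matrix ("float4x4 world : TRANSFORM;" on the HLSL side)
// is declared as four float4 rows sharing the semantic name TRANSFORM,
// with SemanticIndex 0..3. Each row is 16 bytes, hence offsets 0/16/32/48.
static const D3D11_INPUT_ELEMENT_DESC instanceLayout[] =
{
    { "TRANSFORM", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1,  0, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "TRANSFORM", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 16, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "TRANSFORM", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 32, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "TRANSFORM", 3, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 48, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
};
```

The same pattern works for any array-valued attribute: one element per array entry, same SemanticName, ascending SemanticIndex.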


#5040851 CreateInputLayout - E_INVALIDARG error!

Posted by David_pb on 08 March 2013 - 09:41 AM


/* SETUP LAYOUT */
	//POSITION
	polygonLayout[0].SemanticName = "POSITION";
	polygonLayout[0].SemanticIndex = 0;
	polygonLayout[0].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[0].InputSlot = 0;
	polygonLayout[0].AlignedByteOffset = 0;
	polygonLayout[0].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
	polygonLayout[0].InstanceDataStepRate = 0;

	//TEXCOORD
	polygonLayout[1].SemanticName = "TEXCOORD";
	polygonLayout[1].SemanticIndex = 0;
	polygonLayout[1].Format = DXGI_FORMAT_R32G32_FLOAT;
	polygonLayout[1].InputSlot = 0;
	polygonLayout[1].AlignedByteOffset = 12;
	polygonLayout[1].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA; // fix: was mistakenly assigned to InputSlot
	polygonLayout[1].InstanceDataStepRate = 0;

	//NORMAL
	polygonLayout[2].SemanticName = "NORMAL";
	polygonLayout[2].SemanticIndex = 0;
	polygonLayout[2].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[2].InputSlot = 0;
	polygonLayout[2].AlignedByteOffset = 20;
	polygonLayout[2].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA; // fix: was mistakenly assigned to InputSlot
	polygonLayout[2].InstanceDataStepRate = 0;

	//TANGENT
	polygonLayout[3].SemanticName = "TANGENT";
	polygonLayout[3].SemanticIndex = 0;
	polygonLayout[3].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[3].InputSlot = 0;
	polygonLayout[3].AlignedByteOffset = 32;
	polygonLayout[3].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA; // fix: was mistakenly assigned to InputSlot
	polygonLayout[3].InstanceDataStepRate = 0;

	//Count Elements in Layout
	numElements = sizeof(polygonLayout) / sizeof(polygonLayout[0]);

	//Create Layout for Vertex Shader
	r = device->CreateInputLayout(polygonLayout, numElements, VS_Buffer->GetBufferPointer(),
									VS_Buffer->GetBufferSize(), &m_layout);
	if(FAILED(r))
	{return false;}


D3D11_INPUT_PER_VERTEX_DATA must be assigned to InputSlotClass, not InputSlot: in the posted code, elements 1-3 overwrote InputSlot with it and left InputSlotClass uninitialized, which is why CreateInputLayout returned E_INVALIDARG.

