


#5275615 DirectX 12 few notes and questions

Posted by David_pb on 14 February 2016 - 07:14 AM

To explain a bit more in-depth... AMD GCN hardware only allows you to pass 16 DWORD values (64 bytes) as the arguments to a shader. The root signature describes a structure that can fit into this 64-byte area.

It's maybe worth adding that the root signature itself has a limit of 64 DWORDs, so it can easily overflow the 64-byte limit GCN hardware sets for user data registers. Therefore it's a good idea to keep the root signature small (at least on this hardware); otherwise parts of it are spilled to 'slower' memory and an additional indirection is needed to resolve them.

#5270007 DirectX evolution

Posted by David_pb on 08 January 2016 - 01:48 AM

But if for some odd reason you're batch limited even in D3D11/12, the general thinking from the "Batch, Batch, Batch!" presentation still holds (i.e. when batch limited, increasing vertex count becomes essentially free). For example, to combat batching issues, Just Cause 2 introduced the term merge-instancing.


Also worth mentioning in this regard: the technique(s) used in AC and Trials, as presented at last SIGGRAPH (GPU-driven Rendering Pipelines).

#5234249 DirectX 12 issues

Posted by David_pb on 11 June 2015 - 04:19 AM

Hi DarkRonin,


you might want to take a look at this thread: http://www.gamedev.net/topic/666986-direct3d-12-documentation-is-now-public/. There are a couple of simple code examples floating around.

#5234247 DirectX 12 problems creating swap chain

Posted by David_pb on 11 June 2015 - 04:14 AM

Hi DarkRonin,


note that the first parameter of CreateSwapChain expects a command queue, not the device (even though the parameter name suggests the latter). Also note you should pass DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL in the SwapEffect field of your swap-chain description (and therefore use at least 2 buffers). You should also definitely specify the SampleDesc (e.g. Count = 1, Quality = 0).


Also, try to enable the debug layer, so validation errors are logged directly to the Visual Studio output window:

ComPtr<ID3D12Debug> debugInterface;
if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugInterface))))
    debugInterface->EnableDebugLayer();

#5075523 DirectX11 performance problems

Posted by David_pb on 05 July 2013 - 01:20 PM



I'm currently working on a DirectX 11 port of an old DX9 renderer and facing the problem that DX11 seems really slow in comparison to the old code. I've already checked all the 'best practice' slides available around the internet (covering things like resource creation at runtime, update frequency for constant buffers, etc.), but nothing seems to be a real problem. Other engines I checked are much more careless in most of these cases but don't seem to have similar problems.


Profiling shows that the code is highly CPU bound, since the GPU seems to be starving. GPUView confirms this: the CPU queue is empty most of the time and only occasionally gets a packet pushed onto it. The weird thing is that the main thread isn't stalling but is active nearly the whole time. VTune shows that most of the samples are taken in DirectX API calls, which are taking far too much time (the main bottlenecks seem to be DrawIndexed/Instanced, Map and IASetVertexBuffers).


The next thing I thought about was sync points, but the only source I can imagine is the update of the constant buffers, of which there are quite a few per frame. What I'm essentially doing is caching the shader constants in a buffer and pushing the whole memory chunk into my constant buffers. The buffers are all dynamic and are mapped with 'discard'. I also tried creating 'default' buffers and updating them with UpdateSubresource, as well as a mix of both ('per frame' buffers dynamic and the rest default), but this resulted in equal performance.


The weird thing is that the old DX9 renderer produces much better results with the same render code. Maybe somebody has experienced similar behaviour and can give me a hint.




#5057489 InputLayouts

Posted by David_pb on 28 April 2013 - 11:11 AM

Splitting vertex data into multiple streams can be very handy if you don't always want to provide the full set of vertex attributes. Say you want to do a z-prepass: you don't need information like normal/tangent, color, etc. The proposed solution, however, can easily result in a huge number of simultaneously bound vertex streams and thus can badly impact IA fetch performance.

#5056439 InputLayouts

Posted by David_pb on 24 April 2013 - 02:05 PM

So, where does your shader signature require the position? Are you sure you have the right shader bound to the pipeline when submitting the draw call?

#5043625 D3D10CreateDeviceAndSwapChain fails in debug mode

Posted by David_pb on 16 March 2013 - 01:06 AM

Is your code still working in release builds? Otherwise this might be related to this issue: http://www.gamedev.net/topic/639532-d3d11createdevice-failed/

#5042693 Can I use an array in my vertex input layout?

Posted by David_pb on 13 March 2013 - 07:19 AM

Yes, this is possible. You can specify this in your input layout with the semantic index.

#5040851 CreateInputLayout - E_INVALIDARG error!

Posted by David_pb on 08 March 2013 - 09:41 AM

	polygonLayout[0].SemanticName = "POSITION";
	polygonLayout[0].SemanticIndex = 0;
	polygonLayout[0].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[0].InputSlot = 0;
	polygonLayout[0].AlignedByteOffset = 0;
	polygonLayout[0].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
	polygonLayout[0].InstanceDataStepRate = 0;

	polygonLayout[1].SemanticName = "TEXCOORD";
	polygonLayout[1].SemanticIndex = 0;
	polygonLayout[1].Format = DXGI_FORMAT_R32G32_FLOAT;
	polygonLayout[1].InputSlot = 0;
	polygonLayout[1].AlignedByteOffset = 12;
	polygonLayout[1].InputSlot = D3D11_INPUT_PER_VERTEX_DATA;
	polygonLayout[1].InstanceDataStepRate = 0;

	polygonLayout[2].SemanticName = "NORMAL";
	polygonLayout[2].SemanticIndex = 0;
	polygonLayout[2].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[2].InputSlot = 0;
	polygonLayout[2].AlignedByteOffset = 20;
	polygonLayout[2].InputSlot = D3D11_INPUT_PER_VERTEX_DATA;
	polygonLayout[2].InstanceDataStepRate = 0;

	polygonLayout[3].SemanticName = "TANGENT";
	polygonLayout[3].SemanticIndex = 0;
	polygonLayout[3].Format = DXGI_FORMAT_R32G32B32_FLOAT;
	polygonLayout[3].InputSlot = 0;
	polygonLayout[3].AlignedByteOffset = 32;
	polygonLayout[3].InputSlot = D3D11_INPUT_PER_VERTEX_DATA;
	polygonLayout[3].InstanceDataStepRate = 0;

	//Count Elements in Layout
	numElements = sizeof(polygonLayout) / sizeof(polygonLayout[0]);

	//Create Layout for Vertex Shader
	r = device->CreateInputLayout(polygonLayout, numElements, VS_Buffer->GetBufferPointer(),
									VS_Buffer->GetBufferSize(), &m_layout);
	if (FAILED(r))
	{return false;}

D3D11_INPUT_PER_VERTEX_DATA must be assigned to InputSlotClass.