David_pb

Members
  • Content count: 53

Community Reputation

1046 Excellent

About David_pb

  • Rank
    Member
  1. It's maybe worth adding that the root signature has a limit of 64 DWORDs, so it can easily overflow the 64-byte limit GCN hardware sets for user data registers. Therefore it's a good idea to keep the root signature small (at least on this hardware); otherwise parts of it are spilled to slower memory and an additional indirection is needed to resolve them.
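     To make the budget concrete, the documented per-argument costs are 1 DWORD per 32-bit root constant, 2 DWORDs per root descriptor, and 1 DWORD per descriptor table. A minimal sketch (plain C++; the enum and function names are mine, not from any API) that tallies a layout's cost:

     ```cpp
     #include <cassert>
     #include <cstdint>
     #include <vector>

     // Documented D3D12 root argument costs, in DWORDs:
     //   root constants    : 1 per 32-bit value
     //   root descriptor   : 2 (a GPU virtual address)
     //   descriptor table  : 1 (an offset into the descriptor heap)
     enum class RootArg { Constants, Descriptor, Table };

     struct RootParam {
         RootArg kind;
         uint32_t num32BitValues; // only used for Constants
     };

     uint32_t RootSignatureCostInDwords(const std::vector<RootParam>& params) {
         uint32_t dwords = 0;
         for (const RootParam& p : params) {
             switch (p.kind) {
                 case RootArg::Constants:  dwords += p.num32BitValues; break;
                 case RootArg::Descriptor: dwords += 2; break;
                 case RootArg::Table:      dwords += 1; break;
             }
         }
         return dwords;
     }
     ```

     Anything beyond the first 16 DWORDs (64 bytes of user SGPRs on GCN) is a candidate for spilling, so favoring descriptor tables over fat root constants keeps the frequently-changed arguments in registers.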
  2. Hi,   In general there is no state inheritance between two different direct command lists. It is expected that the full set of state (PSO and non-PSO state) is recorded within each direct command list. There are some options to ease this a bit: the initial PSO can be specified with the pInitialState parameter at creation time (CreateCommandList); if none is specified, a default PSO is used instead. All non-PSO state needs to be recorded explicitly (RSSetViewports, IASet***Buffer, OMSetRenderTargets, ...).    With this in mind, you can try to adapt the way you record your command lists. If you split the rendering of many objects (of the same render pass) across different command lists, you need to specify the render states per command list, even though it might seem redundant. So your example from above would sadly not produce the intended behavior.    For bundles the set of rules is of course a bit different. For more info read [url=https://msdn.microsoft.com/de-de/library/windows/desktop/dn899196(v=vs.85).aspx#Graphics_pipeline_state_inheritance]Graphics pipeline state inheritance[/url]
  3. DirectX evolution

      Also worth mentioning in this regard: the technique(s) used in AC and Trials, as presented at last SIGGRAPH (GPU-driven Rendering Pipelines)
  4. DX12 DirectX 12 issues

    Hi DarkRonin,   you might want to take a look at this thread: http://www.gamedev.net/topic/666986-direct3d-12-documentation-is-now-public/. A couple of simple code examples are floating around there. 
  5. Hi DarkRonin,   note that the first parameter of CreateSwapChain expects a command queue, not the device (even though the parameter name suggests otherwise). Also note that you should pass DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL in the SwapEffect field of your description (and therefore use 2 buffers), and you should definitely specify the sample desc (e.g. count = 1, quality = 0).   Also try to enable the debug layer, so validation errors are logged directly to the Visual Studio output window:

     ComPtr<ID3D12Debug> debugInterface;
     if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugInterface))))
     {
         debugInterface->EnableDebugLayer();
     }
  6. Ocean Wave 'Fake' SSS

    Looks nice! Could you provide the shader code though? Maybe someone has an idea for improvement?
  7. DX11 C++ error

    Oh, ok.. I see! :) But check your result values anyway.
  8. DX11 C++ error

    Maybe D3D11CreateDeviceAndSwapChain failed and your device context is not initialized properly. You should definitely check your result values, i.e.:

    HRESULT hr = D3D11CreateDeviceAndSwapChain(
        NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, NULL, NULL, NULL,
        D3D11_SDK_VERSION, &scd, &pSwapChain, &pDevice, NULL, &pDeviceContext);
    if (FAILED(hr))
    {
        // do your error handling
    }
  9. Particles Rotation

    You could just set the up vector to UNIT_Y (some pseudocode):

    view = normalize(cameraOrigin - particleOrigin);
    if (sphericalBillboard) // whenever the particle normal should truly face the camera
    {
        vertical = cameraUpVector;
    }
    else
    {
        vertical = Vector3::UNIT_Y;
    }
    horizontal = cross(view, vertical);

    Keep in mind to do the calculation in the right space (i.e. world space) and to adjust for your coordinate system.
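    The pseudocode above in concrete form — a small self-contained C++ sketch (the Vec3 type and helpers are mine, not from any particular engine) computing the two billboard axes in world space:

    ```cpp
    #include <cassert>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

    Vec3 cross(Vec3 a, Vec3 b) {
        return {a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
    }

    Vec3 normalize(Vec3 v) {
        float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        return {v.x / len, v.y / len, v.z / len};
    }

    const Vec3 UNIT_Y = {0.0f, 1.0f, 0.0f};

    // Computes the horizontal/vertical axes spanning the particle quad.
    void BillboardAxes(Vec3 cameraOrigin, Vec3 particleOrigin, Vec3 cameraUp,
                       bool spherical, Vec3& horizontal, Vec3& vertical) {
        Vec3 view = normalize(cameraOrigin - particleOrigin);
        // Spherical billboards truly face the camera; cylindrical ones stay upright.
        vertical = spherical ? cameraUp : UNIT_Y;
        horizontal = normalize(cross(view, vertical));
    }
    ```

    For a camera straight in front of the particle along +Z the horizontal axis comes out perpendicular to both view and up, as expected.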
  10. The BUFFER_DESC is fairly standard:

      const bool isStatic = (flags & CBF_STATIC_BUFFER) != 0;

      D3D11_BUFFER_DESC desc;
      desc.ByteWidth = size; // size is already a multiple of 16 here
      desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
      desc.CPUAccessFlags = 0;
      if (isStatic)
      {
          desc.Usage = D3D11_USAGE_IMMUTABLE;
      }
      else
      {
          desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
          desc.Usage = D3D11_USAGE_DYNAMIC;
      }
      desc.MiscFlags = 0;
      desc.StructureByteStride = 0;

      D3D11_SUBRESOURCE_DATA data;
      data.pSysMem = p_Data;
      data.SysMemPitch = 0;
      data.SysMemSlicePitch = 0;

      HRESULT hr;
      ID3D11Buffer* buffer;
      hr = device->CreateBuffer(&desc, isStatic ? &data : 0, &buffer);
      //...

      I can't provide the actual update code, since it's too deeply integrated in the engine. But what basically happens is that the buffers which are marked for update are mapped, the memory chunk is copied via memcpy, and the buffers are unmapped afterwards. Yes, I'm aware of that. But the context can still be used from many threads, although the access needs to be synchronized manually. I thought maybe someone here has some experience with this.
  11. Hi,   currently I'm rethinking the way I handle shader constants in our engine. What I currently do is hold a local backing store for each constant buffer, which gets filled by the shader constant provider(s). After all constants are assembled, the constant buffers are mapped and the whole memory chunk is simply copied via memcpy. Additionally I'm doing other stuff to keep the number of updates as low as possible (sharing buffers, packing constants by update frequency).   This doesn't seem to be efficient though: the driver seems to keep renaming buffers, which stalls the CPU far too often. I thought about doing the update asynchronously so other operations can be done during the update. The problem is that the device context is not thread-safe, so I have to do the synchronization myself. Does anyone have experience with this topic? Or maybe I'm doing it all wrong and somebody can give me a hint.   Cheers
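      One common way around driver-side renaming (a sketch of the general technique, not the poster's actual code) is to do the renaming yourself: sub-allocate all of a frame's constants from one large ring buffer at 256-byte-aligned offsets, and only reuse regions the GPU has finished reading. The offset arithmetic alone looks like this; fence tracking is deliberately omitted:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Constant buffer offsets must be 256-byte aligned in D3D.
      constexpr uint64_t kCbAlignment = 256;

      uint64_t AlignUp(uint64_t value, uint64_t alignment) {
          return (value + alignment - 1) & ~(alignment - 1);
      }

      // Minimal ring allocator: hands out aligned offsets into one big buffer.
      // Real code must also track GPU fences to know when a wrapped-over
      // region is actually safe to overwrite.
      struct RingAllocator {
          uint64_t size;
          uint64_t head = 0;

          uint64_t Allocate(uint64_t bytes) {
              uint64_t aligned = AlignUp(bytes, kCbAlignment);
              if (head + aligned > size)
                  head = 0; // wrap around to the start of the buffer
              uint64_t offset = head;
              head += aligned;
              return offset;
          }
      };
      ```

      Each update then becomes a memcpy into a fresh region plus binding the buffer at the new offset, so the driver never has to rename anything behind your back.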
  12. DX11 DirectX11 performance problems

    For release builds no flags are set when shaders are compiled, except for matrix order. For debug I use DEBUG, PREFER_FLOW_CONTROL and SKIP_OPTIMIZATION. As for the InputLayouts: I use a simple caching system to share input layouts whenever possible. Whenever a shader is associated with some renderable entity, an IL is requested from the cache. A hash is created over the vertex declaration and the shader input signature; if an equal hash is found, the existing layout is returned and shared, otherwise a new layout is created. This all happens only once per shader-renderable entity combination at loading time, so I make sure not to create this stuff 'on the fly' at runtime.
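    The caching scheme described can be sketched like this (simplified C++; the key strings and the InputLayout stand-in are placeholders for the real vertex declaration, shader input signature, and ID3D11InputLayout):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <functional>
    #include <memory>
    #include <string>
    #include <unordered_map>

    // Stand-in for ID3D11InputLayout in this sketch.
    struct InputLayout {
        std::string debugName;
    };

    // Hash over vertex declaration + shader input signature identifies a layout.
    uint64_t LayoutHash(const std::string& vertexDecl, const std::string& inputSig) {
        uint64_t h1 = std::hash<std::string>{}(vertexDecl);
        uint64_t h2 = std::hash<std::string>{}(inputSig);
        // Boost-style hash combine.
        return h1 ^ (h2 + 0x9e3779b97f4a7c15ull + (h1 << 6) + (h1 >> 2));
    }

    class InputLayoutCache {
    public:
        // Returns the shared layout if the hash was seen before, else creates one.
        std::shared_ptr<InputLayout> GetOrCreate(const std::string& vertexDecl,
                                                 const std::string& inputSig) {
            uint64_t key = LayoutHash(vertexDecl, inputSig);
            auto it = cache_.find(key);
            if (it != cache_.end())
                return it->second;
            auto layout = std::make_shared<InputLayout>(InputLayout{vertexDecl});
            cache_.emplace(key, layout);
            return layout;
        }

    private:
        std::unordered_map<uint64_t, std::shared_ptr<InputLayout>> cache_;
    };
    ```

    Doing the lookup once at load time, as described above, keeps layout creation (and its driver-side shader validation cost) out of the frame loop.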
  13. DX11 DirectX11 performance problems

    @Adam_42, mhagain   Thank you, this is useful information. I've checked the initialization code and all seems to be fine. What I found, though, is a bug in the code that sorts the render operations, so my batching was far from optimal. With correct rop order and many checks to avoid unnecessary API calls the performance is now quite decent, though not really optimal. Interestingly, DirectX 9 doesn't seem to have much trouble with this...
  14. DX11 DirectX11 performance problems

    Thanks for the answer, Adam. As for the first question, this is hard to say; in the worst case the functions are called around 2000-4000 times per frame. I actually thought about multithreading, but dropped the idea since I'm heavily bound to the available interface and there is currently no time for bigger changes there. But maybe in the future this could be an option. Thanks for the interesting link though, I'll check it out.