About ajmiles

  1. The number of Append Buffers isn't really a problem; it's more a question of what the code looks like when you Append to them. If, for example, you had a load of code that calculated a number between 0-29 and then proceeded to do this:

```hlsl
switch (value)
{
    case 0:  appendBuffer0.Append(thing);  break;
    case 1:  appendBuffer1.Append(thing);  break;
    // ...
    case 29: appendBuffer29.Append(thing); break;
}
```

Then I'd be very concerned about the resulting performance. However, if whatever you're implementing naturally has 30 separate lists, and the appending of items to those lists isn't naturally divergent or conditional, then I would be less concerned about having 30 Append Buffers. Can you give us any more insight into what you're trying to implement, and perhaps a little bit of code to demonstrate the pattern for how you add to those 30 append buffers?
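For contrast, a hypothetical sketch of the non-divergent case: here the buffer index is uniform across the dispatch (supplied via a constant buffer), so every thread appends to the same list and no divergent switch is needed. All names (`buffers`, `bufferIndex`, `ComputeThing`) are illustrative, not from the original question:

```hlsl
// Hypothetical: an array of append buffers indexed by a per-dispatch constant.
AppendStructuredBuffer<uint> buffers[30];

cbuffer Params
{
    uint bufferIndex; // uniform for the whole dispatch
};

[numthreads(64, 1, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
    uint thing = ComputeThing(id.x);    // assumed helper producing the payload
    buffers[bufferIndex].Append(thing); // uniform index: no divergence
}
```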
  2. On the other hand, D3D12 brings with it Shader Model 6 and the new 'Wave Intrinsics', which allow for wave-level reductions in the number of atomic operations (32x fewer on NVIDIA, 64x fewer on AMD). By balloting the wave on the number of threads that wish to increment a value, you can have a single thread perform a single InterlockedAdd on behalf of all threads. In my experience so far (on AMD hardware) this 64-fold reduction in the number of atomic operations to main memory more than makes up for the fact that counters can no longer live in GDS, so we're in a better place than we were on D3D11!
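A minimal sketch of that wave-aggregated pattern, assuming Shader Model 6; the buffer names (`output`, `counter`) and the helper signature are illustrative, not from the post:

```hlsl
RWStructuredBuffer<uint> output;
RWByteAddressBuffer counter; // 4-byte counter at offset 0

void AppendAggregated(uint value, bool wantsToAppend)
{
    // How many lanes earlier in the wave also want to append.
    uint laneOffset = WavePrefixCountBits(wantsToAppend);
    // Total number of appending lanes across the whole wave.
    uint waveTotal = WaveActiveCountBits(wantsToAppend);

    uint waveBase = 0;
    if (WaveIsFirstLane())
    {
        // One InterlockedAdd per wave instead of one per thread.
        counter.InterlockedAdd(0, waveTotal, waveBase);
    }
    // Broadcast the base offset from the first lane to the rest of the wave.
    waveBase = WaveReadLaneFirst(waveBase);

    if (wantsToAppend)
        output[waveBase + laneOffset] = value;
}
```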
  3. In D3D11 it was possible for Append Buffers to have their counter stored in special memory (e.g. GDS) if the hardware had it. Performing atomic operations on GDS is much faster than having the count stored in main memory and atomically incremented/decremented there. In D3D12, where it's all "just memory" and the counter is a separate resource in main memory, this optimisation no longer applies. If AppendBuffers weren't codified as being in some way 'special', I can imagine it would have been more difficult to allow the IHVs to handle the counters in whatever magic ways they might have wanted.
  4. Have you had a look at DirectXTex? https://github.com/Microsoft/DirectXTex texconv sounds like it does everything that you need it to. Note: I'm not interpreting the license for you and the rights it may / may not give you to distribute it with your application.
  5. I believe (and @SoldierOfLight can correct me on this if I'm wrong) that the current way the memory manager works is that it requires contiguous physical memory for each committed resource. Obviously it's not always going to be easy to find 4GB of contiguous physical memory, even on a card with 8-12GB of VRAM, so once again the multiple-heap reserved resource approach will work around the problem.
  6. HDR hardware on PC

    Have you looked at the D3D12 HDR sample? https://github.com/Microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12HDR
  7. Why do you need a group barrier at all? Can't you just use the optional 'originalValue' argument of InterlockedMin rather than re-reading the value you just InterlockedMin'ed into?
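A minimal sketch of what that looks like, assuming a groupshared minimum; the names (`gMin`, `AtomicMinNoBarrier`, `candidate`) are illustrative:

```hlsl
groupshared uint gMin;

uint AtomicMinNoBarrier(uint candidate)
{
    // 'original' receives the value gMin held immediately before this
    // thread's atomic min, so no barrier-and-re-read is required.
    uint original;
    InterlockedMin(gMin, candidate, original);
    return original;
}
```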
  8. XNAMath now lives under the name "DirectXMath", also in the Windows SDK. By and large it should be a drop-in replacement.
  9. Do you really want to still be using the DXSDK from June 2010? The Windows SDK (you have 16299.0) already has all the DirectX headers and libraries in it, so why not use that instead?
  10. float4x2 and float2x4 are every bit as much a 'matrix' as float4x4 for the purposes of packing. /Zpr (Row Major Packing) will affect float2x4/float4x2 and will cause them to take 4096 bytes instead of 2048, and vice versa, depending on whether that flag is set. This shader, when compiled with /Zpr, has a 2048 byte constant buffer and reads float4's:

```hlsl
cbuffer B
{
    float2x4 stuff[64];
}

float4 main(uint i : I) : SV_TARGET
{
    return stuff[i][0] + stuff[i][1];
}
```
  11. float2x4 stuff[64]; is not 2048 bytes, it's 4096 bytes, as each 'register' in a constant buffer is padded to a float4. No such padding occurs in a StructuredBuffer, so perhaps you're copying a 2048 byte structured buffer into the first half of a constant buffer that the compiler is expecting to be 4096? You probably wanted float4x2 stuff[64] instead. Can you show me your cbuffer layout so we can be sure that's the problem? I expect either you've only got half the data in the right place, or it has been transposed between float2x4 and float4x2.
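A minimal sketch of the size difference, assuming the compiler's default column-major packing (i.e. no /Zpr); the cbuffer names are illustrative:

```hlsl
// Default (column-major) packing: each matrix column occupies one
// float4-padded register in a constant buffer.
cbuffer A
{
    float2x4 a[64]; // 4 columns per matrix -> 4 registers -> 64 * 4 * 16 = 4096 bytes
};

cbuffer B
{
    float4x2 b[64]; // 2 columns per matrix -> 2 registers -> 64 * 2 * 16 = 2048 bytes
};
```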
  12. You can't measure GPU time using QueryPerformanceCounter. All you've done is measure how long it takes to issue the API calls, no?
  13. MSAA in DX12?

    ResolveSubresource and ResolveSubresourceRegion (new to DX12) still exist if you don't want to do your MSAA resolve manually. If your resolve operation is just an average of the N samples, then using the Resolve API will be at least as fast as doing it yourself.
  14. DirectXMath conditional assignment

    XMVectorSelect is what you're looking for. It takes the masks output by functions such as XMVectorLessOrEqual, and each bit in the mask is used to select between A or B.
  15. Back buffer DXGI_FORMAT

    Don't confuse a 10-bit SRGB/Rec709 output with 10-bit HDR/Rec2020. The fact that you're now using a 10-bit swap chain and potentially getting a 10-bit output is a completely separate and unrelated matter from whether you're using HDR or Rec.2020.