

Community Reputation

3405 Excellent

About ajmiles

Recent Profile Visitors

7312 profile views
  1. ajmiles

    CS slow on the driver side

    That doesn't sound right, but then again I've never measured it. I assume all your profiling is being done in an optimised build without the debug layer enabled?
  2. Are you content with the fact that a D3D12 port would only be able to run on Windows 10?
  3. ajmiles

    Buffers and SubAllocation

    Yeah, it's all driver managed on D3D11 and done for you transparently. The ethos of D3D12 is to move expensive logic out of the driver and into your title where you know what your title's particular patterns are.
  4. Placed Resources allow you to alias multiple resources over the top of one another if saving memory is your goal. They also facilitate the approach of allocating a much larger amount of memory in a single go (a Heap) and then 'placing' multiple resources inside the same memory allocation. If a shadow map and a post-processing render target are not needed simultaneously in the frame then there's no need for them each to have their own memory. As another example, you might choose to place an entire model's resources into a single heap. That might mean vertex buffers, index buffers and the model's textures have their sizes summed and a single heap is allocated of that size. Each individual resource is then 'placed' at the relevant offset in that heap.
  5. ajmiles

    Buffers and SubAllocation

    The GPU doesn't care particularly whether a series of buffers form one allocation or not. The main reason to sub-allocate is that memory is allocated at the granularity of a 64KB page. That is also the granularity at which heaps are Evicted and made Resident: pages, not individual allocations. Of course, generally you want buffers much smaller than 64KB, in which case you'll be wasting a lot of memory per allocation unless you use it for multiple things. Also, since allocating memory is expensive, it's better to do it as infrequently as possible, so grabbing larger amounts of memory and sub-allocating it yourself is going to be faster than asking Windows and the IHV's driver to do this thousands of times for very small amounts.
  6. ajmiles

    wavefronts in-flight

    Yeah, on Sea Islands at least (which for obvious reasons is the version of GCN I'm most familiar with) there's still a separate K$ and L1, but I haven't paid too much attention to Polaris and beyond. LDS peak bandwidth is 128 bytes per clock per CU, so it isn't enough to service all 4 SIMDs issuing 16 threads' worth of 4-byte LDS reads every clock (that'd be 256 bytes per clock). Even so, that's 6TB/s on an Xbox One X! The GCN instruction set does have 1 and 2 byte read instructions, so in theory it could service 64 threads per clock doing those, but I've never tested the latency of it.
  7. ajmiles

    wavefronts in-flight

    I once wrote a shader to try and measure this and got numbers of around:
    K$ (Constant Cache): ~16 cycles
    L1 Hit: ~116 cycles
    L2 Hit: ~170 cycles
  8. Bit depths that are not a power of two (96-bit) are not particularly fun to support.
  9. Perhaps there's some quirk around zero descriptor heaps that we haven't tested. I'll try and find some time next week to see if I can reproduce any problems. Can you confirm what version of Windows you're using right now? Run 'winver' and type out the full build number.
  10. I'm obviously not familiar with all your code, but this looks a bit odd, no?
      UINT heap_sizes[] = { 250, 0, 20, 0 };
      Each of those heap sizes seems to correspond to a descriptor type, and you're asking for zero Sampler descriptors and zero DSV descriptors in the heap?
  11. Are you able to provide a repro of some description?
  12. Have you tried this code on more than one IHV's GPU and do you have the driver fully up to date? What about WARP / Microsoft Basic Render Device?
  13. @AlanGameDev Ah ok, in that case the Optional Feature I was talking about won't apply, it's just a Windows 10 thing. I'm not sure what the 8.1 SDK tools are, but they're probably not relevant. I'll have a look at the compiled code for the shader you posted in full earlier, but I don't expect it to yield anything obvious. EDIT: Can't see anything amiss with the shader as posted. I think a runnable repro is probably the only way to go.
  14. @AlanGameDev I just ran that device creation code just fine here so I'm not sure why you'd get DXGI_UNSUPPORTED. Do you have Windows 10's Optional Feature "Graphics Tools" installed?
  15. @AlanGameDev Ah ok, I misunderstood that bit about DXUT, it's clearer now I've reread it. What's the smallest value of max/w that causes the problem? Wondering if we could perhaps take a look at the shader disassembly between the closest working and not-working versions of the shader. Also, have you tried running the workload on WARP / Software Render Device? If the shader has been compiled in such a way that it'll never terminate then I would expect WARP to hang in the same way as a hardware device - this would rule out a vendor-specific issue.
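
To put rough numbers on the "Buffers and SubAllocation" and Placed Resources posts above, here is a minimal sketch of sub-allocating one large block at aligned offsets. It is plain C++ and does not touch the D3D12 API; `LinearSubAllocator`, `SeparateAllocationBytes` and `SubAllocatedBytes` are hypothetical names made up for illustration, and the 256-byte buffer size and alignment in the usage note are assumptions, not figures from the posts. In a real title the sizes and alignments would more likely come from `ID3D12Device::GetResourceAllocationInfo`, with the resulting offsets passed to `ID3D12Device::CreatePlacedResource`.

```cpp
#include <cassert>
#include <cstdint>

// 64 KB page granularity, as described in the post.
constexpr std::uint64_t kPageSize = 64 * 1024;
constexpr std::uint64_t kInvalidOffset = ~std::uint64_t{0};

// Round value up to the next multiple of alignment (power of two only).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Memory consumed if every buffer gets its own allocation,
// each one rounded up to whole 64 KB pages.
constexpr std::uint64_t SeparateAllocationBytes(std::uint64_t count,
                                                std::uint64_t size) {
    return count * AlignUp(size, kPageSize);
}

// Hypothetical bump allocator: hands out offsets inside one big allocation.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset of the new block, or kInvalidOffset if it won't fit.
    std::uint64_t Allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uint64_t offset = AlignUp(head_, alignment);
        if (offset + size > capacity_) {
            return kInvalidOffset;
        }
        head_ = offset + size;
        return offset;
    }

    std::uint64_t Used() const { return head_; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};

// Memory consumed if the same buffers are packed into one allocation,
// with only the total rounded up to whole pages.
inline std::uint64_t SubAllocatedBytes(std::uint64_t count, std::uint64_t size,
                                       std::uint64_t alignment) {
    // The separate-allocation total is a safe upper bound on capacity.
    LinearSubAllocator heap(SeparateAllocationBytes(count, size));
    for (std::uint64_t i = 0; i < count; ++i) {
        heap.Allocate(size, alignment);
    }
    return AlignUp(heap.Used(), kPageSize);
}
```

With these assumptions, `SeparateAllocationBytes(1000, 256)` comes to 65,536,000 bytes (one page per buffer), while `SubAllocatedBytes(1000, 256, 256)` comes to 262,144 bytes (four pages in total). A bump allocator like this never frees individual blocks, so it only suits resources that share a lifetime, such as the single-model heap described in the Placed Resources post.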
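
The LDS bandwidth figures in the "wavefronts in-flight" post above can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the Xbox One X figures used in the usage note (40 CUs at 1172 MHz) are assumed from the public specification, not taken from the post.

```cpp
#include <cstdint>

// Bytes requested per clock on one CU if every SIMD issues LDS reads at once.
constexpr std::uint64_t LdsDemandBytesPerClock(std::uint64_t simdsPerCu,
                                               std::uint64_t threadsPerSimdPerClock,
                                               std::uint64_t bytesPerRead) {
    return simdsPerCu * threadsPerSimdPerClock * bytesPerRead;
}

// Aggregate peak LDS bandwidth across the whole GPU, in bytes per second.
constexpr double LdsPeakBytesPerSecond(double bytesPerClockPerCu,
                                       double cuCount,
                                       double clockHz) {
    return bytesPerClockPerCu * cuCount * clockHz;
}

// 4 SIMDs x 16 threads x 4-byte reads: double the 128 bytes/clock peak.
static_assert(LdsDemandBytesPerClock(4, 16, 4) == 256, "4-byte reads");
// 2-byte reads from all 64 threads exactly match the 128 bytes/clock peak.
static_assert(LdsDemandBytesPerClock(4, 16, 2) == 128, "2-byte reads");
// 1-byte reads need only half of it.
static_assert(LdsDemandBytesPerClock(4, 16, 1) == 64, "1-byte reads");
```

Under the assumed clock and CU count, `LdsPeakBytesPerSecond(128.0, 40.0, 1.172e9)` works out to roughly 6.0e12 bytes per second, which lines up with the 6TB/s quoted in the post.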