SoldierOfLight

Members

  • Content count: 196
  • Community Reputation: 2140 Excellent
  • Rank: Member
  • Location: Redmond, WA

  1. So to get this straight, you're copying from an unaligned mip to a larger mip? I misinterpreted the question and assumed you were going from a mip in one texture to a same-sized mip of a different texture. According to the D3D spec, unaligned textures still have all 4 texels present in the physical data, but as far as I can tell, it doesn't specify what their contents have to be. What you're seeing here appears to be the result of copying the full 4x4 region into the larger texture. I'm not sure what you'd expect to happen in this case - are you expecting black around the border?
  2. @mhagain is correct, that's the easiest way to get what you want - the discrete AMD GPU enumerated as adapter 0 and appearing to be connected to the laptop output. @iedoc isn't entirely incorrect either: on these hybrid laptops, the OS essentially does what he suggested under the covers, using an efficient cross-adapter copy mechanism with only GPU-side synchronization, no CPU synchronization. Also, you don't typically have to worry about the WARP adapter being enumerated first; it'll almost always be last. The general sorting order that DXGI uses is:
     1. The adapter with the primary output.
     2. Other hardware adapters with outputs.
     3. Software adapters with outputs (can happen in some circumstances, but not a common scenario).
     4. Hardware adapters without outputs.
     5. Software adapters without outputs.
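    A minimal sketch of walking that enumeration order (assumes dxgi.lib is linked; the ListAdapters name is just illustrative):

        #include <dxgi.h>
        #include <wrl/client.h>
        #include <cstdio>

        using Microsoft::WRL::ComPtr;

        void ListAdapters()
        {
            ComPtr<IDXGIFactory1> factory;
            if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
                return;

            ComPtr<IDXGIAdapter1> adapter;
            for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
            {
                DXGI_ADAPTER_DESC1 desc = {};
                adapter->GetDesc1(&desc);
                // Adapter 0 is normally the one driving the primary output;
                // software adapters (DXGI_ADAPTER_FLAG_SOFTWARE, e.g. WARP) usually come last.
                wprintf(L"Adapter %u: %s%s\n", i, desc.Description,
                        (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) ? L" (software)" : L"");
            }
        }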
  3. You should be able to provide a null source box, which is equivalent to applying the whole source subresource to the specified offset in the dest (which should be (0, 0, 0) if you're copying the whole mip). In general, you should always round up the size of the copy to block dimensions, even if the texture doesn't have the logical dimensions to support it. So a 1x1 BC3 mip would be copied as if it was 4x4.
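    For reference, a minimal D3D11 sketch of that call (the D3D12 CopyTextureRegion equivalent takes the same null-box shortcut); 'context', 'dstTex', 'srcTex' and the subresource indices are assumed to already exist in your code:

        // Copy an entire (possibly 1x1 or 2x2) BC3 mip; the null source box means
        // "the whole source subresource", so the runtime handles the block rounding.
        context->CopySubresourceRegion(
            dstTex,            // destination resource
            dstSubresource,    // e.g. D3D11CalcSubresource(mip, arraySlice, mipLevels)
            0, 0, 0,           // destination offset (0, 0, 0) when copying the whole mip
            srcTex,            // source resource
            srcSubresource,
            nullptr);          // null box = whole source subresource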
  4. DX11

    As far as counters go, the remaining ones are all IHV-specific. In D3D10 there were API-defined counters, but they were deprecated in D3D11. The current model for performance counters is the plugin model exposed by PIX. Also, I just checked, and apparently I was wrong about SetStablePowerState: we did keep it around, but moved it from requiring the D3D12 debug layer to requiring developer mode. My bad.
  5. DX11

    That's a great idea in theory, except that we've deprecated this API in recent Windows 10 releases (I don't recall exactly when), so you'll need to be on a slightly older build. What we found is that given your example of a base of 1500 and a boost of 1650, the GPU is able to maintain that boosted clock rate nearly indefinitely. So using SetStablePowerState produces a completely artificial scenario that doesn't mimic what would happen on real world machines, making it relatively useless for profiling.
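    For completeness, the call in question is ID3D12Device::SetStablePowerState. A hedged sketch, assuming an existing ID3D12Device* named 'device', developer mode enabled, and an OS build where the API is still present:

        // Clamp the GPU to its base clock for (artificially) stable profiling numbers.
        HRESULT hr = device->SetStablePowerState(TRUE);
        if (FAILED(hr))
        {
            // May fail (or remove the device) if developer mode is off,
            // or on builds where the API has since been deprecated.
        }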
  6. According to https://msdn.microsoft.com/en-us/library/windows/desktop/bb174549(v=vs.85).aspx:
  7. That probably means something along the lines of having multiple command lists, each referencing different resources, and re-using them as it becomes safe to overwrite the contents of those resources.
  8. As long as the SRV and UAV correctly target different mip levels you should be able to do it, yes.
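    A sketch of what that looks like in D3D11, assuming a 2D texture 'tex' created with both D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_UNORDERED_ACCESS, an ID3D11Device* 'device', an illustrative mip index 'mip', and an assumed R8G8B8A8_UNORM format:

        // The SRV reads only mip 'mip', while the UAV writes mip 'mip + 1',
        // so the two views never reference the same subresource.
        D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
        srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
        srvDesc.Texture2D.MostDetailedMip = mip;
        srvDesc.Texture2D.MipLevels = 1;

        D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
        uavDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D;
        uavDesc.Texture2D.MipSlice = mip + 1;

        ID3D11ShaderResourceView* srv = nullptr;
        ID3D11UnorderedAccessView* uav = nullptr;
        device->CreateShaderResourceView(tex, &srvDesc, &srv);
        device->CreateUnorderedAccessView(tex, &uavDesc, &uav);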
  9. I didn't mean D3D11_CREATE_DEVICE_DEBUG (though that can have a bit of impact), I meant the Visual Studio option for a debug or release build.
  10. If we're talking about such a serious perf problem, it's much more likely to be something else, like compiling a debug build or synchronizing the CPU and GPU (e.g. MAP_WRITE/MAP_READ instead of MAP_WRITE_DISCARD).
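    To illustrate the Map point, a sketch assuming an ID3D11DeviceContext* 'context', a dynamic (D3D11_USAGE_DYNAMIC / D3D11_CPU_ACCESS_WRITE) buffer 'cb', and some CPU-side struct 'cpuData':

        // WRITE_DISCARD lets the driver hand back a fresh allocation instead of
        // synchronizing with the GPU, unlike MAP_READ/MAP_WRITE (on resources that
        // allow them), which stall until the GPU is done with the old contents.
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (SUCCEEDED(context->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        {
            memcpy(mapped.pData, &cpuData, sizeof(cpuData));
            context->Unmap(cb, 0);
        }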
  11. There is no D3D12-based video hardware acceleration (yet). Your only options are DXVA, which is D3D9-based, or the D3D11 video APIs (ID3D11VideoDevice and family). Alternatively, you can do software decoding of the video and simply upload the result to GPU memory, like Bink does.
  12. If your swapchain type is DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL or FLIP_DISCARD, you might be missing a call to OMSetRenderTargets. These swapchain types will unbind the back buffer after calling Present. I should also point out that you should run with the D3D11 debug layer, as it would almost certainly complain that you're calling Draw() without having a render target bound (assuming that's the problem).
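    A per-frame sketch of the fix; the function name and the single back-buffer RTV are assumptions for illustration:

        #include <d3d11.h>
        #include <dxgi.h>

        void RenderFrame(ID3D11DeviceContext* context,
                         IDXGISwapChain* swapChain,
                         ID3D11RenderTargetView* rtv)
        {
            // With FLIP_SEQUENTIAL/FLIP_DISCARD the back buffer was unbound by the
            // previous Present, so rebind it before drawing.
            context->OMSetRenderTargets(1, &rtv, nullptr);

            const float clearColor[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
            context->ClearRenderTargetView(rtv, clearColor);

            // ... draw calls ...

            swapChain->Present(1, 0);   // back buffer is unbound again after this
        }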
  13. GPUs aren't necessarily as stateful as your mental model might indicate. Each command list command isn't necessarily something that's executed by the GPU. It might simply be storing some CPU-side state which will be used by the driver when recording a subsequent command. In that case, since command lists can be recorded in parallel and submitted out-of-order, these types of commands need to be present in every command list. Rather than try to pin hardware to a particular model where some states must be set by hardware commands, and others may be CPU-side tracking, D3D12 requires all command list state to fit into a model that supports either.
  14. Try the D3D11_CREATE_DEVICE_DEBUG flag. Then you should get a debug message telling you what you did wrong.
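    For example (a sketch; the debug layer requires the D3D11 SDK layers, i.e. the Graphics Tools optional feature on Windows 10+):

        #include <d3d11.h>

        UINT flags = 0;
        #if defined(_DEBUG)
        flags |= D3D11_CREATE_DEVICE_DEBUG;   // debug layer prints validation messages
        #endif

        ID3D11Device* device = nullptr;
        ID3D11DeviceContext* context = nullptr;
        HRESULT hr = D3D11CreateDevice(
            nullptr,                    // default adapter
            D3D_DRIVER_TYPE_HARDWARE,
            nullptr,                    // no software rasterizer DLL
            flags,
            nullptr, 0,                 // default feature levels
            D3D11_SDK_VERSION,
            &device,
            nullptr,                    // don't care which feature level was chosen
            &context);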
  15. Using IDXGIFactory, you can EnumAdapters to find those 3 adapters, yes. Then if you create a D3D12 device on each of them, you can discover that some of them might be made up of multiple nodes. See ID3D12Device::GetNodeCount. Each node can theoretically have a different architecture. You can target certain API objects at multiple nodes (e.g. a resource or heap can be visible to multiple nodes through its VisibleNodeMask), while other API objects must belong to a single node (e.g. a command list takes a node mask that must only have a single node bit set). However, to use a resource on a device created on a different IDXGIAdapter, it must be created as SHARED_CROSS_ADAPTER.
    Essentially, nodes are a lightweight way of using multiple GPUs that belong to the same IHV and tend to have similar/identical properties. Unless you're running on an SLI/CrossFire configuration, GetNodeCount() will probably return 1. For node indices, the API will only accept 0. For node masks, the API will accept 0 (i.e. default) or 1 (1 << 0, indicating node 0). If there was a second node, then GetNodeCount() would return 2, the API would accept 0 or 1 for node indices, and for node masks you could specify 0 or 1 to target just the first GPU, 2 to target the second, and 3 to target both (if applicable). Unless you're attempting to take advantage of SLI/CrossFire, you can ignore the node indices/masks and just use 0.
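    A short sketch of the single-GPU case, assuming an existing ID3D12Device* 'device'. Note that objects like command queues and command lists take a node mask naming exactly one node, while a resource/heap VisibleNodeMask may name several:

        #include <d3d12.h>

        UINT nodeCount = device->GetNodeCount();   // 1 unless you're on SLI/CrossFire

        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
        queueDesc.NodeMask = 0;   // 0 is treated as node 0; fine for single-GPU use

        ID3D12CommandQueue* queue = nullptr;
        HRESULT hr = device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queue));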