Everything posted by Adam Miles

  1. Adam Miles

    Strange Assembly Error

    Pretty sure this is nothing to do with Graphics / GPU Programming. Try https://www.gamedev.net/forums/forum/9-general-and-gameplay-programming/
  2. Adam Miles

    DX11 Drawcall isn't working out

    Problem #1: Your Vertex Shader calculates the y-coordinate for the two vertices as:

    1) 1 + (0 / 300) = 1.0f (the top of the screen)
    2) 1 + (200 / 300) = 1.666f (off the top of the screen)

    The y-coordinate in clip space is -1 at the bottom and 1 at the top, so you probably want "1 - (pos / halfSize.y)" instead.

    Problem #2: You clear your render target to magenta and your lines are magenta coloured.

    Fix both problems and the line appears.
  3. Adam Miles

    DX11 Drawcall isn't working out

    When calling IASetVertexBuffers, you set the stride of your vertex to be 0 bytes. That means every vertex rendered will advance 0 bytes through the vertex buffer and therefore have identical position/colour/UV as every other vertex.
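    For reference, a minimal sketch of binding with a non-zero stride (the Vertex struct and variable names here are placeholders, not from the original post):

        UINT stride = sizeof(Vertex);   // size of one vertex: position + colour + UV
        UINT offset = 0;
        context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);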
  4. Adam Miles

    DX11 Drawcall isn't working out

    Easiest way to figure it out would be to take a capture with RenderDoc, upload it and share that.
  5. What you're asking for is nigh on impossible. You may get closer by enabling IEEE strictness on your shaders and disabling any fast-math optimisations on your C++ compiler, but only certain floating point operations are guaranteed to give the same result across architectures (e.g. +, -, *, /). You can try enabling IEEE strictness using the /Gis compiler flag, but I don't think it'll get anything other than "close, but not quite". Try giving this blog post from Bruce Dawson a read: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
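    If you're compiling from code with D3DCompile rather than FXC on the command line, the equivalent of /Gis is the D3DCOMPILE_IEEE_STRICTNESS flag. A rough sketch, with the file and entry point names made up:

        #include <d3dcompiler.h>
        #pragma comment(lib, "d3dcompiler.lib")

        ID3DBlob* code = nullptr;
        ID3DBlob* errors = nullptr;
        D3DCompileFromFile(L"MyShader.hlsl", nullptr, nullptr, "PSMain", "ps_5_0",
                           D3DCOMPILE_IEEE_STRICTNESS,   // same as FXC's /Gis
                           0, &code, &errors);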
  6. You can't have a DepthStencilView with an ArraySize of 6 starting at any slice other than 0 since you only have 6 slices in your array to begin with. I assume you meant to set ArraySize to 1 in dsvDesc.
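    A minimal sketch of what I mean (D3D11 shown with made-up names; the D3D12 struct is laid out the same way):

        // One DSV per array slice: FirstArraySlice selects the slice, ArraySize is 1.
        D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
        dsvDesc.Format = DXGI_FORMAT_D32_FLOAT;
        dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DARRAY;
        dsvDesc.Texture2DArray.MipSlice = 0;
        dsvDesc.Texture2DArray.FirstArraySlice = face;   // 0..5
        dsvDesc.Texture2DArray.ArraySize = 1;            // just this one slice
        device->CreateDepthStencilView(depthArrayTexture, &dsvDesc, &faceDSV[face]);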
  7. Adam Miles

    DX12 DX12 Debug Interface memory leak?

    To quote my colleague @SoldierOfLight from DX12 Discord, "looks like the fix for that should be in the latest insider fast build, 18963"
  8. Adam Miles

    DX12 DXR Flickering Artifacts

    Your problem (and the fix) is in your shader binding table setup: since you currently only have one hit group, the last two geometries are indexing off the end of your shader binding table into unknown memory. It's therefore not running your Closest Hit shader for the last two triangles, so the payload comes back with the same color it was initialised with in the ray generation shader (Narrator: It wasn't initialised, hence the flickering).
  9. Adam Miles

    DX12 DXR Flickering Artifacts

    The artifacts themselves look more like an issue stemming from a missing barrier or synchronisation on the UAV you write to or how/when you copy it to the swap chain (rather than the TLAS/BLAS builds). If you want to check something into your branch again I can take a look in the morning.
  10. Adam Miles

    DX12 DXR and Device Hung Error

    I checked your latest changes and the one thing you didn't do was make the TLAS a 'Root' Shader Resource View. I'm just about to jump on a plane, so I've attached a modified RayTracingPass.cpp you can diff with yours and see the 4 changes I made. It should be possible to have a TLAS be inside a Descriptor Table, but I would need to debug it further when I get home to figure out why a RootSRV works but an SRV in a Descriptor Table doesn't. RayTracingPass.cpp
  11. Adam Miles

    DX12 DXR and Device Hung Error

    @_void_ Using the project you linked in a PM I can see the GPU hang you report.

    When trying to view the Acceleration Structure in PIX it was clear you'd created the TLAS SRV as a StructuredBuffer, which is not what you're supposed to do. This StructuredBuffer SRV was created with NumElements = 1 and a StructureByteStride set, which isn't going to work. According to the DXR spec you're supposed to create TLAS SRVs as 'Raw' SRVs.

    I've never actually tried creating a Raw SRV of a TLAS and then putting it inside the shader binding table / ray gen shader's local root signature. When I tried it, the hang continued...

    As a 'fix' (and this is what I do normally anyway), I put the TLAS as a Root SRV in the Global Root Signature instead and this works.
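    For anyone following along, a minimal sketch of the Root SRV approach (variable names are mine, CD3DX12 helpers from d3dx12.h assumed; the matching HLSL declaration is RaytracingAccelerationStructure Scene : register(t0)):

        // Global root signature: parameter 0 is a root SRV used for the acceleration structure.
        CD3DX12_ROOT_PARAMETER rootParams[1];
        rootParams[0].InitAsShaderResourceView(0);   // HLSL register t0

        CD3DX12_ROOT_SIGNATURE_DESC rsDesc(1, rootParams);
        // ... D3D12SerializeRootSignature + CreateRootSignature as usual ...

        // When recording the DispatchRays work, bind the TLAS buffer's GPU VA directly - no descriptor needed.
        commandList->SetComputeRootSignature(globalRootSignature);
        commandList->SetComputeRootShaderResourceView(0, tlasResource->GetGPUVirtualAddress());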
  12. Adam Miles

    DX12 DXR and Device Hung Error

    You're going to need to simplify what you've got as much as possible (or post something someone can run). Try calling TraceRay with the "SKIP_CLOSEST_HIT_SHADER" flag to eliminate that as a cause of the hang. Try setting the miss shader Shader Identifier to null so it never gets executed. Remove all code from Closest Hit / Miss Shader etc.
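    On the second suggestion, a minimal sketch of what 'nulling' the miss shader identifier looks like on the CPU side (variable names are hypothetical, and each SBT record is still padded to D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT as normal). For the first suggestion, the full HLSL flag name is RAY_FLAG_SKIP_CLOSEST_HIT_SHADER, passed as the RayFlags argument of TraceRay.

        // Writing a zeroed identifier into the miss shader table record means no miss shader runs
        // for rays that miss, which removes it as a suspect in the hang.
        uint8_t* missRecord = mappedMissTable;   // CPU pointer into the upload-heap SBT memory
        memset(missRecord, 0, D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES);

        // The normal (non-null) case, for comparison:
        // memcpy(missRecord, stateObjectProps->GetShaderIdentifier(L"MyMissShader"),
        //        D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES);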
  13. Adam Miles

    DX12 DXR and Device Hung Error

    The list of ways to hang the GPU when using DXR is as long as my arm. Have you considered:

    • Setting the recursion level too low when creating your RTPSO?
    • Setting the payload size too low when creating your RTPSO?
    • Setting the attribute size too low when creating your RTPSO? (the sketch after this list shows where these three limits live)
    • Forgetting to bind the Top Level Acceleration Structure?
    • Forgetting to initialise the shader identifier for your hit group?
    • Forgetting to initialise the shader identifier for your miss shader?
    • Using unbound resources in the global root signature?
    • Using unbound resources in the local root signature?
    • Using a TLAS or BLAS before it has completed building?
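    For the first three items, a rough sketch of where those limits are set when creating the RTPSO (the values are illustrative, not taken from your project):

        // Payload / attribute sizes must be at least as large as the structs your shaders actually use.
        D3D12_RAYTRACING_SHADER_CONFIG shaderConfig = {};
        shaderConfig.MaxPayloadSizeInBytes   = 4 * sizeof(float);                            // e.g. a float4 colour payload
        shaderConfig.MaxAttributeSizeInBytes = D3D12_RAYTRACING_MAX_ATTRIBUTE_SIZE_IN_BYTES; // 32 bytes, covers the built-in barycentrics

        // Recursion depth must cover the deepest chain of TraceRay calls you ever make.
        D3D12_RAYTRACING_PIPELINE_CONFIG pipelineConfig = {};
        pipelineConfig.MaxTraceRecursionDepth = 1;   // primary rays only; raise this if hit shaders trace again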
  14. This doesn't have anything to do with Graphics or GPU programming.
  15. Adam Miles

    DX11 Wrong HLSL buffer width

    As your code is currently written in Github you aren't explicitly declaring which 'register' each constant buffer should be assigned to (you were in the original code snippet you posted). If you declare two constant buffers, elect not to use one of them and *haven't* explicitly declared which register slot each one belongs to, then the compiler will assign the *used* constant buffers an index in ascending order as they appear in the code.

    You have MatrixBuffer which is unused in the VS and ScreenSizeBuffer which is used. Therefore ScreenSizeBuffer is going to be VS Constant Buffer 0 and there will be no VS Constant Buffer 1. The corresponding C++ code is setting the ScreenSizeBuffer to be VS cb1, not cb0, so I don't see how it can be working in its current state.

    unsigned int bufferNumber{ 1 };
    deviceContext->VSSetConstantBuffers(bufferNumber, 1, &m_screenSizeBuffer);

    Is what we're seeing in Github the 'working' code or the 'broken' code?
  16. Adam Miles

    DX11 Wrong HLSL buffer width

    What you're describing sounds like a bug in your code that we don't have enough information on to help you try and fix. How did you fix it?
  17. Adam Miles

    DX11 Render Target Picking

    Since the OP probably only cares about the data that's under the mouse cursor, it would make a lot more sense to optimise the process around that knowledge. If I were implementing GPU-picking with a CPU read-back I would (see the sketch after this post):

    • Provide the GPU with the mouse coordinate XY via a constant buffer.
    • Bind an Append Buffer UAV to the Pixel Shader stage with enough space for holding as many 'picks' under the mouse cursor as you might ever expect to have due to overdraw. Something in the region of 32 sounds reasonable.
    • In the Pixel Shader, check if SV_POSITION.xy == MouseXY and then simply append a pair of values (Depth and Object ID) to the list. Be sure to add [earlydepthstencil] to the shader so you don't lose the EarlyZ depth testing optimisation.
    • CopyResource the ~32 values to the Staging resource.
    • Map that small buffer instead and find the object ID with the lowest depth value. Optionally sort them by depth and you have yourself a sorted list of hits under the cursor.

    This approach has a number of advantages:

    • Memory - you don't need an entire extra render target to hold object IDs for all the pixels you don't care about.
    • Speed (probably, profile it) - you're not writing out this entire extra render target's worth of data and you're only copying back about ~256 bytes of data over the slow PCI-E bus as opposed to 8-32MB.
    • Flexibility - if at a later date you decide you want to visualise picks behind the front-most object, you have a full list of all picks on all objects that lie under the mouse cursor rather than just the top-most one.

    Using your method, you might also want to think about using CopySubresourceRegion to copy back just the one texel you're interested in rather than all of the texels. PCI-E 3.0 at 16x has a peak bandwidth of 15.75GB/s. If you're copying back a 2160p R32 surface (32MB) that's going to take at least 2ms for the GPU to complete - so the less data you can transfer back the better.
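    A rough sketch of the D3D11 setup and readback for the approach above - the PickResult struct, MAX_PICKS and all variable names are made up, and error handling is omitted:

        struct PickResult { float depth; unsigned int objectID; };   // must match the HLSL struct
        const UINT MAX_PICKS = 32;

        // Default-heap structured buffer the pixel shader appends to.
        D3D11_BUFFER_DESC desc = {};
        desc.ByteWidth = sizeof(PickResult) * MAX_PICKS;             // ~256 bytes
        desc.Usage = D3D11_USAGE_DEFAULT;
        desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
        desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
        desc.StructureByteStride = sizeof(PickResult);
        device->CreateBuffer(&desc, nullptr, &pickBuffer);

        D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
        uavDesc.Format = DXGI_FORMAT_UNKNOWN;
        uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
        uavDesc.Buffer.NumElements = MAX_PICKS;
        uavDesc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_APPEND;         // makes it an AppendStructuredBuffer
        device->CreateUnorderedAccessView(pickBuffer, &uavDesc, &pickUAV);

        // Tiny staging copy the CPU can Map.
        desc.Usage = D3D11_USAGE_STAGING;
        desc.BindFlags = 0;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
        desc.MiscFlags = 0;
        device->CreateBuffer(&desc, nullptr, &pickStaging);

        // Per frame: bind the UAV alongside the render target (UAV slot 1, after the one RTV),
        // resetting its append counter to 0, then draw the scene as normal.
        UINT initialCount = 0;
        context->OMSetRenderTargetsAndUnorderedAccessViews(1, &rtv, dsv, 1, 1, &pickUAV, &initialCount);
        // ... draw scene ...

        // Copy back the ~256 bytes and find the hit with the smallest depth.
        // (CopyStructureCount can also be used to read back how many entries were actually appended.)
        context->CopyResource(pickStaging, pickBuffer);
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (SUCCEEDED(context->Map(pickStaging, 0, D3D11_MAP_READ, 0, &mapped)))
        {
            const PickResult* picks = static_cast<const PickResult*>(mapped.pData);
            // ... pick the lowest depth / sort by depth here ...
            context->Unmap(pickStaging, 0);
        }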
  18. Adam Miles

    3D GPU Particle Collision System

    It read like a summary statement of the entire algorithm rather than just the last part of the process, but I can see that it could be interpreted both ways. To @DiligentDev: is there any reason you don't use SV_DispatchThreadID and instead choose to calculate it manually?
  19. Adam Miles

    DX11 Wrong HLSL buffer width

    There's no need to start packing any buffer types up to the next power of two. Some of HLSL's packing rules could certainly be described as 'arcane', but nothing ever causes a buffer to get aligned beyond 16 bytes in size. If NSight thinks the constant buffer called "ScreenSizeBuffer" is 192 bytes then chances are it has a bug. I expect if you analyse the DXBC emitted by FXC then you'll find the metadata at the top of the compiled shader telling you that cb0 is 192 bytes and cb1 is 16 bytes.
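    If you'd rather not read DXBC disassembly, shader reflection will report the sizes the compiler actually assigned. A minimal sketch, assuming 'blob' holds the compiled shader bytecode:

        #include <d3dcompiler.h>   // D3DReflect
        #include <d3d11shader.h>   // ID3D11ShaderReflection
        #include <wrl/client.h>
        #pragma comment(lib, "d3dcompiler.lib")

        // 'blob' is the bytecode returned by D3DCompile / loaded from the .cso FXC produced.
        Microsoft::WRL::ComPtr<ID3D11ShaderReflection> reflection;
        D3DReflect(blob->GetBufferPointer(), blob->GetBufferSize(), IID_PPV_ARGS(&reflection));

        D3D11_SHADER_DESC shaderDesc = {};
        reflection->GetDesc(&shaderDesc);

        for (UINT i = 0; i < shaderDesc.ConstantBuffers; ++i)
        {
            ID3D11ShaderReflectionConstantBuffer* cb = reflection->GetConstantBufferByIndex(i);
            D3D11_SHADER_BUFFER_DESC cbDesc = {};
            cb->GetDesc(&cbDesc);
            printf("%s: %u bytes\n", cbDesc.Name, cbDesc.Size);   // e.g. "ScreenSizeBuffer: 16 bytes"
        }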
  20. Adam Miles

    3D GPU Particle Collision System

    O(1) algorithmic complexity would mean the execution time is invariant to the number of particles - that would be a groundbreaking discovery!
  21. Adam Miles

    DX11 Problems with Compute Shaders

    Have you tried using RenderDoc to debug what's going on?
  22. Adam Miles

    DX11 Problems with Compute Shaders

    I assume you've run with the Debug Layer enabled to check it's clean and error free? I also assume that the comment "//called by dispatch(1,1024,1)" is wrong?
  23. Adam Miles

    DX11 Texture caching for Texture2DArray

    Partially Resident Textures / Tiled Resources would be one way you could achieve the memory saving you're looking for, if you want to reuse the same memory for one slice by having it shared in multiple arrays. There are some restrictions around array textures with mips though that could scupper that plan, as the spec wasn't resolved until D3D12 I think.

    But it sounds like your main concern is the rebinding of textures. Do you have so many unique textures that you can't just store them all in one big Texture2DArray that is shared by all chunks in all tiles? It sounds like you contemplated that approach but discounted it because it wouldn't work on D3D11. Dynamically indexing into slices of a Texture2DArray on D3D11 is fine - that's different to the dynamic indexing introduced in D3D12, which would let you sample from completely unrelated textures.

    If all your textures are the same resolution and format they could all go into one big Texture2DArray. It might mean you want one Texture2DArray for Albedo, one for Normals, one for Specular etc, but so long as all formats and sizes within each "family" are the same then that would work in D3D11.
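    A minimal sketch of the shared-array idea, assuming all albedo textures share a resolution and format (NUM_ALBEDO_TEXTURES and the variable names are made up):

        // One shared array; every chunk's material gets a slice index it passes to the shader.
        D3D11_TEXTURE2D_DESC texDesc = {};
        texDesc.Width = 1024;                      // all slices must share the same size...
        texDesc.Height = 1024;
        texDesc.MipLevels = 11;                    // full mip chain for 1024x1024
        texDesc.ArraySize = NUM_ALBEDO_TEXTURES;   // hypothetical slice count
        texDesc.Format = DXGI_FORMAT_BC7_UNORM;    // ...and the same format
        texDesc.SampleDesc.Count = 1;
        texDesc.Usage = D3D11_USAGE_DEFAULT;
        texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
        device->CreateTexture2D(&texDesc, nullptr, &albedoArray);

        // Bind once; each draw supplies its slice index (e.g. in a constant buffer or per-instance data)
        // and the shader samples with a float3 UV whose .z is the slice.
        device->CreateShaderResourceView(albedoArray, nullptr, &albedoArraySRV);
        context->PSSetShaderResources(0, 1, &albedoArraySRV);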
    I wouldn't completely write off your approach; you are making the situation artificially worse right now by stalling on the results from the query immediately after the work is sent to the GPU. The article you linked to makes it clear that you should be double-buffering (at least) these queries and not stalling the CPU at any time waiting for the results - if you do that then I think you will see things improve (a sketch follows this post).

    I would also expect GPU timestamp queries wrapped around individual draw calls (or small sequences of draw calls) to be more accurate. Trying to measure the time an entire frame takes (including Present) without accidentally measuring the GPU idle time that comes at the end of each frame might prove tricky. If used carefully, you might also find that inserting a call to ID3D11DeviceContext::Flush after the last query in your frame (after Present) will kick pending commands off immediately rather than those commands ending up in the next kickoff later in the next frame (potentially after the GPU has already been idling for a while). If you can get to a point where the GPU is 100% busy and you're double or triple buffering your queries you should get back much more reliable information.

    That said, there is no substitute (in my opinion) for using a GPU profiler like NSight, Radeon GPU Profiler, Intel GPA or PIX. These tools will do a much better job at profiling your GPU workloads than timestamp queries.
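    A minimal sketch of what double-buffered timestamp queries might look like, assuming a device/context already exist and ignoring error handling (the names are mine, not from your code):

        const int NUM_FRAMES = 2;   // double-buffered; triple-buffer if results are often not ready yet
        ID3D11Query* disjointQ[NUM_FRAMES];
        ID3D11Query* startQ[NUM_FRAMES];
        ID3D11Query* endQ[NUM_FRAMES];

        D3D11_QUERY_DESC qd = {};
        for (int i = 0; i < NUM_FRAMES; ++i)
        {
            qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT; device->CreateQuery(&qd, &disjointQ[i]);
            qd.Query = D3D11_QUERY_TIMESTAMP;          device->CreateQuery(&qd, &startQ[i]);
            qd.Query = D3D11_QUERY_TIMESTAMP;          device->CreateQuery(&qd, &endQ[i]);
        }

        // Each frame: write into this frame's set of queries...
        int writeIdx = frameIndex % NUM_FRAMES;
        context->Begin(disjointQ[writeIdx]);
        context->End(startQ[writeIdx]);
        // ... issue the GPU work you want to measure ...
        context->End(endQ[writeIdx]);
        context->End(disjointQ[writeIdx]);

        // ...and read the oldest set, only if the results are already available (no stall, no forced flush).
        int readIdx = (frameIndex + 1) % NUM_FRAMES;
        D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
        UINT64 t0 = 0, t1 = 0;
        if (context->GetData(disjointQ[readIdx], &dj, sizeof(dj), D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK &&
            context->GetData(startQ[readIdx], &t0, sizeof(t0), D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK &&
            context->GetData(endQ[readIdx],   &t1, sizeof(t1), D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK &&
            !dj.Disjoint)
        {
            double gpuMs = double(t1 - t0) / double(dj.Frequency) * 1000.0;
            printf("GPU time: %.3f ms\n", gpuMs);
        }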
    I think this is probably an artifact of when D3D11 decides to kick off / submit the command buffer that you're building behind the scenes. In D3D11 you essentially have no control over when a command buffer gets submitted - it could literally be on any GPU command for all you know. Taking your code from above, imagine the sequence of operations was:

    m_d3DeviceContext->Begin(disjoint0);
    m_d3DeviceContext->End(queryStart);
    Sleep(10); // First sleep
    m_swapChain->Present(0, 0); // Command buffer kick off here. GPU almost immediately does the work up until this point.
    m_d3DeviceContext->End(queryEnd); // This command is written but not executed yet...
    m_d3DeviceContext->End(disjoint0); // This command is written but not executed yet...
    Sleep(10); // Second sleep - CPU sleeps for 10ms
    while (m_d3DeviceContext->GetData(disjoint0, NULL, 0, 0) == S_FALSE); // GetData triggers a command buffer kickoff.

    From the GPU's perspective:

    1) In the first command buffer it saw Begin(disjoint0), End(queryStart) and Present(0,0).
    2) No more GPU work was issued for 10ms.
    3) The second command buffer contained End(queryEnd), End(disjoint0) and potentially any work associated with GetData.

    The time between the GPU consuming and executing the commands associated with queryStart and queryEnd was 10ms, and that's why the second Sleep will affect the measured GPU time and the first one won't. You're at the mercy of when kickoffs occur and I know of no way to guarantee that two calls to 'End' a timestamp query will reside in the same command buffer segment and therefore accurately measure the amount of GPU time taken between the two queries.

    On D3D12 there's no such problem. When command lists get executed is entirely within title control and a command list is executed from beginning to end without the GPU taking a break.

    Is there a particular reason you want to measure GPU time using timestamp queries? Are you trying to profile your game?