
galop1n

Member
  • Content count

    305
  • Joined

  • Last visited

Community Reputation

1044 Excellent

4 Followers

About galop1n

  • Rank
    Member

Personal Information

  • Industry Role
    DevOps
    Programmer
  • Interests
    Programming

Social

  • Twitter
    galop1n
  • Github
    galop1n
  • Steam
    galop1n

Recent Profile Visitors

6883 profile views
  1. The biggest advantage is the waitable object. You can now do something meaningful while waiting for the vsync (because now the wait is on you). It gets even better: the wait is not on the vsync itself but on the driver needing a new frame, with fine-grained control over how many frames can be queued. Putting the wait up front reduces latency, because you prepare a frame closer to the moment it will be displayed!
  2. That is the perfect example of a compute shader. Why would UAV writes be more costly if coherent? It does not make sense.

     Texture2D<float4> srcImg[6];
     RWTexture3D<float4> dst;
     cbuffer Params { uint numSlice; }; // assumed: the original does not show where numSlice comes from

     [numthreads(8, 8, 1)]
     void main(uint2 dtID : SV_DispatchThreadID)
     {
         // Read the six source texels once per thread.
         float4 srcs[6];
         for (uint i = 0; i != 6; ++i)
             srcs[i] = srcImg[i][dtID.xy];

         // Write one result per slice of the 3D target.
         for (uint slice = 0; slice != numSlice; ++slice)
         {
             float4 result = srcs[0]; // placeholder: do something with the inputs + slice index
             dst[uint3(dtID.xy, slice)] = result;
         }
     }

     That is a core example. You may not want to run all the slices in a single group, because you may stall and not have enough thread groups to hide latency. But man, simpler is better at first, then optimize if it is a problem! And even if you read the 6 photos 50 times, let's say 2048 square (hopefully in BC7 format to save bandwidth), it is a bit more than 1 GB of memory read. A GTX 770 has 224 GB/s of bandwidth, which means the reads account for roughly 5 ms. Yes, it is not the end of the world if your app is merely doing that!
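
The bandwidth estimate in that post can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes BC7 at 1 byte per texel (16 bytes per 4x4 block), top mip only, no cache reuse, and decimal gigabytes. The function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: BC7 stores each 4x4 texel block in 16 bytes, i.e. 1 byte per texel.
constexpr std::uint64_t kBytesPerTexelBC7 = 1;

// Total bytes read when `imageCount` square images of `edge` x `edge` texels
// are each read `readsPerImage` times (top mip only, caches ignored).
constexpr std::uint64_t TotalBytesRead(std::uint64_t edge,
                                       std::uint64_t imageCount,
                                       std::uint64_t readsPerImage) {
    return edge * edge * kBytesPerTexelBC7 * imageCount * readsPerImage;
}

// Time in milliseconds to stream `bytes` at `gigabytesPerSecond` (1 GB = 1e9 bytes).
constexpr double ReadTimeMs(std::uint64_t bytes, double gigabytesPerSecond) {
    return static_cast<double>(bytes) / (gigabytesPerSecond * 1e9) * 1e3;
}
```

With the numbers from the post, `TotalBytesRead(2048, 6, 50)` is 1,258,291,200 bytes, and `ReadTimeMs` of that at 224 GB/s comes out at about 5.6 ms, so the order of magnitude holds even without rounding down to 1 GB.
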
  3. He is trying to implement mesh voxelization or a fluid sim, or I am not a graphics engineer. Sadly for you, geometry shaders doing expansion are usually a pain in the ass for performance and are rarely the best pick for anything. I would second Hodgman here: a little more detail would help to design an efficient solution. Why did you try to create a 3D texture if a 2D texture array is enough? On some hardware, the RTIndex (SV_RenderTargetArrayIndex) can be output by the vertex shader, and the bandwidth of reading a mesh is unlikely to matter when filling that much texture!
  4. Out of memory is a different error code, unless you were asking for a size above the DirectX limits. With D3D12, you should run with the debug layer most of the time and solve any issue as soon as possible; it is a life saver! It is as simple as installing the Graphics Tools (Start menu > optional features > Graphics Tools), then, at the beginning of your application, calling D3D12GetDebugInterface and enabling the layer from the returned interface. If you have never done it, even at the hello-world triangle stage, chances are pretty high that you have dozens of errors and warnings. And keep in mind that the layer is not a magical tool; many errors are not caught either!
  5. I mean using D3D12GetDebugInterface and calling EnableDebugLayer on the returned interface. There is a 99.9% chance that the log will display the real reason for the failure.
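
For reference, the two calls named in that post look roughly like this in C++. It is a minimal sketch, not the poster's code: the helper name `TryEnableD3D12DebugLayer` is hypothetical, the call is assumed to happen before the device is created, and the non-Windows branch is only a stub so the snippet still compiles elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib") // MSVC only; link d3d12 by other means elsewhere
#endif

// Hypothetical helper. Returns true if the D3D12 debug layer was enabled.
// Call it at startup, before creating the ID3D12Device.
bool TryEnableD3D12DebugLayer() {
#ifdef _WIN32
    Microsoft::WRL::ComPtr<ID3D12Debug> debug;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debug)))) {
        debug->EnableDebugLayer();
        return true;
    }
    // Typically means the Graphics Tools optional feature is not installed.
    return false;
#else
    // Stub: there is no D3D12 debug layer to enable on this platform.
    return false;
#endif
}
```

Once enabled, validation messages show up in the debugger output window while the application runs.
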
  6. What about with the debug layer? Any output hinting at the why?
  7. Because SampleLevel forces the mip, and computing the level or the full gradient (for aniso) yourself is not trivial, while normally the GPU can just use the neighboring pixels' values to figure it all out for free. Right now, you just pass 0 to the call, which disables mips even if they are there.
  8. I don't think so. Are you not using mipmaps to do the VSM pre-filtering? If so, SampleLevel loses that. My VSM memories are fading, but I am pretty sure they involve mips.
  9. If the compiler can't determine that the pointLightindex start and end are uniform, it has to assume that some pixels may loop more or fewer times than others. You then reach your sampling instruction with some neighboring pixel threads masked out. To sample, the GPU uses a 2x2 quad to compute the derivatives of your UVs, but in your case some of those values won't be there, so the result is undefined behavior. You can use SampleLevel or SampleGrad to solve that. Or, if you know the proper conditions will hold, just ignore the warning.
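
To make the "2x2 quad" point concrete, here is a small CPU-side sketch of how a mip level can be derived from the UVs of the four pixels of a quad. It uses the textbook isotropic formula, which real hardware only approximates, and the types and function name are invented for the illustration. It also assumes a square texture.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// UVs of one 2x2 pixel quad: index 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
struct Quad { Float2 uv[4]; };

// Approximate isotropic mip selection from finite differences across the quad.
// textureSize is the edge length in texels of a square texture (assumption).
float MipLevelFromQuad(const Quad& q, float textureSize) {
    // ddx: difference with the horizontal neighbor, in texels per pixel.
    const Float2 ddx{ (q.uv[1].x - q.uv[0].x) * textureSize,
                      (q.uv[1].y - q.uv[0].y) * textureSize };
    // ddy: difference with the vertical neighbor, in texels per pixel.
    const Float2 ddy{ (q.uv[2].x - q.uv[0].x) * textureSize,
                      (q.uv[2].y - q.uv[0].y) * textureSize };
    const float lenX2 = ddx.x * ddx.x + ddx.y * ddx.y;
    const float lenY2 = ddy.x * ddy.x + ddy.y * ddy.y;
    const float rho2 = std::max(lenX2, lenY2);
    // log2(sqrt(rho2)) == 0.5 * log2(rho2); clamp so magnification maps to mip 0.
    return std::max(0.0f, 0.5f * std::log2(rho2));
}
```

If every pixel of the quad steps 1/256 in UV on a 1024-texel texture, the footprint is 4 texels per pixel and the function returns mip 2. If one neighbor was masked out by a divergent loop, its UV is whatever happened to be in the register, and the computed level is garbage. That is the situation the warning is about, and it is why SampleLevel or SampleGrad, which take the level or gradients explicitly, avoid it.
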
  10. galop1n

    Pixel shader UAVs in ps_5_1

    Haha, I recognize that one! 10586 was the latest SDK at the time and I filed a bug with Microsoft. Back then, they gave me a preview build of 16xxx that already fixed it.
  11. The handle is an abstraction, so even if the memory were moved somehow by the driver, the handle could remain the same. But by design, that call will not change its result for the lifetime of your object, even if you evict it, because DX12 relies on virtual addressing.
  12. galop1n

    DXGI_FORMAT_D16_UNORM to RGB ?

    You probably won't see anything because of how depth is stored; the usable range of depth is probably between 0.98 and 1 or something similar. Also, you don't have to go as far as copying and mapping on the CPU: you can create an R16_UNORM view and draw the depth buffer on screen with a pixel shader, which is way faster if you don't need to store the result for later use.
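
A quick way to see why the values bunch up near 1 is to evaluate the usual perspective depth mapping on the CPU. The sketch below assumes a standard (non-reversed) D3D-style projection with depth in [0, 1]. The near and far planes used in the note after the code (0.1 and 1000) are example values, not taken from the thread, and the function names are made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// D3D-style perspective depth in [0, 1] for a view-space distance z,
// assuming a standard (non-reversed) projection with the given near/far planes.
double PerspectiveDepth(double z, double zNear, double zFar) {
    assert(zNear > 0.0 && zFar > zNear && z >= zNear && z <= zFar);
    return (zFar * (z - zNear)) / (z * (zFar - zNear));
}

// Value stored in a D16_UNORM texel for a depth in [0, 1].
std::uint16_t ToD16Unorm(double depth) {
    return static_cast<std::uint16_t>(std::lround(depth * 65535.0));
}
```

With near = 0.1 and far = 1000, anything farther than 5 units from the camera already lands above 0.98, so a direct grayscale display looks almost uniformly white. Remapping that narrow range in the pixel shader before output makes the content visible.
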
  13. galop1n

    Shader compile step

    You could in theory duplicate the MSBuild files for the fxc compiler and derive the dxc version from them. Then you would pick the file type in your project to get the compiler you want. But usually, only small samples and quick-and-dirty tests use HLSL compilation from the Visual Studio project. In a real context, you usually end up calling the compiler from your own tool. There are many reasons for that; here are some:
      * You want to strip the debug information and store a separate pdb in a dedicated folder
      * Cache compilation results to skip unnecessary work
      * Pre-process the reflection data to compute the bindings at runtime
      * Spread compiles over the network
      * Generate multiple versions from a single file (with / without debug features, with / without tessellation, etc.)
      * Compression?
      * Hot loading and compiling shaders at runtime (for development purposes only)
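
As an illustration of the "cache compilation results" item, here is a minimal in-memory sketch. Everything in it is hypothetical: `ShaderRequest`, `ShaderCache` and the `CompileFn` callback stand in for whatever your own tool uses, and the actual call to fxc/dxc is left to the callback. A real tool would most likely persist the cache to disk and include the compiler version in the key.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct ShaderRequest {
    std::string source;               // full HLSL text, includes already resolved
    std::string entryPoint;           // e.g. "main"
    std::string target;               // e.g. "ps_5_1"
    std::vector<std::string> defines; // e.g. {"DEBUG=1"}
};

using Bytecode = std::vector<std::uint8_t>;
// Hypothetical hook that invokes the real compiler.
using CompileFn = std::function<Bytecode(const ShaderRequest&)>;

class ShaderCache {
public:
    explicit ShaderCache(CompileFn compile) : compile_(std::move(compile)) {}

    // Returns cached bytecode, compiling only on the first request for a key.
    const Bytecode& GetOrCompile(const ShaderRequest& req) {
        const std::string key = MakeKey(req);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compileCount_;
            it = cache_.emplace(key, compile_(req)).first;
        }
        return it->second;
    }

    int compileCount() const { return compileCount_; }

private:
    // Everything that can change the output goes into the key.
    // Each field is length-prefixed so different field splits cannot collide.
    static std::string MakeKey(const ShaderRequest& req) {
        std::string key;
        auto append = [&key](const std::string& s) {
            key += std::to_string(s.size());
            key += ':';
            key += s;
        };
        append(req.target);
        append(req.entryPoint);
        for (const auto& d : req.defines) append(d);
        append(req.source);
        return key;
    }

    CompileFn compile_;
    std::unordered_map<std::string, Bytecode> cache_;
    int compileCount_ = 0;
};
```

Requesting the same source, entry point, target and defines twice triggers a single compile; changing any one of them (for example `DEBUG=1` to `DEBUG=0`) produces a new entry, which is also how the "multiple versions from a single file" item falls out naturally.
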
  14. galop1n

    WARP vs UAVs in PS

    What is your UAV start slot when calling OMSetRenderTargetsAndUnorderedAccessViews? If you use u1 as a register, it is likely to be 1, but if you have no RT bound, I could imagine you set it to 0 by mistake, creating a mismatch. You lost a single day on an issue and want to cry? Good luck when you get a month-long unresolved bug.
  15. galop1n

    Need advice on Graphics Programming

    GL vs DX has no better or worse choice; it is all about the pros and cons (docs / tools / platform / knowledge / drivers / features) of each, to be balanced against your needs. The only piece of advice: if you are not a triple-A studio, do not use DX12, it is not for you; use DX11. The latter is not going to die, and the former is only for when you need a breakthrough in a large application and can pay the price of 10 times more complexity and pain in development for the gain it can theoretically provide.