About galop1n

  1. DX12 Shader compile step

    You could in theory duplicate the MSBuild files for the fxc compiler and derive a dxc version from them. Then you would pick the file type in your project to get the compiler you desire. But usually, only small samples and quick-and-dirty tests use HLSL compilation from the Visual Studio project. In a real context, you usually deal with calling the compiler from your own tool. There are many reasons for that; here are some:
      * Strip the debug information and store a separate PDB in a dedicated folder
      * Cache compilation results to skip unnecessary work
      * Pre-process the reflection data to compute the bindings at runtime
      * Spread compiles over the network
      * Generate multiple versions from a single file (with/without debug features, with or without tessellation, etc.)
      * Compression?
      * Hot loading and compiling shaders at runtime (for development purposes only)
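    For illustration, an offline compile step might invoke dxc along these lines (the paths, entry point, and define are made-up examples; `-Fd` writes the separate PDB and `-Qstrip_debug` removes the debug info from the shipped blob):

    ```
    dxc -T ps_6_0 -E MainPS ^
        -Zi -Fd shaders\pdb\MainPS.pdb -Qstrip_debug ^
        -D WITH_DEBUG_FEATURES=0 ^
        -Fo shaders\bin\MainPS.cso MainPS.hlsl
    ```

    Your tool can then hash the inputs and flags to cache results, and parse the reflection blob before packaging.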
  2. DX11 WARP vs UAVs in PS

    What is your UAV start slot when calling OMSetRenderTargetsAndUnorderedAccessViews? If you use u1 as a register, it should likely be 1, but if you have no RT bound, I could imagine you set it to 0 by mistake, creating a mismatch. You lost a single day on an issue and want to cry? Good luck when you hit a month-long unresolved bug.
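    A minimal sketch of the matching pair, assuming a single UAV and no render target bound (names are hypothetical):

    ```hlsl
    // Pixel shader side: UAV declared at register u1.
    RWStructuredBuffer<float4> gOutput : register(u1);

    // Matching C++ side (shown as a comment since this is the shader file):
    // context->OMSetRenderTargetsAndUnorderedAccessViews(
    //     0, nullptr, dsv,   // no RTVs bound
    //     1,                 // UAVStartSlot: must be 1 to match register u1
    //     1, &uav, nullptr);
    ```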
  3. DX12 Need advice on Graphics Programming

    GL vs DX has no better or worse choice; it is all about weighing the pros and cons of each (docs / tools / platforms / knowledge / drivers / features) against your needs. The only piece of advice: if you are not a triple-A studio, do not use DX12, it is not for you; use DX11. The latter is not going to die, and the former is only worth it when you need a breakthrough in a large application and can pay the price of ten times more development complexity and pain for the gain it can theoretically provide.
  4. DX11 Asynchronous Texture Creation DX11

    When you create the texture, do you provide initial data, or do you defer that with UpdateSubresource later? Without initial data, the driver usually does not dare to reserve and commit memory, and you hitch at first use. With initial data, even if creation takes a long time because of tiling and allocation, the driver has no choice but to do it up front, and so you are hitch-free at first use.
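    A minimal sketch of the initial-data path, assuming a valid `device` pointer and a filled `pixels` buffer (untested, illustration only):

    ```cpp
    // Creating an immutable texture with initial data, so the driver has to
    // commit memory at creation time rather than hitching at first use.
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = 256; desc.Height = 256;
    desc.MipLevels = 1; desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_IMMUTABLE;          // no later updates needed
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = pixels;                       // your texel data
    init.SysMemPitch = 256 * 4;                  // bytes per row

    ID3D11Texture2D* tex = nullptr;
    device->CreateTexture2D(&desc, &init, &tex); // allocation happens now
    ```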
  5. DX11 Using Shared textures DX11

    There is support for real cross-adapter sharing without involving the CPU with D3D12 and D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER, but it is probably too much trouble to go down that road. https://msdn.microsoft.com/en-us/library/windows/desktop/mt186623(v=vs.85).aspx
  7. I was just commenting on the Rage/Doom comment. That is mega texturing (one huge mega texture, with the world geometry uniquely mapped to it without any reuse), which as a consequence uses virtual texturing (either with a handcrafted indirection texture and mip selection, or with hardware support for PRT) to achieve that.
  8. This is a completely different tech. Virtual texturing there means a unique, baked, giant textured world. They run with dynamic atlasing, an indirection texture, and usage feedback from the GPU drawing. I personally hate virtual texturing done that way. It hurts texel density to fit on disk. It hurts texture quality by doing some JPEG-to-BCn transcode plus an ugly "bicubic" upsample to try to recreate data that does not exist. Not to mention that it deals with a lot of tile border and filtering bugs and caveats. I don't think you will see more than 4x aniso in Doom or Rage, for example. More recent hardware can help a little to ease implementation, with virtual memory, but it is still a huge piece of crap that creates more problems than it solves.
  9. Your artists should not have to deal with any texture grouping, even if the only tool is Maya or 3ds Max. Just let them use one material per texture, and when you pre-process the mesh, merge everything and just add a texture index. At runtime, a texture array is a single texture too, so you are not dealing with N textures either. Or if you go the texture atlas route, pack the individual textures automatically; do not ask your artists to do it!
  10. Merging geometry and merging materials are two different things. And most materials for shadow maps do not need the texture in the first place (unless they do alpha clip). It would require a bit more info on what your overall frame looks like to provide good pointers. We could write books on what is good and bad practice in a renderer, so if you are OK with sharing a RenderDoc or Nsight capture, I can give you concrete advice on what I see.
  11. If the only things that change between your draws are a few texture bindings, you can push thousands of draw calls before you start to feel a cost here! Do not forget, premature optimization is the root of all evil. Unless you have decades of experience profiling GPUs and graphics APIs, it is dangerous to jump into thinking something is fast or slow. Make your render visually correct first, then optimize based on real profiling, only if it is needed.
  12. What? Why would you need to cut draw calls? You can store the slice index in the mesh stream, and you are done. Texture arrays have nice wrapping properties and you can pick the slice per pixel, but they are restricted to a single format and size. Atlases have ugly requirements for wrapping and mip filtering and also have format restrictions. Ideally, bindless DX12 is the best way to access many textures in a single draw call. How many draw calls are you trying to save anyway? You can just store the slice index per vertex, and take advantage of that by packing your UVs as FLOAT16 instead of FLOAT32.
  13. If all you want is to pack together 1024 tiling textures, just use a texture array…
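    A minimal HLSL sketch of reading a per-vertex slice index through a texture array (names are made up):

    ```hlsl
    // One draw call can address many same-size/same-format textures: the
    // z component of the sample location selects the array slice, and
    // wrapping and mip filtering work per slice, unlike an atlas.
    Texture2DArray gAlbedo  : register(t0);
    SamplerState   gSampler : register(s0);

    float4 SampleSlice(float2 uv, float slice)
    {
        return gAlbedo.Sample(gSampler, float3(uv, slice));
    }
    ```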
  14. The short answer is no. The longer answer is also no. Nothing is ever simple with shadows in real-time rendering. The problem is that a shadow-map texel covers a variable number of screen pixels. The commonly seen result, without any countermeasure, is aliasing as the surface on screen goes up and down across a single shadow map texel. The problem gets worse with shadow map blurring like multi-tap PCF, because you end up fetching further away, increasing the odds of false positives and negatives. Depth biases of any sort, whether applied when rendering the shadow map or when reading the shadow test, are a useful tool but are never perfect. The idea is to push the whole staircase-like depth profile of the shadow map texels under the screen surface; by definition, this generates your peter panning effect. You can have lighting artists spend their days tweaking the biasing for every light and still not get a perfect result. Slope-based bias makes your bias bigger where acne is most likely (surfaces at grazing angles to the light, whose texels extend over a much larger on-screen projection), and by doing so adds even more floating. If you look at recent attempts to solve this, there is a "crazy" technique doing real ray tracing up close that then blends to shadow maps. The demo came from NVIDIA and has been implemented at least once (afaik), in Battlefield 1 for its sun.
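    A sketch of slope-based bias applied at shadow test time (the constants are illustrative and would need per-light tweaking, exactly as described above):

    ```hlsl
    // More bias where the surface faces away from the light (small NdotL),
    // where acne is most likely, at the cost of more peter panning.
    float ShadowTest(Texture2D shadowMap, SamplerComparisonState cmp,
                     float3 shadowUVZ, float NdotL)
    {
        const float constantBias = 0.0005;
        const float slopeBias    = 0.005;
        float bias = constantBias + slopeBias * (1.0 - saturate(NdotL));
        return shadowMap.SampleCmpLevelZero(cmp, shadowUVZ.xy,
                                            shadowUVZ.z - bias);
    }
    ```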
  15. This is indeed a hardware limitation. Texture descriptors are scalar registers on GCN. If your index is not uniform across the lanes, it creates the artifacts you see. What the NonUniform intrinsic does is generate a loop, masking threads per index and doing the texture fetch until all lanes have done it. It does not do this at the DXC/DXIL bytecode level, but just tags things in case the driver has to do it. If you have access to ballot and readFirstLane in GLSL, it is easy to write two versions of the shader yourself, for GCN and NVIDIA, doing it or not depending on the GPU.
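    A sketch of that manual loop in HLSL with SM 6.0 wave intrinsics (resource names are assumptions; the GLSL ballot/readFirstInvocation version is analogous):

    ```hlsl
    // "Waterfall" loop: roughly what the driver generates on GCN for a
    // non-uniform resource index, keeping the descriptor read scalar.
    Texture2D    gTextures[128] : register(t0);
    SamplerState gSampler       : register(s0);

    float4 FetchNonUniform(uint index, float2 uv)
    {
        float4 result = 0;
        for (;;)
        {
            // First active lane broadcasts its index to the wave.
            uint uniformIndex = WaveReadLaneFirst(index);
            if (index == uniformIndex)
            {
                result = gTextures[uniformIndex].Sample(gSampler, uv);
                break; // this lane is done; loop repeats for other indices
            }
        }
        return result;
    }
    ```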
    You can also make the wait explicit (and put it at the front instead of at the end) by using DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. This flag lets you wait for a swap chain buffer to be ready to be filled for the next frame. Advantages are:
      * You can use the CPU for other tasks while waiting, instead of having a dead thread stuck in Present
      * It reduces latency (mostly because you do not aggressively queue frames, but instead push them when they are needed, closer to the actual moment they are going to be displayed)
      * You control the number of allowed queued frames explicitly, and so the exact behavior (it depends on whether your CPU update + render + GPU frame fit in one vsync or two, basically)
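    A rough C++ sketch of the setup (untested; creation details and error handling omitted):

    ```cpp
    // Frame-latency waitable swap chain, DXGI 1.3+ (IDXGISwapChain2).
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.BufferCount = 3;
    desc.SwapEffect  = DXGI_SWAP_EFFECT_FLIP_DISCARD;
    desc.Flags       = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
    // ... create the swap chain with this desc, then:
    swapChain->SetMaximumFrameLatency(1);  // number of allowed queued frames
    HANDLE waitable = swapChain->GetFrameLatencyWaitableObject();

    // Per frame: wait up front instead of blocking inside Present.
    WaitForSingleObjectEx(waitable, 1000, TRUE);
    // update + render + Present(...)
    ```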