Community Reputation

528 Good


About turanszkij

  1. turanszkij

    Shadow map size recommendation?

    You will want to consider the quality/performance tradeoff for your specific project. 2048x2048 will look nicer, but it has four times as many texels as 1024x1024, so it will use more memory and could be slower to render and sample from. Also keep in mind that too low a resolution can hinder performance as well, because of subpixel triangles. You will need to find the best resolution for the scene you are rendering, but 1024x1024 is usually a safe bet. And shadow mapping is not useless yet.
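To put rough numbers on the memory side of that tradeoff, here is a minimal sketch. It assumes a single 2D shadow map with no mip chain and a fixed bytes-per-texel figure (4 for a 32-bit depth format, 2 for a 16-bit one). The name shadow_map_bytes is made up for illustration, and a real driver may add alignment padding on top of this estimate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated size of one shadow map in bytes.
// Ignores mip levels and any driver-side alignment or padding.
constexpr uint64_t shadow_map_bytes(uint32_t width, uint32_t height,
                                    uint32_t bytes_per_texel) {
    return static_cast<uint64_t>(width) * height * bytes_per_texel;
}

// Doubling the resolution in both dimensions quadruples the texel count.
static_assert(shadow_map_bytes(2048, 2048, 4) ==
              4 * shadow_map_bytes(1024, 1024, 4),
              "2048x2048 is four times the size of 1024x1024");
```

With a 32-bit depth format this comes to 4 MiB for 1024x1024 and 16 MiB for 2048x2048, per shadow map.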
  2. turanszkij

    Bilateral Downsampler

    The reason you want to use bilateral instead of bilinear is that bilinear doesn't account for depth discontinuities. It just blurs everything, so you will get halos around a sharp character face against an out-of-focus background. Bilateral samples depth and color for every tap, and if a discontinuity is detected in depth (the difference between the center depth and the sample depth is over a threshold), it falls back on the center color sample. It also doesn't simply fall back, but lerps back by the difference weight. For example, a single-direction Gaussian bilateral blur:

        // Depth and color at the center of the kernel
        const float center_depth = texture_lineardepth.SampleLevel(sampler_point_clamp, uv, 0);
        const float4 center_color = texture_color.SampleLevel(sampler_linear_clamp, uv, 0);

        float4 color = 0;
        for (uint i = 0; i < 9; ++i)
        {
            const float2 uv2 = uv + direction * gaussianOffsets[i] * resolution_rcp;
            const float depth = texture_lineardepth.SampleLevel(sampler_point_clamp, uv2, 0);

            // 0 when the tap is at the same depth as the center, 1 across a discontinuity
            const float weight = saturate(abs(depth - center_depth) * camera_farplane * depth_threshold);

            // Lerp the tap color back towards the center color by the depth difference weight
            color += lerp(texture_color.SampleLevel(sampler_linear_clamp, uv2, 0), center_color, weight) * gaussianWeightsNormalized[i];
        }

    Keep in mind that it is incorrect to separate a bilateral blur into horizontal and vertical passes, but in practice it might look acceptable. For example, I use it for SSAO and it doesn't make a visual difference whether you separate it or not, but performance-wise the separated version will be faster.
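The depth-weighted fallback described above can be tried on the CPU with a small sketch. This is not the shader itself: it blurs a 1D scalar signal instead of a texture, clamps at the borders, and folds camera_farplane * depth_threshold into a single depth_scale parameter. The names bilateral_blur_1d and depth_scale are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Clamp to [0, 1], like HLSL saturate().
float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Hypothetical CPU version of a one-directional bilateral blur.
// weights must be normalized and have an odd number of entries.
std::vector<float> bilateral_blur_1d(const std::vector<float>& color,
                                     const std::vector<float>& depth,
                                     const std::vector<float>& weights,
                                     float depth_scale) {
    const int n = static_cast<int>(color.size());
    const int radius = static_cast<int>(weights.size()) / 2;
    std::vector<float> out(color.size(), 0.0f);
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            // Clamp the tap position to the valid range (clamp addressing).
            const int j = std::min(std::max(i + k, 0), n - 1);
            // 0 for equal depth, 1 across a large depth discontinuity.
            const float w = saturate(std::fabs(depth[j] - depth[i]) * depth_scale);
            // lerp(tap, center, w): fall back to the center color by weight w.
            const float tap = color[j] + (color[i] - color[j]) * w;
            sum += tap * weights[k + radius];
        }
        out[i] = sum;
    }
    return out;
}
```

With a large depth_scale, a step in color that coincides with a step in depth stays sharp. With depth_scale set to 0, the function degenerates to a plain Gaussian blur and the edge bleeds.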
  3. From your image, what does the RTV heap have to do with the SRV heap? And it's not clear to me what you mean by heap cache. In my case, I solve this problem by having two different kinds of descriptor heaps:

      • heap1: a heap where descriptors are created, deleted, etc. CPU access only.
      • heap2: a heap where descriptors are copied before draws. CPU and GPU access.

    When I create a Texture object, for example, I create descriptors for it in heap1. When I delete the Texture object, I remove its descriptors from heap1. When I draw an object using a Texture, I copy the descriptor from heap1 to heap2. When I draw another object with another texture, I copy from heap1 to heap2 again, BUT without overwriting the previously written descriptors: I keep an offset to the last copied descriptor and copy to the free space after it.

    Before each draw that changed descriptors, I call SetGraphicsRootDescriptorTable() with a GPU descriptor handle like this:

        D3D12_GPU_DESCRIPTOR_HANDLE binding_table = heap_start;
        binding_table.ptr += ringOffset;

    ringOffset keeps increasing by the size of the descriptors that are copied. You can get the size of one descriptor like this:

        device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

    ringOffset is reset to zero when a new frame starts. Also note that if you double buffer your rendering, you will need to double buffer your descriptor heaps too, so that another frame's rendering doesn't read from the heap you are currently writing to. I hope that helps, good luck!
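The offset bookkeeping for the shader-visible heap can be sketched without any D3D12 calls. In the sketch below, DescriptorRing is a hypothetical helper, heap_start stands in for the ptr member of the heap's starting D3D12_GPU_DESCRIPTOR_HANDLE, and increment_size stands in for the value returned by GetDescriptorHandleIncrementSize. The actual descriptor copy and the SetGraphicsRootDescriptorTable() call are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: hands out consecutive ranges of a shader-visible
// descriptor heap during one frame and starts over on the next frame.
class DescriptorRing {
public:
    DescriptorRing(uint64_t heap_start, uint32_t increment_size, uint32_t capacity)
        : heap_start_(heap_start),
          increment_size_(increment_size),
          capacity_bytes_(static_cast<uint64_t>(capacity) * increment_size),
          ring_offset_(0) {}

    // Reserves space for `count` descriptors and returns the GPU address
    // that would be passed on as the descriptor table start.
    uint64_t allocate(uint32_t count) {
        const uint64_t bytes = static_cast<uint64_t>(count) * increment_size_;
        assert(ring_offset_ + bytes <= capacity_bytes_ && "heap too small for this frame");
        const uint64_t table = heap_start_ + ring_offset_;
        ring_offset_ += bytes;  // never overwrite descriptors of earlier draws
        return table;
    }

    // Called when a new frame starts.
    void begin_frame() { ring_offset_ = 0; }

private:
    uint64_t heap_start_;
    uint32_t increment_size_;
    uint64_t capacity_bytes_;
    uint64_t ring_offset_;  // in bytes, like ringOffset in the post
};
```

When rendering is double buffered, one such ring (and heap) per frame in flight keeps the CPU from writing into a range the GPU may still be reading.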
  4. This is too little information to really say what's wrong. Keep in mind that CopyDescriptorsSimple performs an immediate copy of descriptors, but Draw() will only execute after you submit the command list. By the time the GPU executes the draw, you will want to keep descriptors of the draw alive and not overwrite them. For example, this can be done by using the descriptor heap as a ring buffer and always binding different portions of it via SetGraphicsRootDescriptorTable(). Do you update your descriptors with this in mind?
  5. turanszkij

    Structured Buffers and optimisation

    I would think that StructuredBuffer<float2> is not so bad. I would worry more if it was float3, because then loads could span multiple cache lines. Having float4 would be best if you access all the elements in one shader, because float4 can be loaded in one instruction on AMD I believe.
  6. turanszkij

    Structured Buffers and optimisation

    First, pack your data like MJP said, by access pattern, but keep in mind that loading from a StructuredBuffer whose stride is a multiple of the float4 size (16 bytes) might be faster on some hardware: https://developer.nvidia.com/content/understanding-structured-buffer-performance. I have seen cases where padding a structured buffer to float4 did help performance on Nvidia.
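On the CPU side, the stride check can be sketched as follows. The two particle structs are invented for illustration and assume 4-byte floats with no compiler-inserted padding. Whether the padded layout is actually faster depends on the hardware and should be measured.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element layouts for a structured buffer.
struct ParticleUnpadded {
    float position[3];
    float size;
    float velocity[3];
};  // 28 bytes: not a multiple of 16

struct ParticlePadded {
    float position[3];
    float size;
    float velocity[3];
    float padding;  // unused, rounds the stride up to 32 bytes
};

// True if the stride is a whole number of float4s (16 bytes each).
constexpr bool is_float4_multiple(std::size_t stride) { return stride % 16 == 0; }

// Rounds a stride up to the next multiple of 16 bytes.
constexpr std::size_t padded_stride(std::size_t stride) { return (stride + 15) / 16 * 16; }

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
```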
  7. Yep, it's possible:

        groupshared uint Array[1024];

        // Parameter is SV_GroupIndex
        void BitonicSort( in uint localIdxFlattened )
        {
            // ArrayLength is assumed to be defined elsewhere (number of valid elements in Array)
            uint numArray = ArrayLength;

            // Round the element count up to the next power of two
            uint numArrayPowerOfTwo = 2 << firstbithigh(numArray - 1);

            for( uint nMergeSize = 2; nMergeSize <= numArrayPowerOfTwo; nMergeSize = nMergeSize * 2 )
            {
                for( uint nMergeSubSize = nMergeSize >> 1; nMergeSubSize > 0; nMergeSubSize = nMergeSubSize >> 1 )
                {
                    // Each thread handles one compare-and-swap pair
                    uint tmp_index = localIdxFlattened;
                    uint index_low = tmp_index & ( nMergeSubSize - 1 );
                    uint index_high = 2 * ( tmp_index - index_low );
                    uint index = index_high + index_low;

                    uint nSwapElem = nMergeSubSize == nMergeSize >> 1 ? index_high + ( 2 * nMergeSubSize - 1 ) - index_low : index_high + nMergeSubSize + index_low;
                    if( nSwapElem < numArray && index < numArray )
                    {
                        // Larger values move towards lower indices (descending order)
                        if( Array[ index ] < Array[ nSwapElem ] )
                        {
                            uint uTemp = Array[ index ];
                            Array[ index ] = Array[ nSwapElem ];
                            Array[ nSwapElem ] = uTemp;
                        }
                    }
                    GroupMemoryBarrierWithGroupSync();
                }
            }
        }

    I got this HLSL code from AMD's tile based particle rendering presentation. I also used it previously to sort a decal list in a screen space tile.
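To check the index arithmetic of that sort outside a shader, the thread group can be emulated on the CPU. The sketch below is a port made for illustration, not code from the AMD presentation: each value of tid plays the role of one SV_GroupIndex, and finishing the loop over tid stands in for GroupMemoryBarrierWithGroupSync(). The names bitonic_sort_emulated and first_bit_high are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Index of the highest set bit, like HLSL firstbithigh() for nonzero input.
static uint32_t first_bit_high(uint32_t v) {
    uint32_t bit = 0;
    while (v >>= 1) ++bit;
    return bit;
}

// Hypothetical CPU emulation of the group-shared bitonic sort.
// Sorts in descending order; the size does not need to be a power of two.
void bitonic_sort_emulated(std::vector<uint32_t>& a) {
    const uint32_t n = static_cast<uint32_t>(a.size());
    if (n < 2) return;
    const uint32_t pow2 = 2u << first_bit_high(n - 1);
    for (uint32_t merge = 2; merge <= pow2; merge *= 2) {
        for (uint32_t sub = merge >> 1; sub > 0; sub >>= 1) {
            // One iteration per emulated thread; pairs within a step are disjoint.
            for (uint32_t tid = 0; tid < pow2 / 2; ++tid) {
                const uint32_t low = tid & (sub - 1);
                const uint32_t high = 2 * (tid - low);
                const uint32_t index = high + low;
                const uint32_t swap_elem = (sub == merge >> 1)
                    ? high + (2 * sub - 1) - low
                    : high + sub + low;
                if (swap_elem < n && index < n && a[index] < a[swap_elem]) {
                    std::swap(a[index], a[swap_elem]);
                }
            }
            // Leaving the tid loop corresponds to the group-wide barrier.
        }
    }
}
```

Skipping out-of-range pairs behaves like padding the array with minimum values at the end, which is why non-power-of-two sizes still come out sorted.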
  8. turanszkij

    DX11 Render Target Picking

    Interesting, and what happens when you have 2 back buffers and a maximum frame latency of 3? Isn't 2 the maximum number of frames that can be queued up in this case? Would it make sense to take the minimum of the maximum frame latency and the back buffer count to determine how long we need to wait for a resource to be finished on the GPU?
  9. turanszkij

    DX11 Render Target Picking

    Unfortunately I can't help with SlimDX, but if you defer reading from the resource by more frames than your back buffer count, you can be sure that the GPU has finished with it.
  10. turanszkij

    DX11 Render Target Picking

    The problem is using MapMode.Read to try reading from the current frame's resource. This will flush the GPU and wait until it has finished rendering your current frame and copied the resource. If you want to read from a GPU resource without introducing a CPU-GPU sync, you should read from a resource that you know is already finished on the GPU. There is a flag you can supply to the Map function: D3D11_MAP_FLAG_DO_NOT_WAIT. This makes the function return the error code DXGI_ERROR_WAS_STILL_DRAWING immediately if the GPU is not yet finished with the resource. Instead of immediately trying to read from the resource, you could double buffer the resource and read from the previous frame's resource to avoid stalling the CPU.
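The double buffering idea boils down to choosing two slots per frame: one staging resource to copy into and an older one to map. A minimal sketch of that bookkeeping follows. ReadbackRing is a hypothetical helper with no D3D11 calls in it, and it assumes one copy is issued per frame. How many slots are enough depends on how many frames can be in flight, so the slot count is left as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: picks which staging resource to copy into and which
// one to read back on a given frame.
class ReadbackRing {
public:
    explicit ReadbackRing(uint32_t slot_count) : slot_count_(slot_count) {
        assert(slot_count >= 2);
    }

    // Slot that this frame's copy should target.
    uint32_t write_slot(uint64_t frame) const {
        return static_cast<uint32_t>(frame % slot_count_);
    }

    // Slot holding the oldest copy (written slot_count - 1 frames ago).
    // Returns false during the first frames, before that slot was ever written.
    bool read_slot(uint64_t frame, uint32_t& slot) const {
        if (frame + 1 < slot_count_) return false;
        slot = static_cast<uint32_t>((frame + 1) % slot_count_);
        return true;
    }

private:
    uint32_t slot_count_;
};
```

Mapping the read slot with D3D11_MAP_FLAG_DO_NOT_WAIT and skipping the read when DXGI_ERROR_WAS_STILL_DRAWING comes back would keep even an unlucky frame from stalling.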
  11. You can use a compute shader, and that way you will not have to set up the raster pipeline. With a compute shader, you need to write to a RWTexture2D instead of returning an output like a pixel shader does, and set up a correct thread count, but it is a lot less work than a vertex shader + pixel shader. A simple example (just off the top of my head, I haven't compiled it):

        Texture2D<float4> input : register(t0);
        SamplerState sam : register(s0);
        RWTexture2D<float4> output : register(u0);

        // input_texture_resolution is assumed to be supplied elsewhere, e.g. by a constant buffer
        [numthreads(8,8,1)]
        void main(uint3 DTid : SV_DispatchThreadID)
        {
            // +0.5 so that the sample lands on the texel center instead of its corner
            output[DTid.xy] = input.SampleLevel(sam, ((float2)DTid.xy + 0.5) / input_texture_resolution.xy, 0);
        }

    Such a shader needs to be started with a dispatch command:

        deviceContext->Dispatch((input_texture_resolution.x + 7) / 8, (input_texture_resolution.y + 7) / 8, 1);

    The dispatch command takes the input texture resolution and divides it by the thread group size, so that each thread works on a single pixel. The +7 is there so that the integer division doesn't underestimate when the resolution is not divisible by the thread group size (which is 8 in this case). Good luck!
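The +7 trick is a rounded-up integer division, which generalizes to any thread group size. A small sketch, where thread_group_count is a name made up for this example:

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed to cover `size` pixels along one axis
// when each group covers `group_size` pixels (rounded-up integer division).
constexpr uint32_t thread_group_count(uint32_t size, uint32_t group_size) {
    return (size + group_size - 1) / group_size;
}

// 1080 is divisible by 8, 1081 is not and needs one extra group.
static_assert(thread_group_count(1080, 8) == 135, "exact fit");
static_assert(thread_group_count(1081, 8) == 136, "partial group rounds up");
```

For a 1920x1080 texture and 8x8 groups this gives a dispatch of 240 x 135 x 1 groups. When the resolution is not a multiple of the group size, the last groups contain threads that fall outside the texture.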
  12. Most of the stuff from the legacy SDK still applies, but the old D3DX math library is no longer included with the new DirectX SDK (which is part of the Windows SDK now). You can still use the old math library if you have the legacy SDK, but I wouldn't recommend it. Instead, there is a new open source math library in the Windows SDK called DirectXMath. It includes everything that was in the D3DX math library, but it has a different naming convention and more focus on performance (SSE support, and everything is inlined because it is a header-only library).

    The graphics techniques that were valid in the legacy SDK will require some effort to implement, but it is doable. There will also be slight differences in how you interop with DXGI to present the final image, and in how you compile and link the program, but apart from this, the common graphics calls themselves have not changed at all.

    I would recommend learning DirectX 11 today instead of OpenGL. It is just more reliable and consistent, and not much less portable. OpenGL will run on Linux and Windows, DirectX 11 will run on Windows and Xbox. Android phones use all kinds of different OpenGL versions that might as well be different APIs. Apple products use their own graphics API, and PlayStation and Nintendo consoles use their own graphics APIs too. Generally you will want to use the platform's native API. However, once you are comfortable with one of them, the knowledge is transferable to another one, just with a slightly different syntax.

    This tutorial series should be a good starting point for learning the newer DX11 API: http://www.rastertek.com/tutdx11s2.html
  13. turanszkij

    Naming Vulkan objects

    Thanks, that's super helpful! I got it working with this. Would have never figured out on my own.
  14. Has anyone had success naming their Vulkan objects via the VK_EXT_debug_utils extension? The extension is reported as available, but when I add a call to the vkSetDebugUtilsObjectNameEXT() function, the build cannot find the symbol for it. I could not find any documentation about whether I need to do anything specific for this: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/vkSetDebugUtilsObjectNameEXT.html
  15. I cannot seem to eliminate some Vulkan debug layer errors regarding incorrect texture layout (but the textures seem to work fine in practice). The debug layer is complaining that textures should be in VK_IMAGE_LAYOUT_GENERAL, but they are in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.

    I am uploading texture data via vkCmdCopyBufferToImage using the copy queue, but later in the frame I will use those textures on the graphics queue. The textures are created with VK_IMAGE_LAYOUT_UNDEFINED as the initial layout, so before issuing vkCmdCopyBufferToImage, I transition them to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL by inserting a barrier like this:

        VkImageMemoryBarrier barrier = {};
        barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
        barrier.image = (VkImage)pTexture2D->resource;
        barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        barrier.subresourceRange.baseArrayLayer = 0;
        barrier.subresourceRange.layerCount = pDesc->ArraySize;
        barrier.subresourceRange.baseMipLevel = 0;
        barrier.subresourceRange.levelCount = pDesc->MipLevels;
        barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
        barrier.srcAccessMask = 0;
        barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

        vkCmdPipelineBarrier(
            copyCommandBuffer,
            VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
            VK_PIPELINE_STAGE_TRANSFER_BIT,
            0,
            0, nullptr,
            0, nullptr,
            1, &barrier
        );

    The src and dst queue families are ignored, because the following command using the texture is still executed on the copy queue:

        vkCmdCopyBufferToImage(copyCommandBuffer, textureUploader->resource, (VkImage)pTexture2D->resource, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, (uint32_t)copyRegions.size(), copyRegions.data());

    Then I want to transition the texture from a transfer destination to VK_IMAGE_LAYOUT_GENERAL, to be used by the graphics pipeline, so I also want to transfer ownership to the graphics queue family with this barrier:

        barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
        barrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
        barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;
        barrier.srcQueueFamilyIndex = queueIndices.copyFamily;
        barrier.dstQueueFamilyIndex = queueIndices.graphicsFamily;

        vkCmdPipelineBarrier(
            copyCommandBuffer,
            VK_PIPELINE_STAGE_TRANSFER_BIT,
            VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
            0,
            0, nullptr,
            0, nullptr,
            1, &barrier
        );

    The textures that I create like this will be either sampled by shaders or used as read-write textures from compute shaders. Did I miss something, or can I not transfer image layout like this between queues?