# SSAO problem in Vulkan


## Recommended Posts

Hey there,

So I have been having problems with my SSAO shader in deferred rendering for a very long time, and mostly because of this problem I stopped doing anything for several months.

So this time I decided to ask for help, as I would like to solve this and keep on learning different shader techniques.

I also inspected the frames in RenderDoc and the data looks very similar to what I expect, but for some reason SSAO still does not work.

Here is the screenshot with the range check (the white framebuffer image shows the SSAO output):

[attachment=34881:SSAO1.png]

Here is the image without the range check (the dark framebuffer image shows the SSAO output):

As you can see, it is pitch black, and I really have no idea why:

[attachment=34880:ssao2.png]

In the first shader pass I generate the textures for the deferred shading and SSAO.

This code generates normals in vertex shader for SSAO:

mat3 normalMatrix = transpose(inverse(mat3(mvMatrix)));
vs_out.normal = normalMatrix * inNormal;
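For intuition, here is a small CPU-side sketch of what `transpose(inverse(mat3(mvMatrix)))` computes. It is a plain-C++ illustration (no GLM; `Mat3` and `NormalMatrix` are made-up names), using the identity that the inverse-transpose of a matrix equals its cofactor matrix divided by its determinant:

```cpp
#include <cassert>
#include <cmath>

// Sketch of the "normal matrix": inverse(M)^T = cofactor(M) / det(M).
// Row-major 3x3; Mat3 and NormalMatrix are illustrative names only.
struct Mat3 { float m[3][3]; };

Mat3 NormalMatrix(const Mat3& a)
{
    const float (*M)[3] = a.m;
    Mat3 c;
    // Cofactor matrix of M
    c.m[0][0] =   M[1][1]*M[2][2] - M[1][2]*M[2][1];
    c.m[0][1] = -(M[1][0]*M[2][2] - M[1][2]*M[2][0]);
    c.m[0][2] =   M[1][0]*M[2][1] - M[1][1]*M[2][0];
    c.m[1][0] = -(M[0][1]*M[2][2] - M[0][2]*M[2][1]);
    c.m[1][1] =   M[0][0]*M[2][2] - M[0][2]*M[2][0];
    c.m[1][2] = -(M[0][0]*M[2][1] - M[0][1]*M[2][0]);
    c.m[2][0] =   M[0][1]*M[1][2] - M[0][2]*M[1][1];
    c.m[2][1] = -(M[0][0]*M[1][2] - M[0][2]*M[1][0]);
    c.m[2][2] =   M[0][0]*M[1][1] - M[0][1]*M[1][0];
    // Cofactor expansion along row 0 gives the determinant
    float det = M[0][0]*c.m[0][0] + M[0][1]*c.m[0][1] + M[0][2]*c.m[0][2];
    for (auto& row : c.m)
        for (float& v : row)
            v /= det;
    return c;
}
```

For a non-uniform scale of (2, 3, 4) this yields diag(1/2, 1/3, 1/4), which is why multiplying normals by the plain model-view matrix would skew them.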


and I transform the vertices into world space (the ws_coords), since my deferred calculations are done there:

vs_out.ws_coords =  vec3(ubo.modelMatrix * tmpPos);


In the fragment shader I write it out like this:

outNormal = vec4(normalize(fs_in.normal) * 0.5 + 0.5, LinearizeDepth(gl_FragCoord.z));


I also pack the position and specular data into the texture for better performance:

//Position
outvec0.x = packHalf2x16(fs_in.ws_coords.xy);

//Pos And Specular
outvec0.y = packHalf2x16(vec2(fs_in.ws_coords.z, specularTexture.x));
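For reference, here is a CPU sketch of what `packHalf2x16`/`unpackHalf2x16` do to this data (simplified: denormals are flushed to zero; the function names are made up, since GLSL provides these as built-ins):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Float -> IEEE 754 half, via bit manipulation (normal numbers only).
static uint16_t FloatToHalf(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint16_t sign = uint16_t((bits >> 16) & 0x8000u);
    int32_t  exp  = int32_t((bits >> 23) & 0xFFu) - 127 + 15;
    uint16_t mant = uint16_t((bits >> 13) & 0x3FFu);
    if (exp <= 0)  return sign;                     // underflow -> signed zero
    if (exp >= 31) return uint16_t(sign | 0x7C00u); // overflow  -> infinity
    return uint16_t(sign | (exp << 10) | mant);
}

// Half -> float (denormal halves dropped to zero).
static float HalfToFloat(uint16_t h)
{
    uint32_t sign = uint32_t(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0)       bits = sign;               // zero
    else if (exp == 31) bits = sign | 0x7F800000u; // infinity
    else                bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Two halves packed into one 32-bit word: x in the low 16 bits, y in the high.
uint32_t PackHalf2x16(float x, float y)
{
    return uint32_t(FloatToHalf(x)) | (uint32_t(FloatToHalf(y)) << 16);
}
```

Values like 1.5 round-trip exactly, but arbitrary world-space positions lose precision in half floats, which is worth keeping in mind when debugging a position G-buffer packed this way.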


When I come to my SSAO pass (in fragment shader):

float SSAOAlgo0()
{
    ivec2 P1 = ivec2(inUV * textureSize(PosSpecularPacked, 0));

    uvec2 uvec2_PosSpecularPacked = texelFetch(PosSpecularPacked, P1, 0).rg;
    vec4 normalDepthTexture = texture(NormalDepth, inUV, 0);

    // Unpack position and specular
    vec2 tempPosition0 = unpackHalf2x16(uvec2_PosSpecularPacked.x);
    vec2 tempPosAndSpec = unpackHalf2x16(uvec2_PosSpecularPacked.y);

    vec3 fragPos = vec3(tempPosition0, tempPosAndSpec.x);

    // Convert fragPos to view space (it is stored in world space)
    fragPos = vec3(ubo.view * vec4(fragPos, 1.0f));
    //fragPos.y = -fragPos.y;

    // Get normal
    vec3 normal = normalize(normalDepthTexture.xyz * 2.0 - 1.0);

    // Random vector using noise lookup
    ivec2 texDim = textureSize(NormalDepth, 0);
    ivec2 noiseDim = textureSize(texNoise, 0);
    const vec2 noiseUV = vec2(float(texDim.x) / float(noiseDim.x), float(texDim.y) / float(noiseDim.y)) * inUV;
    vec3 randomVec = texture(texNoise, noiseUV).xyz * 2.0 - 1.0;

    // Create TBN change-of-basis matrix: from tangent space to view space
    vec3 tangent = normalize(randomVec - normal * dot(randomVec, normal));
    vec3 bitangent = cross(normal, tangent);
    mat3 TBN = mat3(tangent, bitangent, normal);

    // Iterate over the sample kernel and calculate the occlusion factor
    float f_occlusion = 0.0f;
    for (int i = 0; i < SSAO_KERNEL_SIZE; ++i)
    {
        // Get sample position (from tangent space to view space)
        vec3 Sample = TBN * ubossaokernel.samples[i].xyz;
        Sample = fragPos + Sample * SSAO_RADIUS;

        // Project the sample position to get its position on screen/texture
        vec4 offset = vec4(Sample, 1.0f);
        offset = ubo.projection * offset;      // from view space to clip space
        offset.xyz /= offset.w;                // perspective divide
        offset.xyz = offset.xyz * 0.5f + 0.5f; // transform to range 0.0 - 1.0

        // Get depth value of the kernel sample
        float sampleDepth = -texture(NormalDepth, offset.xy, 0).w;

        // Range check & accumulate
        //#define RANGE_CHECK
#ifdef RANGE_CHECK
        float rangeCheck = smoothstep(0.0f, 1.0f, SSAO_RADIUS / abs(fragPos.z - sampleDepth));
        f_occlusion += (sampleDepth >= Sample.z ? 1.0f : 0.0f) * rangeCheck;
#else
        f_occlusion += (sampleDepth >= Sample.z ? 1.0f : 0.0f);
#endif
    }
    f_occlusion = 1.0f - (f_occlusion / float(SSAO_KERNEL_SIZE));
    return f_occlusion;
}

void main()
{
    FragColor = SSAOAlgo0();
}
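For reference, the RANGE_CHECK term can be reproduced on the CPU to see how it down-weights samples whose stored depth lies far outside SSAO_RADIUS (a sketch; `Smoothstep` mirrors the GLSL built-in and `RangeCheck` is a made-up helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// GLSL smoothstep: clamp t to [0,1], then apply 3t^2 - 2t^3.
float Smoothstep(float e0, float e1, float x)
{
    float t = std::clamp((x - e0) / (e1 - e0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Weight for one kernel sample: full weight when the stored depth is
// within the SSAO radius of the fragment, fading toward 0 beyond it.
float RangeCheck(float fragZ, float sampleDepth, float radius)
{
    return Smoothstep(0.0f, 1.0f, radius / std::fabs(fragZ - sampleDepth));
}
```

With a 0.5 radius, a sample whose stored depth is 0.25 units away contributes at full weight, while one 5 units away contributes almost nothing; this is what stops distant geometry from darkening foreground fragments.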

The SSAO kernel and noise generation look like this:

void Renderer::InitializeSSAOData()
{
    std::uniform_real_distribution<float> randomFloats(0.0f, 1.0f); // random floats between 0.0 and 1.0
    std::default_random_engine generator;

    for (uint32_t i = 0; i < 64; ++i)
    {
        glm::vec3 sample(
            randomFloats(generator) * 2.0f - 1.0f,
            randomFloats(generator) * 2.0f - 1.0f,
            randomFloats(generator));
        sample = glm::normalize(sample);
        sample *= randomFloats(generator);

        float scale = static_cast<float>(i) / 64.0f;
        scale = lerp(0.1f, 1.0f, scale * scale);

        sample *= scale;
        uboSSAOKernel.ssaoKernel[i] = glm::vec4(sample, 0.0f);
    }

    std::vector<glm::vec4> ssaoNoise;
    ssaoNoise.reserve(16);
    for (uint32_t i = 0; i < 16; i++)
    {
        glm::vec4 noise(
            randomFloats(generator) * 2.0f - 1.0f,
            randomFloats(generator) * 2.0f - 1.0f,
            0.0f, 0.0f);
        ssaoNoise.push_back(noise);
    }

    // Generates the noise texture. I can't see anything in it in the debugger,
    // so it might be better to generate it on the GPU at some point.
    GenerateTexture(ssaoNoise, 4, 4, 1, VK_FORMAT_R32G32B32A32_SFLOAT, &m_NoiseGeneratedTexture, VK_IMAGE_USAGE_SAMPLED_BIT, VK_FILTER_NEAREST);

    // Send data
    void* pData;
    VK_CHECK_RESULT(vkMapMemory(m_pWRenderer->m_SwapChain.device, uniformData.ssaokernel.memory, 0, sizeof(uboSSAOKernel), 0, (void**)&pData));
    memcpy(pData, &uboSSAOKernel, sizeof(uboSSAOKernel));
    vkUnmapMemory(m_pWRenderer->m_SwapChain.device, uniformData.ssaokernel.memory);
}
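Isolating the per-sample scale computed in the kernel loop above makes its effect easier to see (`KernelScale` is a made-up helper name):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// The scale applied to kernel sample i: lerp(0.1, 1.0, (i / kernelSize)^2).
// Early samples stay close to the fragment, later ones reach the full radius,
// so nearby occluders are sampled more densely than distant ones.
float KernelScale(uint32_t i, uint32_t kernelSize)
{
    float t = float(i) / float(kernelSize);
    t = t * t;                           // quadratic falloff
    return 0.1f + (1.0f - 0.1f) * t;     // lerp(0.1f, 1.0f, t)
}
```

So sample 0 is scaled by 0.1, the midpoint sample by 0.325, and the last sample reaches the full radius.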

The full code can be found here: https://github.com/TywyllSoftware/TywRenderer/tree/master/Projects/SSAO

Thanks.

Edited by renderkid