# Vulkan: How Do You Deal With Errors On GPUs? Do You At All?


## Recommended Posts

I've had a Fury X since launch; it has worked in every game without issues and still does.

Now, after starting to use it for development, i have to conclude the card is broken.

It calculates wrong results in one of 60000 cases, which in my case causes blue screens (Vulkan -> accidentally setting a huge loop count -> blue screen).

An older 5850 works without issues and the test case is very simple (prefix sum), so it's probably not a driver issue but a hardware failure.

I'll port to OpenCL to see if it's reproducible, but if i'm right i really wonder why the card works in games.

Is it common practice to expect errors and handle them?

Is it true that we can not expect GPUs to be as robust as CPUs? Some people say so but to be honest - i thought they simply tend to forget barriers.

I've worked seriously with only 4-5 GPUs up to now, but they all did exactly what i've told them - always and at least for hours.

##### Share on other sites

I'm in the camp that yes, (consumer) GPUs are not as reliable as CPUs. They take a lot of shortcuts and optimizations that generate nearly-correct results. That said, a blue screen failure due to a specific operation is definitely out of the ordinary, though using Vulkan in these early days definitely exposes you to such failures more than you otherwise would be. This actually does sound like you should build a test case and submit it to AMD, as a lot of these weird corner cases can show up depending on the GPU and they may not have noticed.

As far as handling errors - it is common to have workarounds for certain hardware configurations that are known to break. It is also common to have workarounds for particular drivers, particular operating systems, etc. These are all derived from testing before release (or afterwards...) to find out what works and what doesn't. What you don't have, though, is the ability to detect and handle GPU or driver errors in any sensible way. A blue screen is a kernel mode unhandled exception, and there is not a damn thing you can do about it after the bug has been invoked.

##### Share on other sites

Thanks, i agree, but i already knew that - i'll describe the problem in more detail:

I use the prefix sum on uints to fill an acceleration structure.

The hardware bug (if it is one) seems to ignore my barriers, so it can happen that array[i+1] is smaller than array[i] - normally array[i+1] MUST be >= array[i].

(This also happens with work group size of 64, but less often)

Later, when processing for (uint j = array[i]; j < array[i+1]; j++), the difference wraps around because the numbers are unsigned and gives a huge number close to 0xFFFFFFFFu.

Now i don't know whether the long runtime or out-of-bounds buffer writes cause the blue screen, but i know why it happens.
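To make the wraparound concrete, here is a minimal CPU-side sketch in Python (not shader code). It assumes the loop length is formed as the unsigned difference `array[i+1] - array[i]`, as described above; `trip_count_u32` is a hypothetical helper name, and the masking only emulates how a 32-bit GLSL `uint` wraps.

```python
U32_MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned integer (wraps modulo 2**32)


def trip_count_u32(start: int, end: int) -> int:
    """Return (end - start) as a 32-bit unsigned value, wrapping like a uint."""
    return (end - start) & U32_MASK


# Healthy prefix sum: the end offset is not smaller than the start offset.
print(hex(trip_count_u32(100, 103)))
# Corrupted prefix sum: end < start, so the difference wraps around.
print(hex(trip_count_u32(103, 100)))
```

Note that a loop which only compares `j < array[i+1]` directly would run zero times when the end is smaller than the start; the huge count shows up once the difference itself is computed in unsigned arithmetic.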

So you say "GPUs are not as reliable as CPUs" - ok, but this is not about accuracy, it's a complete malfunction.

I mean - nobody is going to check array[i+1] >= array[i] after a prefix sum, no matter if GPU or CPU, do you agree?
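As a debugging aid only, such a check is cheap on a read-back copy of the buffer. A minimal sketch in Python, assuming the prefix-sum result has already been copied to the CPU as a list of non-negative integers; `first_decrease` is a hypothetical helper name, not part of any API.

```python
from typing import List, Optional


def first_decrease(offsets: List[int]) -> Optional[int]:
    """Return the first i with offsets[i + 1] < offsets[i], or None if there is none."""
    for i in range(len(offsets) - 1):
        if offsets[i + 1] < offsets[i]:
            return i
    return None


# A valid prefix sum never decreases; a corrupted one reports the failing index.
print(first_decrease([0, 2, 2, 5]))
print(first_decrease([0, 3, 2, 5]))
```

Logging the returned index per work group would show whether the failures cluster around particular slots.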

Hopefully it's just a Fury related driver bug, but i'll double check in OpenCL before wasting AMD's time...

##### Share on other sites

Later, when processing for (uint j = array[i]; j < array[i+1]; j++), the difference wraps around because the numbers are unsigned and gives a huge number close to 0xFFFFFFFFu.
Now i don't know whether the long runtime or out-of-bounds buffer writes cause the blue screen, but i know why it happens.

I'm 99% sure what's happening is that because you get a bad loop count, your GPU will take too long to respond and thus you run into TDR.

The hardware bug (if it is one) seems to ignore my barriers, so it can happen that array[i+1] is smaller than array[i] - normally array[i+1] MUST be >= array[i].
(This also happens with work group size of 64, but less often)

How are you issuing your barrier? Beware: in GLSL, a memory barrier does only half of the job. You also need barrier():

//memoryBarrierShared ensures our write is visible to everyone else (must be done BEFORE the barrier)
//barrier ensures every thread's execution reached here.
memoryBarrierShared();
barrier();


##### Share on other sites

No, i even replaced my own code with this code from the OpenGL SuperBible and added additional barriers. Still the same bugs.

layout (local_size_x = 128) in;
#define lID gl_LocalInvocationID.x
shared uint _texCount [257];
for (uint step = 0; step < 8; step++)
{
    uint mask = (1 << step) - 1;
    uint rd_id = ((lID >> step) << (step + 1)) + mask;
    uint wr_id = rd_id + 1 + (lID & mask);
    uint r = _texCount[rd_id + 1];
    barrier(); memoryBarrierShared(); // paranoia
    _texCount[wr_id + 1] += r;
    barrier(); memoryBarrierShared();
}
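For anyone who wants to check what this loop should produce, here is a CPU-side reference sketch in Python. It is not the shader: it replays the same index arithmetic one "thread" at a time and models each barrier by finishing all reads before any write. The name `simulate_scan` and the assumption that slot 0 holds 0 are illustrative choices, not taken from the original code.

```python
from itertools import accumulate

LOCAL_SIZE = 128  # matches local_size_x in the shader
NUM_STEPS = 8     # 2**8 = 256 elements per work group


def simulate_scan(counts):
    """Replay the shader's step loop sequentially.

    counts: 256 input values. Returns a 257-element list: slot 0 stays 0
    (assumed), slots 1..256 hold the running totals (the shader's +1 offset).
    """
    assert len(counts) == 2 * LOCAL_SIZE
    tex_count = [0] + list(counts)
    for step in range(NUM_STEPS):
        mask = (1 << step) - 1
        pending = []
        # Phase 1: every thread reads its source slot (before the first barrier).
        for lid in range(LOCAL_SIZE):
            rd_id = ((lid >> step) << (step + 1)) + mask
            wr_id = rd_id + 1 + (lid & mask)
            pending.append((wr_id + 1, tex_count[rd_id + 1]))
        # Phase 2: every thread adds into its destination slot (after the barrier).
        for index, r in pending:
            tex_count[index] += r
    return tex_count


values = [1] * 256
assert simulate_scan(values) == [0] + list(accumulate(values))
```

Within one step the reads all land in the left half of each block and the writes in the right half, so in this model only the ordering between steps matters. Comparing a GPU read-back against this reference would show exactly which slots go wrong.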


##### Share on other sites
I'd suggest sending your repro case to AMD dev support; if there is a bug, only they can fix it and we can't do anything to help.

##### Share on other sites

The code you posted starts with _texCount uninitialized, which won't work as intended. It doesn't start with 0s unless you fill it. If you do fill it in your actual code, you need to sync that as well.

##### Share on other sites

Good point! Of course i did init the data, but it's possible that i upload wrong, huge numbers that cause an overflow on input.

Added this to ensure small numbers:

if (lID==0)
{
    for (uint i=0; i<=NUM_TEXELS; i++) _texCount[i] &= 0xFF;
}
memoryBarrierShared();
barrier();

But damn - it still happens. F....

I also remember i already tried to fill it with all 1s. The bug mostly happens at array index 192, or 193 if i offset by one as in the given code.

I run the shader each frame but on constant input data (i even stop uploading input data after some frames)

The bug happens at random indices across all 60000 work groups; the next frame it's correct there, only to fail somewhere else.

It's some work to port everything to OpenCL, after that i'll post stripped down complete shader code tomorrow.

I need to be sure because i already gave slightly wrong information to AMD yesterday...

##### Share on other sites

A day later there's just more confusion.

In the OpenCL version, the exclusive prefix sum (1-256) always fails, with a very different error pattern, but the inclusive (0-255) version works.

In Vulkan, exclusive fails more often than inclusive, but still only rarely: for a few seconds there are no errors, then more and more...
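For reference, the textbook difference between the two variants, as a minimal Python sketch. The function names are hypothetical, and how the "(1-256)" and "(0-255)" slot ranges above map onto these definitions is an assumption.

```python
from itertools import accumulate


def inclusive_scan(values):
    """out[k] = values[0] + ... + values[k]."""
    return list(accumulate(values))


def exclusive_scan(values):
    """out[k] = values[0] + ... + values[k - 1], with out[0] = 0."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


print(inclusive_scan([3, 1, 4]))
print(exclusive_scan([3, 1, 4]))
```

The exclusive result is the inclusive one shifted right by one slot with a leading 0, which is what the +1 index offset in the shader code achieves.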

The stripped down version i made based on AMD's Vulkan GCN extension sample code works without issues - tested for half an hour.

All use the same code.

Also, the Doom Vulkan demo works, with no pixel errors or anything.

So i'm pretty sure the GPU is ok and a driver bug in Vulkan AND OpenCL is unlikely.

Most probably i don't know what i'm doing... :)

##### Share on other sites

Anyone with time and a Fiji GPU is welcome to try the test case i've sent to AMD: https://github.com/JoeJGit/OpenCL_Fiji_Bug_Report

Includes project and if you dare - binaries (zipped only).

The bug is reproducible and happens only with the 32-bit version (the log output should show increasing numbers, but i get chaos).