DX11 - I'm not alone, right?

8 comments, last by NightCreature83 10 years, 6 months ago

Hi guys! ;)

This is not really the usual question, but is it normal to crash the Nvidia card when messing around with DirectX 11, for example by sending huge chunks of data to it and then rendering them in real time? Because mine does.

So the question is: have you ever crashed your Nvidia card? If not, could my hardware be faulty?

Thanks, as usual. (I know, this is a weird question.)

-MIGI0027

FastCall22: "I want to make the distinction that my laptop is a whore-box that connects to different network"

Blog about... stuff (GDNet, WordPress): www.gamedev.net/blog/1882-the-cuboid-zone/, cuboidzone.wordpress.com/


I have crashed one, but it was an ATI card with DirectX 10, and the cause was in my shaders: I had an infinite loop.

When it crashed, the screen went black and came back on after about one second (like when installing new GPU drivers), and the application was closed automatically.

You mean timeout detection and recovery (TDR). It can happen. The first time I played with compute shaders I had it a lot. The cause was indeed sending too much data (lots of dispatch calls in quick succession), so I guess the driver couldn't keep up anymore. The "solution" was to call ID3D11DeviceContext::Flush. My conclusion: you probably need to give the GPU some chance for synchronisation (a manual flush like that, a readback, or a swap chain present).
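A minimal sketch of that flush workaround, assuming a context type that exposes `Dispatch(x, y, z)` and `Flush()` the way `ID3D11DeviceContext` does. `DispatchInBatches`, `RecordingContext`, and the batch size are hypothetical names for illustration, not anything from the D3D API. Keep in mind that `Flush()` only submits the queued commands; it does not wait for the GPU to finish them.

```cpp
#include <cassert>
#include <cstdint>

// Issues `totalDispatches` compute dispatches and calls Flush() after every
// `dispatchesPerFlush` of them, so queued work is handed to the GPU in
// chunks instead of one enormous burst. `Context` is assumed to provide
// Dispatch(x, y, z) and Flush(), as ID3D11DeviceContext does.
template <typename Context>
void DispatchInBatches(Context& ctx, std::uint32_t totalDispatches,
                       std::uint32_t dispatchesPerFlush,
                       std::uint32_t groupsX, std::uint32_t groupsY,
                       std::uint32_t groupsZ) {
    assert(dispatchesPerFlush > 0 && "batch size must be positive");
    for (std::uint32_t i = 0; i < totalDispatches; ++i) {
        ctx.Dispatch(groupsX, groupsY, groupsZ);
        // Submit at the end of every full batch.
        if ((i + 1) % dispatchesPerFlush == 0) {
            ctx.Flush();
        }
    }
    // Submit whatever is left over from a partial final batch.
    if (totalDispatches % dispatchesPerFlush != 0) {
        ctx.Flush();
    }
}

// Hypothetical stand-in so the logic can be exercised without a GPU:
// it only counts the calls it receives.
struct RecordingContext {
    std::uint32_t dispatches = 0;
    std::uint32_t flushes = 0;
    void Dispatch(std::uint32_t, std::uint32_t, std::uint32_t) { ++dispatches; }
    void Flush() { ++flushes; }
};
```

With a real device you would pass `*context` (your `ID3D11DeviceContext*`) instead of the recording stand-in; the right batch size is a guess you would have to tune for your workload.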

So this isn't a hardware problem or a driver issue but "normal" behavior (be glad TDR exists; I think before Vista you were rewarded with a blue screen of death). IIRC the debug layer will also tell you something conclusive (driver hung or something).

Edit: Found it:

[6100] D3D11: ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason
(DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung.
As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred.
The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT ]


Emphasis mine: an anger management hint for GPU programming :D
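The "respawn and fallback to less aggressive use" advice in that message could look roughly like the sketch below. It is an illustration under assumptions, not a known-good recipe: `SubmitResult` and `WorkloadThrottle` are hypothetical. In a real app the failure would show up as an HRESULT such as `DXGI_ERROR_DEVICE_REMOVED` from `IDXGISwapChain::Present`, with `ID3D11Device::GetDeviceRemovedReason` telling you why (e.g. `DXGI_ERROR_DEVICE_HUNG`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical outcome of submitting one batch of GPU work.
enum class SubmitResult { Ok, DeviceRemoved };

// Tracks how much work is sent per batch and backs off after each device
// removal, halving the batch size down to a fixed floor.
class WorkloadThrottle {
public:
    WorkloadThrottle(std::uint32_t initial, std::uint32_t minimum)
        : batchSize_(initial), minimum_(minimum) {
        assert(minimum > 0 && minimum <= initial);
    }

    std::uint32_t batchSize() const { return batchSize_; }

    // Returns true if the caller must recreate the device before continuing.
    bool onSubmit(SubmitResult result) {
        if (result == SubmitResult::Ok) {
            return false;
        }
        // Be less aggressive next time, but never drop below the floor.
        batchSize_ = std::max(minimum_, batchSize_ / 2);
        return true;
    }

private:
    std::uint32_t batchSize_;
    std::uint32_t minimum_;
};
```

After a removal, the device and every resource created on it have to be recreated before the smaller batches are submitted.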

Yep.

When I was doing my terrain for the first time: I was sending bad vertex data to the GPU (DX10).

I had various issues related to multithreading. It seems like the NVIDIA driver crashes every time you attempt to create vertex/index buffers from multiple threads at the same time.
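`ID3D11Device` is documented as thread-safe for resource creation unless the device was created with `D3D11_CREATE_DEVICE_SINGLETHREADED`, so a crash like this points at either that flag or a driver bug. If you just need a workaround, one option (a sketch under that assumption) is to funnel all creation through a single lock. `SerializedCreator` and `MaxConcurrentCreations` are hypothetical helpers; the callable you pass in would wrap the real `device->CreateBuffer(...)` call.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Funnels every resource-creation call through one mutex so that at most
// one thread is inside the creation path at any time.
class SerializedCreator {
public:
    // `fn` is a hypothetical callable wrapping the real creation call,
    // e.g. [&] { return device->CreateBuffer(&desc, &init, &buffer); }.
    template <typename Fn>
    decltype(auto) create(Fn&& fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::mutex mutex_;
};

// Demonstration helper: calls create() from several threads and returns the
// highest number of callbacks observed running at once. With the lock in
// place this never exceeds 1.
int MaxConcurrentCreations(int threadCount, int callsPerThread) {
    SerializedCreator creator;
    std::atomic<int> active{0};
    std::atomic<int> maxActive{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < callsPerThread; ++i) {
                creator.create([&] {
                    int now = ++active;
                    // Atomics keep the measurement valid even without the lock.
                    int seen = maxActive.load();
                    while (now > seen &&
                           !maxActive.compare_exchange_weak(seen, now)) {
                    }
                    --active;
                });
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return maxActive.load();
}
```

This trades some creation throughput for safety, so it is only worth keeping while the driver issue persists.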

Good to know!

I somehow managed to make my GPU/driver hang for 40-70 seconds (the PC froze). I was almost certain a BSOD was about to appear, but instead my application was disconnected from the GPU and crashed. The reason was that I was sending a gigantic amount of data to the GPU, over and over.


Maybe not related, but a lot of people have this issue with the latest NVIDIA drivers: https://forums.geforce.com/default/topic/550032/nvidia-driver-320-49-whql-freezing-my-pc/

I'm still using 314.22 like they recommend there, and have no issues. When I tried the latest drivers, my card started crashing like crazy.

As for multithreading, are you by any chance using D3D11_CREATE_DEVICE_SINGLETHREADED? http://msdn.microsoft.com/en-us/library/windows/desktop/ff476107%28v=vs.85%29.aspx . Looking through those flags, another interesting one is D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT. :)

I had this happen a few times back when the drivers for D3D11 were just starting to be released and I was doing some big compute shader workloads. It was very repeatable, and it makes you inspect what you are doing very carefully :)

So to answer your question, no it isn't just you - you are just using your GPU a little too aggressively.

It is stupidly easy to crash a graphics driver, but if you have TDR enabled then it is difficult to actually kill it if the operating system is not compromised. Simply put, GPU drivers do not offer error recovery features as advanced as, for instance, native code running on a standard processor and operating system. The driver can try and do some logic checking for you before issuing the work to the graphics hardware, by way of return codes - which you should always check, by the way - but once the GPU is busy, the driver has very few options to detect any error conditions which the device cannot resolve on its own - such as infinite loops in shaders - much less tell your application about them.

A lot of that comes from the fact that the current generation of GPUs does not support preemptive multitasking; that is, they cannot be gracefully interrupted once a shader or kernel is launched on them. If the driver does not have TDR, which detects when the GPU has stopped responding for X seconds and resets it, then you are stuck with a brick and have to reboot to get the GPU back.
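To make the TDR idea concrete, here is a toy model of the watchdog logic in plain C++. Nothing in it talks to a real driver, and `Watchdog` is a hypothetical class. What it illustrates is that the monitor can only notice missing progress and reset; it cannot interrupt the running work. On Windows the real timeout is the `TdrDelay` registry value, which defaults to 2 seconds.

```cpp
#include <cassert>
#include <chrono>

// Toy model of timeout detection and recovery: the "device" reports progress
// via heartbeat(), and the watchdog declares a hang once no progress has been
// seen for longer than the timeout. Its only remedy is a reset.
class Watchdog {
public:
    using Clock = std::chrono::steady_clock;

    Watchdog(Clock::duration timeout, Clock::time_point start)
        : timeout_(timeout), lastProgress_(start) {}

    // Called whenever the device completes some work.
    void heartbeat(Clock::time_point now) { lastProgress_ = now; }

    // Returns true (and counts a reset) if the device has been silent too long.
    bool checkAndReset(Clock::time_point now) {
        if (now - lastProgress_ <= timeout_) {
            return false;
        }
        ++resets_;
        lastProgress_ = now;  // treat the freshly reset device as responsive
        return true;
    }

    int resets() const { return resets_; }

private:
    Clock::duration timeout_;
    Clock::time_point lastProgress_;
    int resets_ = 0;
};
```

Time points are passed in explicitly so the logic can be checked deterministically instead of sleeping on a real clock.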

I wouldn't worry about it. Just use the GPU properly and the driver should not crash, if it still does then it is more likely to be a driver bug than faulty hardware. Generally, assuming it is not overheating, hardware is either broken or not. If it worked yesterday, and it passed POST today, then it will work another day.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

(Quoting the post above) Yep. When I was doing my terrain for the first time: I was sending bad vertex data to the GPU (DX10).

The DX11 runtime won't catch that one either; you get a nice call stack somewhere deep down in the driver. Walk the stack until you find the faulting draw call and see how many vertices you are telling it to draw.

Badly formed draw calls will end up in a similar state, so always validate that what you are sending to the GPU is what you expect it to be.
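A minimal sketch of that kind of validation for an indexed draw, assuming you keep a CPU-side copy of the indices and know how many vertices the bound vertex buffer holds. `DrawState` and `ValidateDrawIndexed` are hypothetical; only the parameter names mirror `ID3D11DeviceContext::DrawIndexed(IndexCount, StartIndexLocation, BaseVertexLocation)`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical CPU-side record of what the application believes is bound.
struct DrawState {
    std::uint32_t vertexCount = 0;       // vertices in the bound vertex buffer
    std::vector<std::uint32_t> indices;  // CPU copy of the bound index buffer
};

// Returns an empty string if the indexed draw looks sane, otherwise a
// description of the first problem found.
std::string ValidateDrawIndexed(const DrawState& s, std::uint32_t indexCount,
                                std::uint32_t startIndex,
                                std::int32_t baseVertex) {
    // Compare in 64-bit so a huge start/count cannot wrap around.
    if (std::uint64_t(startIndex) + indexCount > s.indices.size()) {
        return "index range exceeds index buffer";
    }
    for (std::uint32_t i = startIndex; i < startIndex + indexCount; ++i) {
        // BaseVertexLocation is signed and is added to every fetched index.
        std::int64_t v = std::int64_t(s.indices[i]) + baseVertex;
        if (v < 0 || v >= std::int64_t(s.vertexCount)) {
            return "index " + std::to_string(i) + " references vertex " +
                   std::to_string(v) + " outside the vertex buffer";
        }
    }
    return {};
}
```

One way to use it is in debug builds only: run it right before the real `DrawIndexed` call and log the message (and skip the draw) when it is non-empty.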

This might become harder to debug depending on which VS version you are using, as call stacks can get corrupted when dealing with user->kernel->user->kernel code transitions, which are common when making D3D calls. http://randomascii.wordpress.com/2013/08/19/developers-rejoicewindows-7-stack-corruption-fixed/

Btw, another sign of misusing the API or driver is an extremely low frame rate for simple scenes and shaders; again, look at the calls and check that they are issued correctly. I found that the D3D11.1 runtime isn't that forthcoming about telling you what's going on either, even when you run the debug version: as soon as you are using the DX11.1 runtime, the SDK control panel does nothing any more. I found a fix for this; you can find how to set the debug runtime here: http://www.altdevblogaday.com/2013/09/30/fixing-the-directx-d3d-debug-layer/

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

This topic is closed to new replies.
