Blowing up the video drivers.

6 comments, last by MJP 14 years, 2 months ago
There are a lot of posts out there on the net about a '~whatever~ video driver stopped responding' error, after which the driver recovers. Basically the screen goes black, then comes back after a few seconds, but the program shuts down. I think this may be a Vista / Win 7 only thing. I seem to have managed to put this problem into my program. I've learned a few things but have some questions, and I'd like any feedback.

First, if this does happen, there is no restarting the program. You have to shut down the machine and then bring it back up. Otherwise the problem happens immediately when you start the program, whereas before it ran for some time before the error. What's the deal with this???

Second, the problem seems to occur regularly if you are putting some obscene values into the HLSL. For example, if you have a matrix that has some random junk in it and you're using it to transform a vertex, you might see your mesh explode in some psychedelic fashion across the screen for a few seconds, and then the display crashes. You would think that either the DX API or the display drivers would not allow this kind of crash, but apparently they do.

The problem is that it is happening pretty far into my program. I think I have a bad value I'm throwing into the shader, but troubleshooting is a bear since the program totally shuts down and then I need to restart. Anyone know any troubleshooting techniques for this problem?

Thanks, Matt
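One way to narrow down a suspected bad value like the junk matrix described above is to check it on the CPU before it is handed to the shader. The following is only a minimal sketch in portable C++. `Matrix4`, `firstNonFiniteIndex`, `isSaneMatrix`, and `checkedMatrix` are hypothetical names made up for illustration, the row-major 16-float layout is an assumption, and the magnitude limit is an arbitrary sanity bound rather than anything Direct3D defines. It can only catch bad inputs on the CPU side. It says nothing about what the shader computes from them, and it is not established here that a bad matrix is what triggers the driver reset.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 4x4 matrix type for illustration: 16 floats, assumed row-major.
using Matrix4 = std::array<float, 16>;

// Returns the index of the first element that is NaN or infinite,
// or -1 if every element is finite.
inline int firstNonFiniteIndex(const Matrix4& m) {
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (!std::isfinite(m[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Returns true when every element is finite and no magnitude exceeds `limit`.
// `limit` is an arbitrary bound chosen by the caller, not a Direct3D rule.
inline bool isSaneMatrix(const Matrix4& m, float limit = 1.0e6f) {
    for (float v : m) {
        if (!std::isfinite(v) || std::fabs(v) > limit) {
            return false;
        }
    }
    return true;
}

// Debug-build guard: stops in the debugger (via assert) at the point where a
// suspicious matrix is about to be uploaded. Compiled out when NDEBUG is set.
inline const Matrix4& checkedMatrix(const Matrix4& m) {
    assert(isSaneMatrix(m) && "matrix has NaN/Inf or an out-of-range value");
    return m;
}
```

Wrapping each matrix in `checkedMatrix(...)` at the point where shader constants are set would make a debug build stop at the offending call, with the call stack intact, instead of continuing on to the draw call.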
When you get the "driver stopped responding" message, that means your Kernel Mode Driver has crashed. Under WDDM, there is a user mode and a kernel mode driver component. When something bad happens in the user mode driver, the app just crashes but your system stays stable. If the kernel mode driver crashes, then you'll generally see a BSOD. The "stopped responding" message is generally because either an exception occurred in the hardware that was caught (perhaps the D3D runtime didn't like something), OR the rendering operation you are trying to do in your app did not finish within an allotted amount of time (this is 2 or 3 seconds for a DMA buffer, and can be changed via a reg key). The latter is called a TDR (Timeout Detection and Recovery). The kernel mode driver is then supposed to attempt to recover from this issue. From your description it sounds like the KMD recovers, but is not in a very stable state after that.

By the way, your description has all the signs of an ATI card / driver ;)
Nope, NVIDIA, but I don't think that matters.

Are you aware of any way I could trick the system so that I do not need to do a full reboot before the app will run again?

I'm sure the problem is of my own making, but you would think that if I was setting something outside a valid range the API would stop me. Welcome to the brave new world of GPU programming.
It matters a great deal whether it's ATI or NVIDIA. They have different drivers, and one can have the bug whereas the other may not.

Anyway, the KMD should never stop working and put the system in an unstable state. Tell me exactly what you are doing so I can reproduce this problem on my end. Also, what card are you running? Is it a very low end (mobile?) GPU? I write D3D drivers professionally so I can probably help resolve this.
You probably won't be able to reproduce the problem since it's caused by my executable. The system does not appear to be unstable after the crash. Maybe I should clarify a little more. Once the crash occurs, Windows runs normally, except for the fact that my executable will no longer run at all. Interestingly enough, no version of the program will run. Whether I try to run it from the IDE, or run the debug or release build directly, it's all the same immediate display crash until a reboot.

I'm running an NVIDIA Quadro FX 4600, on a Dual Xeon system. It's a pretty beefy system.
If it is your executable then what is preventing you from debugging it? What statement is the crash happening at? Does it happen when you try to create a new D3D device or resource?

Anyway, I'm not sure what you are asking for here. I don't think anyone will be able to answer this in the abstract without knowing exactly how this problem occurs.
After the crash happens the first time, the program will start and then loop through the program's main sequence for a few frames before crashing again. It's so bogus, and the fact that a reboot fixes the program suggests the problem exists inside the display drivers.

Even after the first crash, the device is being created successfully and some number of draw calls are getting executed.

What I want is a way to reset whatever the heck went wrong, so that at least my program will respond like it does after a reboot.

It would also be nice to be able to determine why the display crash happens in the first place.
It's a lot easier than you think to crash a driver. I do it all of the time at work. If you want DirectX to do the maximum amount of validation and runtime checks, then you'll want to make sure you're using the debug runtimes. That can sometimes help you catch problems before they bring down the driver, but not always.

This topic is closed to new replies.
