So, how do you debug a problem you don't have?

This topic is 4595 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Many of you have probably had this problem before...you've worked long and hard on a cool demo, you go to show it to your friends (real or internet), and it falls apart, leaving a gooey toxic mess on the floor where once beautifully shaded polygons lived in peace and harmony. There's lots of little things you can do to fix actual crashes or missing resources (DLL hell, file paths, etc). But how do you deal with more esoteric problems...or worse still, graphical glitches that you can't reproduce, and don't know the cause of? NOTE: I am not asking about a specific problem. This is a general question on how to cope with this sort of thing.



Quote:
Original post by Promit
There's lots of little things you can do to fix actual crashes or missing resources (DLL hell, file paths, etc). But how do you deal with more esoteric problems...or worse still, graphical glitches that you can't reproduce, and don't know the cause of?

Remote debugging. Let the executable and the debug kernel run on the machine where the program fails, and run the source-level debugger/IDE on your development machine. Having the source + debugger on a notebook physically beside the problem machine (connected via LAN) is obviously best, as you get direct visual feedback as you step through the code. But even if that's not possible, a remote screen replication tool (such as UltraVNC) gives you the same feedback over the internet.

Besides that, verbose log files are your friends.



That works if you have the machine in question close by...but if it's a system you simply don't have access to? As good as most friends are, letting you have remote access to their system is usually not something people are happy about. And verbose logging...sometimes it just doesn't help, particularly if the problem was unforeseen and you don't know the cause.



Logging everywhere, including a hardware report; scanning the debugger output for [ERR]-prefixed messages to add to the log; a componentized design to help keep bugs isolated; the ability to enable/disable/tweak portions of the app to produce a test case.

The more information you gather, the better.
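
To make the log-scanning idea concrete, here is a rough sketch in Python (chosen for brevity; a real Direct3D app would do this in C++). The report layout and the names `collect_errors` and `build_report` are made up for illustration, and the hardware report is assumed to be a simple key/value dictionary gathered elsewhere.

```python
import re

# Matches lines that start with the [ERR] prefix and captures the message text.
ERR_PATTERN = re.compile(r"^\[ERR\]\s*(.*)$")


def collect_errors(debug_output_lines):
    """Return the message text of every line that starts with [ERR]."""
    errors = []
    for line in debug_output_lines:
        match = ERR_PATTERN.match(line.strip())
        if match:
            errors.append(match.group(1))
    return errors


def build_report(hardware, debug_output_lines):
    """Combine a hardware report and the collected errors into one text block.

    `hardware` is assumed to be a dict such as {"gpu": "...", "driver": "..."}.
    """
    lines = ["== hardware =="]
    for key in sorted(hardware):
        lines.append(f"{key}: {hardware[key]}")
    lines.append("== errors ==")
    lines.extend(collect_errors(debug_output_lines))
    return "\n".join(lines)
```

The tester then only has to send back one text file, and the hardware details always travel with the errors they belong to.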



Description/screenshots of the glitch from the users, a detailed description of the steps to reproduce the error, and the user's environment/hardware, to name a few.
Graphics glitches on other machines are usually caused by your program making assumptions about the state of the machine (which libraries and hardware are present).

Let the user test the program some more to try and uncover all the bugs present before jumping in and fixing the code. Fixing the code too early might result in some bugs being missed, and in ugly hacks being used just to fix a specific bug when a more detailed investigation of the problem is needed to fix all the bugs at the same time.



All of the above, and:
- Simplify your actual running code to try and narrow down the problem. Strip out your lighting code and make it flat shaded. Remove your skinning code and just see if the static models display ok, etc. etc. If possible give them a bunch of debug toggles and see what the results are. If not, give them a few different exes with different things enabled.

- Get them to test similar apps that aren't your own, e.g. does Quake 3 run? Do the NeHe tutorials run?
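
If the toggles exist, the narrowing-down can be done by halving rather than one feature at a time. The sketch below (Python, for brevity) assumes exactly one feature causes the glitch and that the glitch shows up whenever that feature is enabled, regardless of the others; real bugs are not always that well behaved. `find_culprit` and the feature names are invented for the example, and `glitch_reproduces` stands in for "ask the tester to run with only these features on and report back".

```python
def find_culprit(features, glitch_reproduces):
    """Bisect a list of feature names down to the one that triggers the glitch.

    `glitch_reproduces(enabled)` takes a set of enabled feature names and
    returns True if the tester still sees the problem with only those on.
    """
    candidates = list(features)
    if not candidates:
        raise ValueError("need at least one feature to test")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if glitch_reproduces(set(first_half)):
            # The problem is still there, so the culprit is in this half.
            candidates = first_half
        else:
            # The problem went away, so the culprit is in the other half.
            candidates = candidates[middle:]
    return candidates[0]


if __name__ == "__main__":
    features = ["lighting", "skinning", "shadows", "fog", "particles"]
    # Stand-in for a real test run: here the glitch is caused by "fog".
    print(find_culprit(features, lambda enabled: "fog" in enabled))
```

With five features this takes three test runs instead of up to five, and the saving grows quickly as the number of toggles goes up.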



You have to be (or should be) prepared for this kind of thing, in a proactive way. From the first line of code you write, add error checking mechanisms. This is of very high importance for e.g. DirectX applications, where the driver can return an error code but still draw everything correctly. Don't be afraid of having 'too much' error checking. Once your application is nearing completion and these error checks are pulling performance down, you can still easily remove them after you've tested everything on multiple systems.

When you've done all this and things still go bad on a different system, there's no other option than to debug it on that system. Ideally you can compile and run on that system locally, but most debuggers allow you to do it over a TCP/IP, FireWire or serial connection as well.
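
A minimal sketch of the "check every call, switch it off later" idea, in Python for brevity rather than the C++ a DirectX app would use. It assumes negative return codes mean failure, which is how a failed HRESULT looks as a signed value; `CallChecker` and the call names in the usage lines are made up for illustration.

```python
class CallChecker:
    """Records every API call that reported failure, even if nothing looks wrong."""

    def __init__(self, enabled=True):
        # Flip to False for a release build once the checks cost too much.
        self.enabled = enabled
        self.failures = []

    def check(self, result_code, what):
        """Log a failure for negative codes, then hand the code back unchanged."""
        if self.enabled and result_code < 0:
            self.failures.append(f"{what} failed with code {result_code}")
        return result_code


if __name__ == "__main__":
    checker = CallChecker()
    checker.check(0, "create_texture")      # success, nothing recorded
    checker.check(-5, "set_render_state")   # failure, recorded for the log
    print(checker.failures)
```

Because `check` returns the code it was given, it can be wrapped around existing calls without changing how the surrounding code handles the result.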



Quote:
Original post by Promit
That works if you have the machine in question close by...but if it's a system you simply don't have access to?

Well, a (broadband) internet connection is enough. Physically, the machine can be on the other side of the globe.

Quote:
Original post by Promit
As good as most friends are, letting you have remote access to their system is usually not something people are happy about.

You don't need remote access to their machine. All you need is access privileges to the debugging kernel or monitor, and a realtime video stream of the screen. Nothing more.



Yup, agree with all of the above. A few more that I do when working on PC games, very Windows and Direct3D centric (because that's what I know best), but the same principles apply to most OSes and APIs:

1) before submitting something for test, be absolutely 100% certain your use of the OS/API is correct:

a. use the debug Direct3D runtime with maximum validation and a high output level.

b. use the debug version of any helper libraries (e.g. d3dx9d.lib rather than d3dx9.lib).

c. test on the reference rasteriser to see how the app behaves with a device that doesn't support things that "just happen to work" with some specific hardware (e.g. using the Z buffer as a texture).

d. test with software vertex processing [so that D3D verifies things like indices for you].

e. run it through AppVerifier with all tests enabled.

f. go through any debug output step by step.

g. find every use of an external API and check with the documentation for correctness.

h. check any "clever" code/maths; remember to look at numerical precision and storage issues, particularly with shader code.

i. if possible, test locally with multiple operating systems, particularly of different generations (e.g. 9x, 2000, XP). You could use something like VirtualPC for this, or build a few test machines out of cheap second hand parts off eBay.

j. if possible, test locally with one card from each major vendor (e.g. an ATI card, an nVidia card, an Intel motherboard w/ integrated graphics, etc...).

k. check if there's a device capability (e.g. in D3DCAPS9) concerning anything your app does. If there's a cap for something, it means some hardware, somewhere, doesn't or didn't support it. There are caps for things as simple as texturing and Gouraud shading (for example, the Matrox Millennium I didn't support texturing, and the PowerVR PCX 1 didn't do full Gouraud shading).




2) things to add to your application to make it easier to debug:

a. screenshot [obviously [smile]]. You can extend this with an option to dump all offscreen surfaces (including the Z buffer!) to file too.

b. replay/deterministic mode - particularly useful for those "it only happens occasionally" problems.

c. dump the device capabilities structure (e.g. D3DCAPS9) to disk in your debug version. Testers can get you DXDIAG files easily enough, but they can't get caps so easily. This is particularly useful for obscure hardware which isn't listed in online caps databases.

d. include an option to use the reference rasteriser (assuming the person testing has a development setup).

e. make all features of your application externally controllable so you can enable/disable features from an .ini file (e.g. environment mapping on/off, multi-texturing on/off, etc).

f. include options like "no textures", "textures without shading" - it can sometimes be very difficult to determine what's going on from a static screenshot.

g. set the screen clear colour used in debug to something other than black (so you can see polygons that are being rendered black).

h. have an option to replace any particular vertex or pixel shader with one that outputs a solid colour (I often use bright yellow or pink because not much else in a real scene uses those colours); this is helpful for tracking down faulty shaders.
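
For the externally controllable features in the list above, a small settings file is usually enough. Here is a rough sketch in Python (for brevity) using the standard `configparser` module; the `[features]` section name, the option names, and `load_feature_toggles` are assumptions for the example, not anything a particular engine uses.

```python
import configparser

# Every feature defaults to on; the .ini file only needs to list overrides.
DEFAULT_FEATURES = {
    "environment_mapping": True,
    "multi_texturing": True,
    "textures": True,
    "shading": True,
}


def load_feature_toggles(ini_text, defaults=DEFAULT_FEATURES):
    """Return a dict of feature name -> bool, applying overrides from ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    toggles = dict(defaults)
    if parser.has_section("features"):
        for name in defaults:
            if parser.has_option("features", name):
                # getboolean accepts on/off, yes/no, true/false and 1/0.
                toggles[name] = parser.getboolean("features", name)
    return toggles


if __name__ == "__main__":
    example_ini = "[features]\nenvironment_mapping = off\ntextures = no\n"
    print(load_feature_toggles(example_ini))
```

Keeping the defaults in code means a tester can delete the file entirely and still get the normal build, while a two-line file is enough to produce the "no textures" or "no environment mapping" variants.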




3) Things to do when you get a bug report (in no particular order):

a. "divide and conquer" - it's the fastest way; try to rule out faults in half of the program in single steps, then rule out half of that half etc

b. determine the path used to render the problematic scene/model/polygon/pixel (e.g. using the feature enable/disable options and solid colour shader replacement).

c. ask the tester to try different drivers for their graphics card, both newer and older than the ones currently installed. If the problem changes/goes away, make a small test application (I usually modify an SDK sample) to test just the problematic feature in the simplest way possible. If the test app exhibits the same behaviour, contact the graphics card manufacturer and API standards maintainer about the problem and send them the test app.

d. diff the dumped device caps from the test machine with the device caps on your machine. Investigate any differences.

e. ask the tester to run using the reference rasteriser, then compare the output screenshots with the reference rasteriser on your machine (PIX can help with this). If the output is identical, the problem is most likely with the graphics card, its driver or your use of the graphics API (e.g. not checking caps). If the output is different, then the problem is most likely elsewhere in your code (e.g. AMD vs Intel internal precision differences with FPU affecting your simulation).

f. if the app is going to be publicly released (particularly commercially), make friends with the developer relations folks at graphics card companies - they know all the hardware and driver bugs affecting their cards, and also have highly instrumented "analysis" drivers and ICE versions of their chips.

g. play back the replay/save on your machine - some graphics problems are really just symptoms of something elsewhere (e.g. simulation or AI)

h. post problematic screenshots to GameDev [wink]. Seriously, someone else might have seen that problem or similar before. There are very experienced folks from many major development companies and hardware companies reading and posting here regularly (I won't "out" anyone, but there are definitely people from Microsoft, Sony, EA, nVidia, Bungie, Atari, Ubisoft, etc... [I work for one of those for example])
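
Diffing the dumped caps from the list above is easy to automate. The sketch below is in Python for brevity and assumes the debug build writes the caps as plain `name=value` lines, which is a made-up dump format; the field names in the usage lines are real D3DCAPS9 members but the values are invented.

```python
MISSING = "<missing>"


def parse_caps_dump(text):
    """Parse 'name=value' lines into a dict; lines without '=' are ignored."""
    caps = {}
    for line in text.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            caps[name.strip()] = value.strip()
    return caps


def diff_caps(local_caps, tester_caps):
    """Return {name: (local value, tester value)} for every cap that differs."""
    differences = {}
    for name in sorted(set(local_caps) | set(tester_caps)):
        local_value = local_caps.get(name, MISSING)
        tester_value = tester_caps.get(name, MISSING)
        if local_value != tester_value:
            differences[name] = (local_value, tester_value)
    return differences


if __name__ == "__main__":
    mine = parse_caps_dump("MaxTextureWidth=4096\nMaxSimultaneousTextures=8\n")
    theirs = parse_caps_dump("MaxTextureWidth=2048\nMaxSimultaneousTextures=8\n")
    for name, (local_value, tester_value) in diff_caps(mine, theirs).items():
        print(f"{name}: mine={local_value} theirs={tester_value}")
```

Every line it prints is a place where the app may be relying on something the tester's card doesn't provide, so it makes a good starting checklist.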



One thing that might interest you. I couldn't get it working personally, but perhaps I messed up the code when copying from the book. I plan to give it another go in my next project.

There's a Windows API that lets you output a dump file if your program crashes. It's detailed in the book "Game Coding Complete" by Mike McShaffry (I highly recommend that book by the way, it's a good read). You can load the dump file into Visual Studio and debug the program. (Of course the mini-dump has to match exactly with the source code and symbol tables used to build the application, but I don't imagine that being a problem if you use source control and a good build procedure.)
