Rapid-fire debugging thoughts
Just a collection of assorted things that have been running through my mind during the past week and a half of marathon debugging...
- Unless you have a really darn good reason, don't use different coordinate systems for different parts of your game, and especially not if you're using floating-point representations. Conversion back and forth between coordinate spaces will inevitably introduce floating-point drift, and if your data flow involves conversions, you will start to see discrepancies between systems that use the different coordinate spaces. This can be a nightmare to figure out.
- If you're dumping a lot of stuff to a log using printf()-style format commands, it pays to be sure you're dumping an actual float and not a SIMD float wrapper struct. Passing the wrong type through a varargs call is undefined behavior, so what appears to produce sane numbers on one machine/architecture can start spewing random stack/heap gibberish on another, just because of calling-convention and alignment coincidences. In general, when using any printf()-style formatter, be sure you're actually passing what you think you're passing.
- Resist the urge to treat any system as a properly debugged black box. If you watch the event sequence A, B, C and something goes wrong between A and C, suspect B no matter how thoroughly debugged you think B actually is. Nothing sucks like wasting hours scouring A and C only to find out that B was actually the problem all along.
- The corollary to this, of course, is that if B really is trustworthy, the bug might just be in the way you're using it. The OS and compiler are (probably) not broken, but you can make them look that way by violating their usage contracts.
- Adding debug logging is great unless you're chasing something that might be timing-related. If you start adding logging and the repro conditions change, that's a good sign you have a Heisenbug lurking in the code. These are scary because the closer you look the less chance you have of landing on the bug; I personally am not real good at figuring these out yet. I think it's a combination of intuition and study of the code flow without actually running it - the kind of stuff you normally need for concurrency issues and so on. Seeing a Heisenbug appear in what should be fully serial deterministic code is a bit terrifying.
- Above all else, try not to be superstitious. There are logical reasons for everything going on in your program, if you look far enough. They may not be easy to explain, but they're there. It's easy to start suspecting really bizarre crap when you're at the limits of your understanding of a system; the solution is to understand things better, not resort to gross speculation.
- Don't underestimate the importance of stepping away for a while. I often find that walking away for 20-30 minutes and coming back is deeply refreshing and helps break out of the ruts of assumptions that build up when staring too closely at something. Escaping for a bit forces you to flush out the things you think you know and rebuild your contextual picture of the problem; for a lot of hard bugs, finding the problem is less a matter of solving some kind of mystery than of noticing what you've been ignoring (and shouldn't have been).
- You're probably going to have to learn other people's code. This is not such a big deal if the author in question is still accessible for questioning and enlightenment; if they're not reachable, however, you're in for a rough ride. Resist the urge to just poke at things until stuff changes. Expend the effort to understand the actual code and why it is the way it is.
- It's tempting to suspect The Other Guy's code, especially if said Other Guy is no longer available to defend himself. Fight this urge for as long as you can, because it leads to assumptions about where the problem lies. Good debugging is all about eliminating your assumptions and replacing them with verified factual knowledge.
- One of the hardest tricks in debugging a complex system is to know when to broaden your search and when to narrow it down. If you have a problem in a narrow area of code, the bug might be inside that area itself, or it might live well outside it and just happen to surface in that particular place. Knowing how to tell when a bug is in a piece of code and when it's just manifesting there is a black art, but well worth mastering.
- Build good debugging tools into your program sooner rather than later. It's much easier to find and fix bugs if you have good unit tests, visualization tools, and automated regression tracking. Once you get far enough into a project, it might be too late to go back and do those things, which means you're stuck staring at tens of thousands of lines of floating-point coordinates looking for discrepancies in the eighth decimal place. You don't want to be stuck there, believe me.
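To make the coordinate-space point a bit more concrete, here is a minimal sketch of the kind of check I wish I'd had earlier. Everything in it is made up for illustration: `Space`, `worldToLocal`, `localToWorld`, `roundTrip` and `nearlyEqual` are hypothetical names, and the one-dimensional transform is far simpler than anything a real game would use. The idea is just to push a value through a conversion and back, then compare against the original with an explicit tolerance instead of eyeballing digits in a log.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1-D coordinate space: local = (world - origin) / scale.
struct Space {
    float origin;
    float scale;
};

float worldToLocal(const Space& s, float world) { return (world - s.origin) / s.scale; }
float localToWorld(const Space& s, float local) { return local * s.scale + s.origin; }

// Converts world -> local -> world `trips` times and returns the result.
// Each trip may round, so the result is not guaranteed to equal the input.
float roundTrip(const Space& s, float world, int trips) {
    assert(s.scale != 0.0f);
    for (int i = 0; i < trips; ++i) {
        world = localToWorld(s, worldToLocal(s, world));
    }
    return world;
}

// Absolute-tolerance comparison; the right tolerance depends on the
// magnitude of your coordinates and is an assumption you have to make.
bool nearlyEqual(float a, float b, float tolerance) {
    assert(tolerance >= 0.0f);
    return std::fabs(a - b) <= tolerance;
}
```

A unit test could call `roundTrip` on a handful of representative positions and fail if `nearlyEqual` says the result wandered further than your systems can tolerate. Whether any particular input actually drifts depends on the values involved, so treat this as a way to measure the problem rather than a demonstration that it always occurs.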
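On the printf() point, one way to avoid handing a wrapper struct to a varargs formatter is to route logging through a small typed helper. The sketch below is only an illustration: `SimdFloat` is a hypothetical stand-in for whatever SIMD wrapper your math library provides, and `formatFloat` is a made-up helper name. The `static_assert` turns "passed the struct by accident" into a compile error instead of garbage in the log.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a SIMD float wrapper; a real one would
// typically wrap a vector register type rather than a plain array.
struct alignas(16) SimdFloat {
    float lanes[4];
    float first() const { return lanes[0]; }
};

// Formats "label=value" with three decimals. Only plain arithmetic types
// are accepted, so passing a SimdFloat directly fails to compile.
template <typename T>
std::string formatFloat(const char* label, T value) {
    static_assert(std::is_arithmetic<T>::value,
                  "pass a plain number to printf-style formatters, not a wrapper struct");
    char buffer[128];
    // The cast makes the argument exactly the double that "%f" expects.
    int written = std::snprintf(buffer, sizeof buffer, "%s=%.3f",
                                label, static_cast<double>(value));
    assert(written >= 0 && written < static_cast<int>(sizeof buffer));
    (void)written;  // silence unused-variable warnings when asserts are disabled
    return buffer;
}
```

With this in place, `formatFloat("x", v.first())` compiles and `formatFloat("x", v)` does not. If you stick with raw printf(), GCC and Clang can also check arguments against literal format strings when `-Wformat` is enabled (it is part of `-Wall`), which catches a good share of these mistakes.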