Bugs caused by stupidity are hard to find

Started by
1 comment, last by SeraphLance 7 years ago

My rendering code is meant to be API-agnostic (I am actually using two different APIs). I have a VideoDriver class and a pure virtual HardwareBuffer class. Whenever I need a new HardwareBuffer, I create it via the VideoDriver class. Internally, the VideoDriver implementation stores a vector of the actual HardwareBuffer objects. When I create a new one, it pushes a new object onto the back of the vector and returns a pointer to it, typed as the abstract base class.

This worked until I changed some of the code. nvogl.dll would sporadically crash because it tried to dereference a null pointer. I eventually figured out that the crash was tied to the creation of the buffers, but after probing various parts of the code I couldn't figure out why it wasn't working.

I spent maybe 10 hours searching through my code trying to figure out what could possibly have gone wrong. I eventually found out that once a certain number of buffers had been created, the vector would grow, reallocate its storage, and leave every previously returned pointer dangling (which trashed the VBO handle). The fact that the memory addresses would change after a reallocation had never crossed my mind. I think I've been using Java too much recently.
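For anyone who hits the same thing, here is a minimal sketch of the problem and one common fix. The class bodies are hypothetical stand-ins, since the post doesn't show the real VideoDriver or HardwareBuffer interfaces, and GLBuffer, createBuffer and the two helper functions are names made up for illustration. The idea is to store each buffer behind a std::unique_ptr so that vector growth only moves the owning pointers, not the buffer objects themselves.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the classes described in the post.
struct HardwareBuffer {
    virtual ~HardwareBuffer() = default;
    virtual unsigned handle() const = 0;
};

struct GLBuffer : HardwareBuffer {
    explicit GLBuffer(unsigned h) : handle_(h) {}
    unsigned handle() const override { return handle_; }
    unsigned handle_;  // stands in for the VBO handle
};

// The failure mode: elements stored by value move when the vector reallocates.
// Addresses are compared as integers so no dangling pointer is ever dereferenced.
bool elementsMoveOnGrowth() {
    std::vector<GLBuffer> buffers;
    buffers.emplace_back(1u);
    const auto before = reinterpret_cast<std::uintptr_t>(&buffers[0]);
    const std::size_t oldCapacity = buffers.capacity();
    while (buffers.size() <= oldCapacity) {
        buffers.emplace_back(2u);  // the last push exceeds capacity and reallocates
    }
    const auto after = reinterpret_cast<std::uintptr_t>(&buffers[0]);
    return before != after;
}

// One possible fix: the vector owns pointers, the objects stay put on the heap.
class VideoDriver {
public:
    HardwareBuffer* createBuffer() {
        buffers_.push_back(std::make_unique<GLBuffer>(nextHandle_++));
        return buffers_.back().get();  // stays valid until the driver is destroyed
    }

private:
    std::vector<std::unique_ptr<GLBuffer>> buffers_;
    unsigned nextHandle_ = 1;
};

// Checks that the first pointer handed out is still usable after many creations.
bool firstBufferSurvivesGrowth(std::size_t extraBuffers) {
    VideoDriver driver;
    HardwareBuffer* first = driver.createBuffer();
    for (std::size_t i = 0; i < extraBuffers; ++i) {
        driver.createBuffer();
    }
    return first->handle() == 1;
}
```

Other options would be a std::deque (push_back does not invalidate references to existing elements) or handing out indices instead of raw pointers; which one fits best depends on how the buffers get destroyed.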


Moved to Coding Horrors.


Indeed. I just had my own episode of stupid.

I was implementing CEF for a UI system in my game, but I wasn't using a language from the standard C/C++ family, so I had to use some bindings. Unfortunately, the existing bindings for the language I was using were about 3 years old, and since CEF has very few qualms about changing its API, I had to update a lot of the function signatures (usually deleting functions entirely) just to get it to build at all. After spending about two straight days trying to get the DLL's debug symbols, held up by various mishaps (which is a hilarious story in itself), I discovered that it was crashing on initialization.

Okay, fine. Digging through, I realized it was trying to copy a configuration struct (CefSettings, for the initiated), presumably to marshal it over to another thread. After some debug prints and sanity checks, I realized that it was failing to allocate because it was trying to copy a string in this struct with some ungodly length, like 182375342980 or something. "That doesn't make any sense," I thought to myself. The string itself is empty, the debugger shows it as empty, and that same struct in my application's code is initialized properly. How can a struct in my code reliably get corrupted immediately down the stack when passed into a well-used and tested library? Of course, after asking that question I immediately realized the problem: the struct definitions in the outdated binding no longer matched the ones the library was built with. So I spent the next few hours fixing all the struct definitions in the binding.
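To show how an empty string can turn into a garbage length, here is a small sketch in C++. Every name and layout in it is hypothetical: these are not the real CefSettings fields, just two versions of a settings struct where the "library" version has gained a member that the stale "binding" version lacks. The bytes past the end of the smaller struct are filled with 0xCD as a stand-in for whatever happens to sit in memory there.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical string type: pointer plus length, as many C APIs use.
struct LibString {
    const char* str;
    std::size_t length;
};

// Layout the (newer) library was compiled against.
struct LibrarySettings {
    std::size_t size;
    int no_sandbox;
    LibString cache_path;  // member added after the binding was written
    LibString locale;
};

// Layout declared by the stale binding: cache_path is missing.
struct BindingSettings {
    std::size_t size;
    int no_sandbox;
    LibString locale;
};

// The same field lives at different offsets in the two layouts.
static_assert(offsetof(LibrarySettings, locale) != offsetof(BindingSettings, locale),
              "layouts were expected to disagree in this sketch");
static_assert(sizeof(BindingSettings) < sizeof(LibrarySettings),
              "binding struct was expected to be smaller in this sketch");

// Simulates the library reading memory that the binding filled in.
std::size_t localeLengthSeenByLibrary() {
    BindingSettings fromBinding{};  // zero-initialized: every string is empty
    fromBinding.size = sizeof(BindingSettings);

    // The library reads sizeof(LibrarySettings) bytes, which is more than the
    // binding wrote; the tail is unrelated memory (0xCD filler here).
    unsigned char memory[sizeof(LibrarySettings)];
    std::memset(memory, 0xCD, sizeof(memory));
    std::memcpy(memory, &fromBinding, sizeof(fromBinding));

    LibrarySettings asLibrary;
    std::memcpy(&asLibrary, memory, sizeof(asLibrary));
    return asLibrary.locale.length;  // read from the wrong bytes
}
```

On the binding side the struct looks perfectly fine in a debugger, because the debugger uses the binding's layout too. A cheap guard, assuming you can build a small C program against the library headers, is to print sizeof and a few field offsets from C and compare them with what the binding language reports.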

Lesson learned: Don't try to change a binding "as little as possible to get it to compile." It took me almost three days to fix something that should've just been common sense. I did experience all kinds of fun things in the process, like spending half a day just trying to figure out why my built DLL crashed the Visual Studio debugger 100% of the time, and discovering that neither the "how to build" documentation nor the stated minimum disk space requirements should ever be trusted.

