Exception handling, debugging a custom memory allocator and working with first-chance exceptions

7 comments, last by Adam_42 9 years, 9 months ago

The problem: I've been working on and fleshing out my codebase for several years now and the vast majority of the time it works as expected. Nevertheless, there are occasional oddball situations that can easily lead to bug hunts that last for several days and often result in my finding various (seemingly) unrelated bugs. Fixing those often also solves the problem at hand, but in a very obscure way. In such situations I often have nothing more to do than shrug and carry on as if something so inexplicable happened that it defied the laws of physics. Some of this has to do with code complexity; however, I suspect it has even more to do with my limited knowledge of how to handle these situations.

The setup: I'm writing a multi-threaded application that currently branches into two primary threads - the render thread and the input thread. Both have SEH exception handlers at the stem: surrounding the entire body of code in the render thread and in the WNDPROC callback in the input thread. I'm catching all standard exceptions.

Where it gets complicated: every now and then I stumble upon a first-chance exception that throws off my entire program. Now, I know what a first-chance exception is and I know why it's being thrown the way it is. However, the root cause for it is usually an access violation, which is always terminal in nature. Using the Disassembly window I can look up what's at the address where the exception is thrown, but generally it seems to be thrown in unmapped address space, which a) doesn't really help with pinpointing the cause and b) gives no indication as to where or when it was thrown.

Where it gets more complicated: I'm using a custom memory allocator that I wrote. Is it bug-free? Good question. The bottom line is, it's not too complicated and doesn't support fancier features like reference counting or compaction. But it's fast, multithreaded and it gets the job done. At least as far as I can tell. It also makes debugging considerably more obscure.

The confusion: here's a very short snippet of code that exemplifies a common case of confusion. The following is a response to keyboard input and always crashes in the same way at the same moment. I've gone over the code preceding it and I can't find anything that might write into an invalid memory address (directly or remotely via another thread). I do recognize that this kind of inspection is inconclusive and doesn't really guarantee that I didn't miss a bug. Nevertheless, this is still a fairly strong indicator that by all logic the access violation cannot occur in any other thread (since it's temporally locked to user input) and has a low probability of occurring sometime before the below snippet is executed. This, in turn, completely screws up any and all logic when it comes to tracking down the cause:


int Editor::HandleKeyboardInput(...)
{
   ...
   if(toolActive) { toolActive->Activate(false); }
      toolActive = newTool;
   if(toolActive) //all cool in the Watch window
      toolActive->Activate(true); //BOOM! crash, because all of a sudden the 'this' pointer is NULL!
                                  //EDIT: apparently the 'this' pointer is modified only occasionally; other times newTool's
                                  //Activate() starts pointing to unmapped space
}

Running the debugger through this with application-side exception handling disabled just gives me an infinite loop of first-chance and second-chance exceptions that point to exotic memory addresses.

The solution? I've gone back to manually commenting out code blocks and ultimately it's not impossible to arrange code in a way that gets rid of the exception. However, the logic behind tracking something like this down still eludes me and I find myself resorting to trial and error, which frankly has a really poor probability of identifying and fixing the actual error that's causing this. After all, I've gone over everything ten times now and most permutations that do get rid of the crash (at least in terms of how I rearrange my code) don't make much sense.

So, to recap - if anyone can point out glaring holes in the way I'm handling exceptions in my code, comment on or criticize the way I'm tracking them down, or provide overall suggestions, I'd appreciate it a lot. I realize the problem is likely something as silly as writing past an array boundary (even though I'm using guarded arrays in debug mode...), but experience has proven that the more innocuous the bug, the more days or weeks it takes to track down.

Oh - in case it becomes relevant, I'm on VS2010, Windows 8.1.

Sounds like typical multithreading bugs to me... do these mysterious situations occur in single-threaded programs using your code?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

The way that code is formatted in the code window makes me raise an eyebrow and I want to be sure you are doing what you intend to do. The line after the first "if" is indented, but there are {} to the right of the first if...


if(toolActive) 
    { toolActive->Activate(false); } // The part in the {} only gets called if (toolActive)
    
toolActive = newTool; // This line ALWAYS gets called regardless of (toolActive)

if(toolActive) //all cool in the Watch window
    toolActive->Activate(true); //BOOM! crash, because all of a sudden the 'this' pointer is NULL!
                                //EDIT: apparently the 'this' pointer is modified only occasionally; other times newTool's
                                //Activate() starts pointing to unmapped space

And you say the "this" pointer is null. Do you mean toolActive is null? Regardless, I'm thinking it's probably a multi-threaded issue like ApochPiQ suggests where both threads are messing with the same data.

- Eck

EckTech Games - Games and Unity Assets I'm working on
Still Flying - My GameDev journal
The Shilwulf Dynasty - Campaign notes for my Rogue Trader RPG

Ugh. Okay - I actually managed to catch the culprit after almost a day and a half of debugging and it turned out to be in something that had worked for me for over a year! Apparently there was a small buffer overrun that was leaking into the adjacent object in memory, and this turned out to be one of those fortuitous moments where I had something to latch on to. Just for reference, in this case the overrun was corrupting the object's vtable pointer, so I spent some time trying to get VS's data breakpoints working on it, but ultimately still had to set up debug outputs all over the code until I could home in on the offending calls.

I would appreciate suggestions if there are more efficient methods or tools out there that would make catching something like this easier. Getting lucky with identifying the corrupt vtable is one thing, but tracking the bug down really should be easier than this...

@Eck - yeah, that's just hastily written code. The style discrepancy is simple forum negligence in this case :)

Unfortunately memory corruption is incredibly hard to track down, usually because it can take a while for your code to (visibly) break because you might trash unallocated memory, or memory that no one touches until much later.

You mention you have your own memory allocator - have you thought about making a "debug" version of it that allocates additional memory around each block? You can fill that memory with sentinel values, and then assert on deallocation if the sentinel values are overwritten (indicating a buffer overrun with that particular block of memory).

You might also want to add a header to each block of memory with some identifying information, such as where it was allocated. That way, if you have memory corruption, you can open up your memory window, look for the header preceding the corrupted memory, and that might hint as to who overran the buffer, or at least point you at the value that did.
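The two suggestions above (sentinel padding plus an identifying header) could be sketched roughly as follows. This is a minimal illustration layered on `malloc`, not the poster's allocator: the `dbgalloc` namespace, `BlockHeader`, the 0xFD fill byte and the 64/16 byte sizes are all assumptions made for the example, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical debug wrapper; every name and constant here is illustrative.
namespace dbgalloc {

const unsigned char kGuardByte = 0xFD; // sentinel fill value (arbitrary choice)
const std::size_t   kPrefix    = 64;   // header + front guard; a multiple of 16 keeps malloc's alignment
const std::size_t   kBackGuard = 16;   // sentinel bytes after the user block

// Identifying information stored in front of every block.
struct BlockHeader {
    const char* file; // source file of the allocation
    int         line; // source line of the allocation
    std::size_t size; // user-requested size in bytes
};

static_assert(sizeof(BlockHeader) + 16 <= kPrefix, "prefix must leave room for a front guard");

// Layout: [BlockHeader][front guard ...][user bytes][back guard]
inline void* Allocate(std::size_t size, const char* file, int line) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kPrefix + size + kBackGuard));
    if (!raw) return nullptr;
    BlockHeader header = { file, line, size };
    std::memcpy(raw, &header, sizeof(header));
    std::memset(raw + sizeof(header), kGuardByte, kPrefix - sizeof(header));
    std::memset(raw + kPrefix + size, kGuardByte, kBackGuard);
    return raw + kPrefix;
}

// Returns true if neither guard region of this block has been overwritten.
inline bool GuardsIntact(const void* user) {
    const unsigned char* raw = static_cast<const unsigned char*>(user) - kPrefix;
    BlockHeader header;
    std::memcpy(&header, raw, sizeof(header));
    for (std::size_t i = sizeof(header); i < kPrefix; ++i)   // front guard (underrun)
        if (raw[i] != kGuardByte) return false;
    for (std::size_t i = 0; i < kBackGuard; ++i)             // back guard (overrun)
        if (raw[kPrefix + header.size + i] != kGuardByte) return false;
    return true;
}

// Asserts on deallocation if the block's sentinels were trashed.
inline void Free(void* user) {
    if (!user) return;
    assert(GuardsIntact(user) && "buffer overrun or underrun detected");
    std::free(static_cast<unsigned char*>(user) - kPrefix);
}

} // namespace dbgalloc
```

A call site would pass `__FILE__` and `__LINE__` (typically through a macro) so that the header seen in the memory window points back at the allocation. Note that this only detects corruption when the block is checked or freed, not at the moment of the bad write.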

The only other advice I can think of is to use heavily tested libraries of code if possible, try to avoid manual iteration of containers, passing buffer sizes to functions, and other things where mistakes can lead to overflows.

If you're using a C++11 compliant compiler, you can use the new ranged-for syntax to avoid off-by-one errors in for loops and the like.


std::vector<int> myArray;
// fill array
// now iterate it:
for (const auto& curValue : myArray)
{
  // do stuff with curValue here
}
If you've got a function that takes a buffer and a buffer size, see if you can use templates to have the compiler calculate the buffer size for you instead of passing it yourself (and potentially making a mistake).


#include <stddef.h> // size_t

template<size_t bufferSize>
void MyFunction(char (&buffer)[bufferSize]) // reference to an array of char, not an array of char*
{
  // Do stuff with buffer here, confident that bufferSize will be correct
}
You can also use an array size calculating macro when a function wants a buffer size that you can't change. (If your compiler supports constexpr, you can probably template this like the above function as well instead of dealing with the typical macro-related problems)


#define COMPILE_TIME_ARRAY_COUNT(arrayName) (sizeof((arrayName)) / sizeof((arrayName)[0]))
Of course, those only work on variables where the compiler knows it's an array and the array size is known at compile time.
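For compilers that do support `constexpr` (so not VS2010), the templated alternative to the macro mentioned above might look like the sketch below. `ArrayCount` and `CheckedStore` are hypothetical names chosen for this example. Unlike the macro, the template refuses to compile when handed a pointer instead of a real array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: yields the element count of a real array at compile time.
// Passing a pointer does not match the parameter type and fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t ArrayCount(const T (&)[N]) noexcept {
    return N;
}

// Hypothetical helper: a store that asserts the index is in range in debug builds.
template <typename T, std::size_t N>
void CheckedStore(T (&array)[N], std::size_t index, const T& value) {
    assert(index < N && "index out of range");
    array[index] = value;
}
```

With `char name[32];`, `ArrayCount(name)` evaluates to 32, while `ArrayCount(&name[0])` is rejected by the compiler rather than silently returning the wrong number.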

Unfortunately memory corruption is incredibly hard to track down, usually because it can take a while for your code to (visibly) break because you might trash unallocated memory, or memory that no one touches until much later.

You mention you have your own memory allocator - have you thought about making a "debug" version of it that allocates additional memory around each block? You can fill that memory with sentinel values, and then assert on deallocation if the sentinel values are overwritten (indicating a buffer overrun with that particular block of memory).

You might also want to add a header to each block of memory with some identifying information, such as where it was allocated. That way, if you have memory corruption, you can open up your memory window, look for the header preceding the corrupted memory, and that might hint as to who overran the buffer, or at least point you at the value that did.

This is a brilliant solution! I assume heap debuggers do something like this. In any case, I'm using function guards anyway, so I already have a run-time trace of my code flow. It wouldn't be difficult at all to introduce a small padding and add forced checks at guarded function exits that perform extended memory diagnostics on demand. This alone should be able to catch any bugs that cause an entire allocation block to overflow, while giving me the offending object and the function in which the overflow happens. This approach wouldn't be fast, but it'd be pretty darn good at tracking memory bugs.
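The idea of checking padding at guarded function exits could be sketched like this. It is only an illustration under several assumptions: the `heapcheck` namespace, the registry of tracked blocks, `FunctionGuard` and the 0xFD canary are invented for the example, nothing here is thread-safe, and scanning every block on every function exit is deliberately slow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch; names and sizes are illustrative, not the poster's code.
namespace heapcheck {

const unsigned char kCanary = 0xFD; // padding fill value (arbitrary choice)
const std::size_t   kPad    = 8;    // canary bytes appended to each block

// One tracked block: `size` payload bytes followed by kPad canary bytes.
struct Tracked {
    unsigned char* data;
    std::size_t    size;
};

inline std::vector<Tracked>& Registry() {
    static std::vector<Tracked> blocks;
    return blocks;
}

// Names of the guarded functions at whose exit corruption was first observed.
inline std::vector<std::string>& Violations() {
    static std::vector<std::string> names;
    return names;
}

inline unsigned char* Allocate(std::size_t size) {
    unsigned char* data = new unsigned char[size + kPad];
    std::memset(data + size, kCanary, kPad);
    Tracked block = { data, size };
    Registry().push_back(block);
    return data;
}

// Walks every tracked block and verifies its trailing canary bytes.
inline bool AllIntact() {
    const std::vector<Tracked>& blocks = Registry();
    for (std::size_t i = 0; i < blocks.size(); ++i)
        for (std::size_t j = 0; j < kPad; ++j)
            if (blocks[i].data[blocks[i].size + j] != kCanary) return false;
    return true;
}

inline void ReleaseAll() {
    std::vector<Tracked>& blocks = Registry();
    for (std::size_t i = 0; i < blocks.size(); ++i) delete[] blocks[i].data;
    blocks.clear();
}

// Placed at the top of a function; its destructor runs the check on exit.
class FunctionGuard {
public:
    explicit FunctionGuard(const char* name) : name_(name) {}
    ~FunctionGuard() {
        if (!AllIntact()) Violations().push_back(name_);
    }
private:
    const char* name_;
};

} // namespace heapcheck
```

Because the check runs in a destructor, it fires on every exit path, including early returns. In this simplified form a block that stays corrupted is reported again by every later guard, so the first entry in `Violations()` is the interesting one.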

BTW - regarding block headers, I'm keeping each memory manager page in two separate buffers: one that stores aligned objects and a different buffer that stores fixed-size allocation info blocks that reference the data buffer. I find this to be easier to implement/manage and when I was writing my memory manager it seemed people deemed a two-buffer approach more efficient. It's a two-edged sword, though, as each page is directly dependent on average allocation size, which can lead to wasted memory.

The only other advice I can think of is to use heavily tested libraries of code if possible, try to avoid manual iteration of containers, passing buffer sizes to functions, and other things where mistakes can lead to overflows.

To be honest my entire code base is one huge learning project. If I was on a clock and didn't do this purely to have fun and occasionally torture myself for the sake of patting myself on the back for reinventing the wheel, I'd probably have finished something by now. Nevertheless, the truth is I really do code so I can pat myself on the back. I find more solace in the process itself and the knowledge that I wrote something from scratch as opposed to quickly finishing things. I guess that's why it's a hobby.

If you're using a C++11 compliant compiler, you can use the new ranged-for syntax to avoid off-by-one errors in for loops and the like.

Sadly I'm still on VS2010.

Ah, your own project is the best learning project. Implement everything yourself! :)

If you can, upgrade to 2012 or 2013, both are free in their express versions. 2012 has several useful C++11 features like ranged-for, lambdas, and r-value references, and 2013 builds on that with some C++14 bits as well. Unfortunately neither has full C++11/14 compliance yet, it looks like we'll be waiting for 2014 or later... (Even the 2013 CTP doesn't have everything - and I'm not sure you can install that with express)

Hrm - oddly enough I don't see a 2012 express download. Just 2013 and 2010. I'm assuming you're doing standard windows application development, in which case you'd want the desktop edition. (For some silly reason "Express for Windows" is for Modern Windows 8 apps, and "Express for Windows Desktop" is for classic-style desktop programs)

Congratulations on figuring out your problem. It's crazy to think that bugs can exist for so long without being stumbled across. When you finally do find and fix those long-time hiding bugs it's a combination of relief and dread. :)

And extra kudos for identifying where you get your enjoyment out of development. Once you realize that, you have even more fun. :)

- Eck


There's a fairly simple way to catch heap buffer overruns almost every time on Windows. Use page heap mode. This only helps if you're using the standard memory allocation functions.

However, implementing something similar yourself isn't too hard either. You essentially put every allocation on its own memory page (allocated using VirtualAlloc()), and set the next page to PAGE_NOACCESS with VirtualProtect(). You can either put the allocation right at the end of the page to catch overflows, or right at the beginning of the page to catch underflows. This will give you an instant access violation for most heap buffer overflows.

The obvious downside here is that because pages are at least 4KB you'll consume a lot more memory, and even more address space. The worst case is when you do lots of small allocations. You can work round that to some extent by only enabling it for a subset of allocations based on size. You can also build your code as 64-bit so you won't run out of address space, but you will still use much more memory than normal and performance will suffer. This probably means you won't want it enabled all the time.
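A rough sketch of the guard-page approach described above, placing each block at the end of its page(s) so an overrun hits the inaccessible page. The `guardpage` namespace and function names are assumptions for this example. The Windows branch uses `VirtualAlloc`/`VirtualProtect` as described in the post; the `mmap`/`mprotect` branch is an addition so the same sketch also builds on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <sys/mman.h>
#  include <unistd.h>
#endif

// Hypothetical sketch; names are illustrative.
namespace guardpage {

inline std::size_t PageSize() {
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwPageSize;
#else
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

inline std::size_t RoundUpToPage(std::size_t size) {
    const std::size_t page = PageSize();
    return ((size + page - 1) / page) * page;
}

// Returns `size` bytes whose last byte sits directly before an inaccessible
// page, so writing one byte past the end faults immediately.
inline void* Allocate(std::size_t size) {
    if (size == 0) return nullptr;
    const std::size_t page      = PageSize();
    const std::size_t dataBytes = RoundUpToPage(size);
    const std::size_t total     = dataBytes + page; // data pages + one guard page
#ifdef _WIN32
    unsigned char* base = static_cast<unsigned char*>(
        VirtualAlloc(NULL, total, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!base) return nullptr;
    DWORD oldProtect = 0;
    if (!VirtualProtect(base + dataBytes, page, PAGE_NOACCESS, &oldProtect)) {
        VirtualFree(base, 0, MEM_RELEASE);
        return nullptr;
    }
#else
    void* mapped = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapped == MAP_FAILED) return nullptr;
    unsigned char* base = static_cast<unsigned char*>(mapped);
    if (mprotect(base + dataBytes, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }
#endif
    return base + dataBytes - size; // block ends exactly where the guard page begins
}

// The caller must pass the same size that was given to Allocate().
inline void Free(void* ptr, std::size_t size) {
    if (!ptr) return;
    const std::size_t dataBytes = RoundUpToPage(size);
    unsigned char* base = static_cast<unsigned char*>(ptr) + size - dataBytes;
#ifdef _WIN32
    VirtualFree(base, 0, MEM_RELEASE);
#else
    munmap(base, dataBytes + PageSize());
#endif
}

} // namespace guardpage
```

One trade-off in this sketch: pushing the block right up against the guard page means the returned pointer is only as aligned as `size` allows. Rounding the start address down to the required alignment fixes that, at the cost of missing overruns that stay within the resulting few bytes of slack.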

