Multi thread deadlock issue with a recursive mutex. Need ideas.

Started by
20 comments, last by Pink Horror 8 years ago

An update on this topic: our bugs were found and fixed. However, in this thread I was looking in the wrong direction. The deadlock was a side effect of the real bug, not the source of my issues. The real bug was a system that was acting erratically and sending far too many messages over the network; our code could not keep up, and the messages would stack up. Over time, having a ton of very small memory blocks, one per message, would fragment memory, and then various systems would fail. The deadlock was the most common effect, but we also had thread initialization failing and sometimes a straight-up memory allocation failure (malloc of a big chunk returns NULL and that pointer was then used).
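
For illustration, here is a minimal sketch of checking the result of malloc before the pointer is used, so a failed allocation is reported at the call site instead of surfacing later as an unrelated failure. The helper name allocate_buffer and the log message are hypothetical, not taken from the code discussed in this thread.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: allocate a byte buffer with malloc and report
// failure explicitly instead of silently returning an unusable pointer.
// Note: malloc(0) may legitimately return NULL on some implementations.
unsigned char* allocate_buffer(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
        // malloc signals failure only through its return value.
        std::fprintf(stderr, "allocation of %zu bytes failed\n", size);
        return nullptr;
    }
    return static_cast<unsigned char*>(p);
}
```

The caller still has to test the returned pointer; the point is only that the failure is logged where it happens. Buffers obtained this way are released with std::free.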

So, you have memory allocation failures, and most of them do not crash your program immediately? And then you're stuck dealing with other bugs that look impossible? Let me guess, you have catch (exception) or, even worse, catch (...) everywhere, with maybe a log saying "unknown exception" if your programmers are slightly less lazy than the people who just leave the catch empty?

malloc does not throw exceptions. It simply returns NULL. On some systems, there is no built-in catch for NULL dereferences, and NULL+offsetof(SomeStruct, someMember) might reasonably point to memory used by the main thread's stack, some global variable, allocation records, or even the OS itself; any of which could have very unpredictable consequences. It's entirely possible no C++ exception was ever thrown and no OS exception/signal/etc was ever triggered, and memory was silently corrupted.
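
As a rough illustration of the NULL+offsetof point, the sketch below computes (without dereferencing anything) the address that an access to a member through a null pointer would touch. The layout of SomeStruct is invented for the example, and the sketch assumes a platform where the null pointer is represented as address 0.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a large leading array pushes someMember
// far away from the start of the struct.
struct SomeStruct {
    char header[4096];
    int someMember;
};

// Address that `p->someMember` would touch if p were null, assuming
// null is address 0. Computed as an integer only; nothing is accessed.
std::uintptr_t address_touched_via_null() {
    return static_cast<std::uintptr_t>(offsetof(SomeStruct, someMember));
}
```

With this layout the result is 4096, so on a system that only leaves the very first bytes of the address space unmapped (or has no protection at all), the access would land on whatever happens to live at that address rather than faulting.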

Sure, that's possible, but I still think memory corruption from dereferencing null pointers returned by failed malloc calls, rather than segmentation faults, is relatively unlikely compared to the chance that this code is throwing and catching bad_alloc exceptions. The code above is clearly C++. I would guess new is being used, even with malloc mentioned earlier.
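
To make the bad_alloc scenario concrete, here is a small sketch of a wrapper that catches std::bad_alloc by type and reports it as its own outcome, instead of folding it into a generic catch block. The names Outcome and run_and_classify are made up for this example.

```cpp
#include <exception>
#include <new>

// Hypothetical result type: keeps "out of memory" distinct from
// every other failure so it cannot be logged as "unknown exception".
enum class Outcome { Ok, OutOfMemory, OtherError };

// Runs `work` and classifies how it ended. The handlers return an enum
// so that reporting the failure does not itself need to allocate.
template <typename Work>
Outcome run_and_classify(Work&& work) {
    try {
        work();
        return Outcome::Ok;
    } catch (const std::bad_alloc&) {
        return Outcome::OutOfMemory;
    } catch (const std::exception&) {
        return Outcome::OtherError;
    }
}
```

The handler for std::bad_alloc has to come before the one for std::exception, because std::bad_alloc derives from it and the first matching handler wins.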

I've never worked on a program that corrupted memory through an offset null pointer. I have worked on code where memory usage would spike up and cause allocation failures, because it was filled with try/catch statements.

It's pretty common for C++ projects to interface with C libraries, a great many of which perform internal allocation and deallocation, and some of which might not check for failed allocation before access. It's not unheard of for C++ programmers to use malloc for buffers; this is actually my own practice. The new operator can be overloaded, and an overloaded new might not throw std::bad_alloc. It is possible to disable exceptions in C++. It is possible that this is a non-conforming C++ implementation with no exceptions - some C++ implementations intended for embedded applications lack RTTI and exceptions, along with a large number of other C++ features; possibly some lack the new operator altogether, and malloc is the only way to allocate. The implementation of malloc might be non-conforming, or a custom allocator might be used. There are any number of perfectly reasonable reasons why an allocation might not throw. I understand that sloppy exception handling is all too common, but not handling exceptions at all is even more common (I'm guilty of this), so it just seems odd to me to assume that's why the source of this bug was never caught. Also, I need to point out that on systems with NULL pointer protection, dereferencing memory near 0 does not generate a C++ exception; it generates a fatal signal - which few people know how to recover from - or an SEH exception on Windows, the typical handling of which is to quietly close the program. That is the first thing that indicated to me that something else was the problem.
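
As one sketch of how an overloaded new might not throw std::bad_alloc: a class-specific operator new declared noexcept reports failure by returning a null pointer, and the new-expression then yields nullptr instead of throwing. The Message type and its simulate_failure switch are hypothetical and exist only to make the failure path reproducible.

```cpp
#include <cstddef>
#include <cstdlib>

struct Message {
    int payload = 0;

    // Hypothetical switch used only to force the failure path.
    static bool simulate_failure;

    // Because this allocation function is noexcept, returning nullptr
    // makes `new Message` evaluate to nullptr; no constructor runs and
    // no std::bad_alloc is thrown.
    static void* operator new(std::size_t size) noexcept {
        if (simulate_failure) return nullptr;
        return std::malloc(size);
    }

    static void operator delete(void* p) noexcept { std::free(p); }
};

bool Message::simulate_failure = false;

// Callers must check the result, exactly as they would with malloc.
Message* make_message() { return new Message; }
```

Code written against such a type gets no exception to catch, so an unchecked result behaves like the unchecked malloc case discussed above.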

I was not assuming anything. I was guessing. It was a question.

This topic is closed to new replies.
