Release mode crash

15 comments, last by RobM 8 years, 5 months ago

My game editor is written in c# and uses my native/unmanaged c++ engine DLL via c++/CLI via another managed c++ 'shim' that links in the c++ engine library. The c# editor pulls in a reference to the shim and creates 'wrapper' objects that wrap engine c++ objects.

I'm building my c++ engine DLL in x64 against DirectX9 and my shim and game editor are both also in x64 mode.

When I run the c# editor in debug mode or in release mode from within Visual Studio (2013), everything works fine and as expected. However, when I run the editor exe from explorer, sometimes it works but most of the time (8/10) it crashes. The crash is essentially "Editor has stopped working..." and Windows starts [hopelessly] looking for a solution.

I've removed parts of the code over and over and added logging to try and find where it's crashing but it's proving difficult.

My main question is: does anyone know of a good way to diagnose, or see a stack trace from, a process that has crashed in this way? I do get a window asking if I want to debug, but when I do, it looks like the stack is deep in ntdll.dll somewhere. The debug error that pops up when starting the debug session, however, is:

"Unhandled exception at 0x00000000775FFFC2 (ntdll.dll) in GameEditor.exe: 0xC0000374: A heap has been corrupted (parameters: 0x0000000077677470).

If there is a handler for this exception, the program may be safely continued."

When I run the Release mode version from within Visual Studio, I turn on all exceptions and I don't see any. I have debug info built in with my release build so if there's anything obvious, I'd see it in the stack, but it's all external to my app.

I appreciate this might seem like me describing a haystack and asking you to find the needle but it's more about any tools I can use to diagnose the issue rather than what's causing it.

Memory corruption issues can be really tricky and the smallest change in environment can mask the symptoms without actually making the problem go away. Most commonly the problem that causes a release mode crash that doesn't occur in debug is uninitialized variables (since they can sometimes get initialized by the runtime in debug mode), so I would check for those before anything else. Another thing to note is that, if I recall correctly, when you launch your program from Visual Studio, even a release build runs with the Windows debug heap enabled, because the process was started under a debugger. Without really knowing what your program is doing, it's possible you have a memory stomp or are doing something "naughty" with regards to memory that isn't showing up because the debug heap is being "helpful."

One thing you can do is attach Visual Studio to your program after it starts up but before it crashes, instead of running it from VS. If your program is crashing on startup, you could put an infinite loop somewhere at the start of your program, attach the debugger, then use the debugger to move the instruction pointer out of that loop so the program can run normally. Since the symptom of the problem is apparently heap corruption, another thing you could try is putting HeapValidate calls in strategic places and seeing where they fail. That would give you an idea of when the problem happens and therefore some insight into how it might be happening.
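A minimal sketch of the HeapValidate idea, assuming the allocations of interest come from the process default heap (if the engine uses a private heap, pass that handle instead). The helper name CheckHeap and its label parameter are made up for illustration; HeapValidate and GetProcessHeap are the real Win32 calls. Off Windows the helper compiles to a no-op so the sketch still builds.

```cpp
#include <cassert>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns true if the process default heap looks intact.
// `where` is only a label for the log line.
inline bool CheckHeap(const char* where)
{
#ifdef _WIN32
    // Passing nullptr as the block pointer asks Windows to validate the whole heap.
    const bool ok = HeapValidate(GetProcessHeap(), 0, nullptr) != FALSE;
#else
    const bool ok = true; // stand-in so the sketch builds on other platforms
#endif
    if (!ok)
        std::fprintf(stderr, "Heap corrupted at or before: %s\n", where);
    return ok;
}
```

Sprinkling calls such as CheckHeap("after effect creation") between startup steps narrows the corruption down to the first call that fails. Validating the whole heap is slow, so this is for diagnosis only.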

Thanks for this, some really useful info to try. The app crashes very quickly (when it does crash). If it doesn't crash, then it just runs through normally.

I'll try putting a loop in near the start and see what happens.

Thanks again

For hunting issues generally in release mode, you can add a crash handler that generates minidump files. I'm on mobile so you will need to search on Google for more.

You can open the dump in Visual Studio, point it to your .pdb and your source tree, and it will show you the system state at the point of the crash. With memory problems it will often show the variables and code involved.
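A rough sketch of such a crash handler for the native side, assuming a plain Win32 process and that writing crash.dmp to the current directory is acceptable. The function names WriteMiniDump and InstallCrashHandler and the file name are made up for illustration; SetUnhandledExceptionFilter and MiniDumpWriteDump (from DbgHelp) are the real APIs. Off Windows the installer does nothing.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Hypothetical filter: writes crash.dmp when an unhandled exception arrives.
static LONG WINAPI WriteMiniDump(EXCEPTION_POINTERS* info)
{
    HANDLE file = CreateFileW(L"crash.dmp", GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei = {};
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;   // faulting context and exception record
        mei.ClientPointers = FALSE;     // pointers are valid in this process
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;   // let the process terminate afterwards
}
#endif

// Call once at startup. Returns true if a handler was installed.
static bool InstallCrashHandler()
{
#ifdef _WIN32
    SetUnhandledExceptionFilter(WriteMiniDump);
    return true;
#else
    return false;
#endif
}
```

Treat this as a starting point: in a mixed managed/native process the runtime may install its own handling, and some failure modes can terminate the process without ever reaching the filter. Writing the dump from inside the crashing process is also not fully reliable when the heap is already damaged.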
After a whole lunchless day of debugging which felt like 10 minutes but was actually more like 10 hours, I think it might be down to a threading/lock issue.

I eventually managed to get some readable debug info and it appears that the issue happens mostly at startup when I'm creating effects. I can't remember the exact debug line but it looked like part of a critical section.

My rendering in the game editor is done in a separate thread which is spawned when I create my main scene editor object. I read during some research that the D3D9 device must be created on the same thread that handles window messages, but it is fine to launch another thread that accesses the device. I believe I'm doing this, but the thread that actually creates the device comes into the native library from c#. I'm wondering whether I've not set up my window message handler properly and the messages are being handled on a different thread - after all, the thread in c# that starts up and eventually (via the shim) creates the device won't be the same one that the c# window messages come in on.

Need to do a bit more research I think. I use smart pointers nearly everywhere and I generally don't have many buffers that could overrun. With what I saw in the debugger being related to a critical section lock I can only assume it's something to do with that.

Fixed this now. I had to strip out almost all of the code in objects that were being used during start up and that has solved some issues, but ultimately I think this issue was caused by the fact that I was storing my native object in the shim using a static pointer. Changing that to store the native pointer within the wrapper class as a non-static member seems to have fixed the crash at start up.
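To illustrate why a static pointer in a wrapper is risky, here is a small sketch in plain C++ rather than C++/CLI. All of the type names (NativeScene, StaticWrapper, InstanceWrapper) are hypothetical stand-ins, not the actual shim code. With a static slot, every wrapper silently shares whichever native object was assigned last; with a non-static member, each wrapper keeps its own.

```cpp
#include <cassert>

struct NativeScene { int id; };  // hypothetical engine object

// Problematic pattern: one static slot shared by every wrapper instance.
class StaticWrapper {
public:
    explicit StaticWrapper(NativeScene* native) { s_native = native; }
    int Id() const { return s_native->id; }
private:
    static NativeScene* s_native;
};
NativeScene* StaticWrapper::s_native = nullptr;

// Safer pattern: each wrapper owns its own pointer to its native object.
class InstanceWrapper {
public:
    explicit InstanceWrapper(NativeScene* native) : m_native(native) {}
    int Id() const { return m_native->id; }
private:
    NativeScene* m_native = nullptr;
};
```

If two StaticWrapper objects are created for scenes 1 and 2, both report id 2, and once either native object is destroyed the other wrapper is left with a dangling pointer. Across threads the shared slot is also an unsynchronised write.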


you could put an infinite loop somewhere

This is a bit of an aside, but you can check for a debugger being attached in both C# and C++.

C#: System.Diagnostics.Debugger.IsAttached

C++: IsDebuggerPresent

(With the quick note that the C# one will not work with WinDbg)

Then you can do a while (!Debugger.IsAttached) { Task.Delay(1000).Wait(); } or something of that sort instead of an infinite loop. It's slightly more code, but slightly nicer to work with.
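On the native side, the same wait could be sketched roughly as follows. The helper name WaitForDebugger and the timeout parameter are made up for illustration; IsDebuggerPresent is the real Win32 call. A timeout is used here, as an assumption, so a build that accidentally ships with the wait still starts. Off Windows the check is skipped and the function simply times out.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: blocks until a native debugger attaches or the
// timeout elapses. Returns true only if a debugger was detected.
inline bool WaitForDebugger(std::chrono::milliseconds timeout)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + timeout;
    while (clock::now() < deadline) {
#ifdef _WIN32
        if (IsDebuggerPresent())
            return true;
#endif
        // Sleep briefly instead of spinning so the wait does not burn a core.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return false;
}
```

Calling WaitForDebugger(std::chrono::seconds(30)) at the top of the engine's entry point leaves a window to attach before the crashing startup code runs, without having to move the instruction pointer by hand.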

Thanks, Ferrous, that was useful.

As a further update to this, I thought I had fixed it but I hadn't. The first thing I did after reading the replies was to check my uninitialized members. I got partway through and realised that I hardly ever create data on the heap; it's either in a smart pointer or on the stack. There were a few pointer members in my graphics class that I ignored because I didn't think I was accessing them. After a further day of debugging, commenting out huge amounts of code and almost feeling like giving up, I went back to the uninitialized members thought.

Setting them all to NULL, incredibly, solved my issue completely. There were 4 of them and 2 of them were definitely used and initialised by running code (not on class creation). I'm truly elated at the fix but what I'm now left wondering is how, in release mode, can a pointer that is not used by the code path at all (but is obviously initially pointing to garbage) cause it to crash?

It generally can't.

Either you haven't really fixed your bug, or you do indeed have code somewhere that uses these uninitialized values.

It's possible you may not have *source code* explicitly looking at these values, but you might have a bug where some code was looking at the wrong memory locations, which happened to be those uninitialized member variables.
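One common way this happens, offered as a guess rather than a diagnosis of the code in this thread: cleanup or reset code that checks a pointer for null before freeing it. If the pointer was never initialised, the check passes on garbage and the delete corrupts the heap, even though no "real" code path ever uses the member. The sketch below uses hypothetical names (Texture, Graphics, m_overlay, m_shadowMap) and shows the fixed form, with default member initialisers so unused pointers are reliably null.

```cpp
#include <cassert>

struct Texture { int width; };  // hypothetical resource type

class Graphics {
public:
    ~Graphics() { ReleaseAll(); }

    // Only this member is ever assigned by running code.
    void LoadOverlay()
    {
        if (!m_overlay)
            m_overlay = new Texture{64};
    }

    // Null-guarded cleanup: safe only because unused pointers start as null.
    // Without the initialisers below, m_shadowMap would hold garbage here and
    // the delete would hand an invalid address to the heap.
    void ReleaseAll()
    {
        if (m_overlay)   { delete m_overlay;   m_overlay = nullptr; }
        if (m_shadowMap) { delete m_shadowMap; m_shadowMap = nullptr; }
    }

    bool HasOverlay() const   { return m_overlay != nullptr; }
    bool HasShadowMap() const { return m_shadowMap != nullptr; }

private:
    Texture* m_overlay = nullptr;    // default member initialisers
    Texture* m_shadowMap = nullptr;  // never assigned elsewhere in this sketch
};
```

Searching for every place the members are read, including destructors, reset functions and "if (ptr)" guards, is usually the quickest way to find which one was acting on the garbage value.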

It's definitely fixed. I did a 100-run test with no crashes. I set the pointers back to uninitialised and it crashed straight away. I will have a look to see if they're being used; it's possible they are. I've got so much to do that once I'd fixed it I moved on to adding more functionality. I added a fair bit today and did some refactoring, and still no crashes.

I was wondering whether internally, the OS moves the heap around and re-houses pointers - if they're uninitialised this would obviously be bad. I've never seen heap pointer addresses change through a debug session though, unless it's done with some kind of internal mapping. Not sure, just glad it's fixed so I can move on!

This topic is closed to new replies.