Debugging memory overwrites

7 comments, last by JohnBolton 17 years, 8 months ago
It seems that the more experience I get with C++, the more I realise how vital it is to have a good array of tricks you can use with the debugger (in my case, Visual Studio 2003) to track down common types of errors. Here is one class of problem that I'm currently stuck on that I still don't really know how to easily debug and solve:

I have an object (for argument's sake, let's call it CGame::m_pSingleton) that has as one of its members a pointer to one of my CMatrix4x4 objects, called m_pTransform. I initialise m_pTransform nicely with the identity matrix, and I add a watch to my debugger, and I'm stepping through my program nicely and the first entry is staying happily at 1.0f.

Then comes the problem: I step over a line of code that is totally unrelated to the CGame::m_pSingleton object (it's to do with looping over elements in an XML document using TinyXML, in this case) and suddenly my watch shows that the first entry of m_pTransform has been overwritten with 0xCCCCCCCC (MSVC's code for an uninitialised local variable). This is obviously not good!

I've had this problem once before, and it turned out it was because I had a header file somewhere declaring a class name with the form "class CMyClass;" so that I could declare other classes that used it, but in one important place I hadn't later included the full definition of CMyClass, so the memory layout was considered to be different in different compilation units.

In this current case, however, I can't easily see where the problem is, and I can't really think how to debug and solve my problem! Any thoughts? Obviously I would like to fix my specific case, but I'm also particularly interested in how to go about debugging problems like this in the general case. Ideally, I imagine the compiler itself ought to have caught the problem last time, but maybe it's not possible for it to know whether I really intended my class to have no members or methods.
Did you try an on-change conditional breakpoint?
Yep, that's how I found out where the overwrite happens. But since TinyXML is totally unrelated to CGame::m_pSingleton, it still doesn't give me much of a clue as to how to fix the problem.
If you remove the code you believe to be offending, temporarily, does that seem to solve the overwrite?
Quote: Original post by Ro_Akira
If you remove the code you believe to be offending, temporarily, does that seem to solve the overwrite?


Good thinking...

Hmm, the answer is slightly confusing... There is a particular function call that seems to mess up the memory (if you know anything about TinyXML, it's calling the IterateChildren method on a TiXmlElement object). If I comment that out, then the memory stays intact a little longer than before, but now it gets messed up when the execution exits the function that calls IterateChildren:

bool CDatafile::ImportLoc(...)
{
    TiXmlElement* pRoot = ...;
    TiXmlNode* pMesh = pRoot->IterateChildren("mesh", NULL);
    while (pMesh)
    {
        ... // It's here that the m_pTransform memory gets set up correctly in the first place

        // Iterate to next mesh element
        pMesh = pRoot->IterateChildren("mesh", pMesh);
    }
    ...
    return true;
}


Normally the memory gets messed up upon entering one of the functions that the IterateChildren call makes (I have to step into it to see it happening). If I comment out that line and make it exit the 'while' loop straight away, then the memory seems intact during the '...' region, but then gets messed up upon exiting the ImportLoc method.

I hate debugging this kind of problem - it just always seems so random!
Hmm, in fact, keeping the IterateChildren call commented out, but adding in any other random function call also seems to corrupt the memory (e.g. if I add a call to my Log function), so I guess that implies that something is already screwed up by that point. Any thoughts?
Right, I think I've found the problem:

The memory that m_pTransform is pointing to (and that's getting corrupted) isn't actually the memory that I thought it was. I was setting m_pTransform during the constructor of an object to point to a member of the same object, but that object was on the stack, not the heap, and it was a copy of the object that was being kept. The copy obviously had a different memory address, so its m_pTransform was still pointing into the original stack object.
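
To make the failure mode concrete, here is a minimal sketch of how a compiler-generated copy can leave a pointer aimed at the original object. The names Matrix4x4, Node, pointsToOwnMember and copyAliasesOriginal are hypothetical stand-ins, not the poster's actual classes, and the sketch only compares addresses while both objects are alive, so it never dereferences a dangling pointer.

```cpp
#include <cassert>

// Hypothetical stand-in for the poster's CMatrix4x4.
struct Matrix4x4 {
    float m[16];
};

// Hypothetical class that stores a pointer to one of its own members,
// set up in the constructor as described in the post.
struct Node {
    Matrix4x4 transform;
    Matrix4x4* pTransform;

    Node() : transform(), pTransform(&transform) {
        transform.m[0] = 1.0f;
    }
    // No user-defined copy constructor: the compiler-generated one copies
    // pTransform member-wise, so a copy still points at the ORIGINAL's member.
};

// True if the object's pointer refers to its own member.
bool pointsToOwnMember(const Node& n) {
    return n.pTransform == &n.transform;
}

// True if copying a stack object yields a copy whose pointer
// still refers to the original object's member.
bool copyAliasesOriginal() {
    Node onStack;
    Node kept = onStack;  // implicit member-wise copy
    return pointsToOwnMember(onStack)
        && !pointsToOwnMember(kept)
        && kept.pTransform == &onStack.transform;
}
```

Once the original goes out of scope, the kept copy's pointer refers to a dead stack slot, which later function calls are free to reuse. That would match the symptom of the value changing when an unrelated function is entered.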

So, again, the question is: any thoughts on how to track down and identify bugs like this more quickly in the future?
Well, the steps you took here are pretty much what I do. When you removed the part that seemed to be offending, you found out the problem was still occurring.

Finding out that the key moment the problem occurred was when the function went out of scope is an indicator of either a stack-related problem or a destructor-related problem.

Just watch out for pointers to stack objects. The compiler can help out sometimes with warnings, but other times it can't. Debugging these things can often be just plain difficult. You can have a pointer to an object that's long gone from the stack, and the first indication of a problem is weird behaviour at best.
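
One way to guard against this class of bug, sketched under assumptions: if a class must hold a pointer to its own member, give it a copy constructor and assignment operator that re-point at the copy's own member, and check the invariant with an assert. The names Matrix4x4, Node, isSelfConsistent and copiesStaySelfConsistent are hypothetical and not taken from the poster's code.

```cpp
#include <cassert>

// Hypothetical stand-in for the poster's CMatrix4x4.
struct Matrix4x4 {
    float m[16];
};

class Node {
public:
    Node() : transform_(), pTransform_(&transform_) {
        transform_.m[0] = 1.0f;
    }

    // Copy the matrix data, but point at THIS object's member,
    // not at the source object's member.
    Node(const Node& other)
        : transform_(other.transform_), pTransform_(&transform_) {}

    Node& operator=(const Node& other) {
        transform_ = other.transform_;  // copy the data only
        pTransform_ = &transform_;      // keep pointing at our own member
        return *this;
    }

    // Invariant: the stored pointer always refers to our own member.
    bool isSelfConsistent() const {
        return pTransform_ == &transform_;
    }

    const Matrix4x4* transform() const {
        assert(isSelfConsistent());  // fails early in debug builds if broken
        return pTransform_;
    }

private:
    Matrix4x4 transform_;
    Matrix4x4* pTransform_;
};

// True if both copy construction and assignment preserve the invariant.
bool copiesStaySelfConsistent() {
    Node a;
    Node b = a;  // copy construction
    Node c;
    c = a;       // copy assignment
    return a.isSelfConsistent() && b.isSelfConsistent() && c.isSelfConsistent();
}
```

An alternative design choice is to make such a class non-copyable, so that an accidental copy becomes a compile error rather than a run-time mystery.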

I think the 0xCCCCCCCC code was actually a bit of a clue against an overwrite problem. Overwrites usually manifest themselves as junk or zeros magically appearing.
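
As a rough aid for reading such clues, the sketch below maps a few fill patterns commonly seen in MSVC debug builds to what they usually mean. The function name describeMsvcDebugFill is hypothetical, and the table is a summary from memory of the debug runtime's behaviour, so treat it as a hint rather than a guarantee for any particular compiler version or build setting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns a short description of a 32-bit value if it matches a fill
// pattern commonly seen in MSVC debug builds, or an empty string otherwise.
std::string describeMsvcDebugFill(std::uint32_t value) {
    switch (value) {
        case 0xCCCCCCCCu: return "uninitialized stack memory";
        case 0xCDCDCDCDu: return "uninitialized heap memory";
        case 0xDDDDDDDDu: return "freed heap memory";
        case 0xFDFDFDFDu: return "guard bytes around a heap allocation";
        case 0xFEEEFEEEu: return "memory freed by HeapFree";
        default:          return "";
    }
}
```

Seeing 0xCCCCCCCC in a watched value therefore suggests the pointer refers to stack memory that has been handed to a new function call, rather than to a buffer overrun writing arbitrary data.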
Since 0xCCCCCCCC is a marker for uninitialized memory (MSVC debug builds use it to fill stack memory), it is likely that the pointer to the matrix is pointing to memory that is no longer valid, either because it was pointing to a local variable in a function that has since returned, or because it is pointing to memory that was deleted. You aren't doing something like this, are you?
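    CMatrix4x4 * IdentityMatrix()
    {
        CMatrix4x4 matrix = { ... };
        // Bug: returns the address of a local variable that no longer exists
        // once the function returns.
        return &matrix;
    }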
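
For contrast, a minimal sketch of a safe variant returns the matrix by value, so the caller owns its own copy and no pointer to a dead local escapes. Matrix4x4 here is a hypothetical stand-in with an assumed row-major 16-float layout, not the real CMatrix4x4.

```cpp
#include <cassert>

// Hypothetical stand-in for CMatrix4x4 (assumed layout: 16 floats, row-major).
struct Matrix4x4 {
    float m[16];
};

// Returns the identity matrix by value; nothing refers to the local
// variable after the function returns.
Matrix4x4 IdentityMatrix() {
    Matrix4x4 matrix = {};  // zero all 16 entries
    for (int i = 0; i < 4; ++i) {
        matrix.m[i * 4 + i] = 1.0f;  // set the diagonal
    }
    return matrix;
}
```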


Edit: Oh, I see you found the problem. The most effective way of discovering the source of the problem is to set a "break on write" breakpoint as SiCrane suggested, then figure out the source of the conflict.
John Bolton, Locomotive Games (THQ). Current Project: Destroy All Humans (Wii). IN STORES NOW!

