StackWalk64 and x86
Hi all,
I discovered today that the reason my debug builds were so slow is that my memory manager obtains a stack trace for every allocation at the time the allocation is made, rather than only for allocations that turn out to leak. So, to speed things up, I've been trying to modify my code to just store the current CONTEXT when an allocation is made, and then to walk the stack when reporting leaks.
However, StackWalk64() seems to be walking the stack as it is at the time of the StackWalk64() call, not at the time the CONTEXT was captured.
According to the Documentation, the CONTEXT parameter is not required on x86, which leads me to think that on x86 it's ignored and it'll always get the current context and then stack walk that, which is a problem for me.
I can't walk the stack at allocation time (to get a stack trace), because that involves looking up debug symbols, which is the slow part, and I can't just grab the top of the stack and dump that, because it'll always be inside my memory manager, making the output pretty useless.
I have an x64 build of my app, but I'm unable to test it just now (no 64-bit machine to test it on). I'll give it a go tomorrow and see if the problem exists in an x64 build (which I doubt).
Does anyone know if this is the case, and StackWalk64() grabs the current context in x86? And is there any way around this?
Cheers,
Steve
In a nutshell, a CONTEXT only captures the state of the CPU, which means it contains only minimal information about the stack. Unless you use the CONTEXT to perform a stack walk then and there, it becomes useless, since the state of the stack will change as soon as any functions are called or returned from.
I've never done this, but in theory you could perform a stack walk and only store the program counter for each stack frame. This doesn't require looking up the debug symbol information, and the addresses should still be good to obtain the relevant symbol information later.
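A minimal sketch of that idea, under some loud assumptions: it does not use StackWalk64 at all, but swaps in CaptureStackBackTrace on Windows and backtrace() from <execinfo.h> elsewhere, simply because both hand back raw return addresses without touching symbols. RawTrace, CaptureRawTrace and PrintRawTrace are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

#if defined(_WIN32)
#  include <windows.h>    // CaptureStackBackTrace
#else
#  include <execinfo.h>   // backtrace (glibc, macOS)
#endif

// Hypothetical record stored alongside each allocation.
struct RawTrace {
    static constexpr std::size_t kMaxFrames = 16;
    void*       pc[kMaxFrames];  // raw return addresses, innermost first
    std::size_t count;           // number of valid entries in pc
};

// Called at allocation time: records addresses only, no symbol lookup.
inline void CaptureRawTrace(RawTrace& out) {
    for (std::size_t i = 0; i < RawTrace::kMaxFrames; ++i)
        out.pc[i] = nullptr;
#if defined(_WIN32)
    out.count = CaptureStackBackTrace(
        0, static_cast<ULONG>(RawTrace::kMaxFrames), out.pc, NULL);
#else
    const int n = backtrace(out.pc, static_cast<int>(RawTrace::kMaxFrames));
    out.count = n > 0 ? static_cast<std::size_t>(n) : 0;
#endif
    assert(out.count <= RawTrace::kMaxFrames);
}

// Called at leak-report time. A real reporter would resolve each address
// to a function name here; this sketch only prints the raw values.
inline void PrintRawTrace(const RawTrace& t) {
    for (std::size_t i = 0; i < t.count; ++i)
        std::printf("  frame %u: %p\n", static_cast<unsigned>(i), t.pc[i]);
}
```

The point of the split is that the expensive part (turning addresses into names) runs only for allocations that actually leak.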
Why not just pass the function and line number of the caller into the allocation routine? You can easily set up a macro to do it using __LINE__ and __FUNCTION__.
Doing a stack trace seems a rather complicated way to do things.
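For illustration, a rough sketch of the macro approach. Everything here (TaggedHeader, TaggedAlloc, TAGGED_ALLOC and friends) is a made-up name for the example, and it only tags allocations that go through the macro, so plain new expressions are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header placed in front of every tagged allocation.
struct TaggedHeader {
    const char* file;
    const char* func;
    int         line;
};

// Header size rounded up so the user pointer keeps malloc's alignment.
const std::size_t kHeaderSize =
    (sizeof(TaggedHeader) + alignof(std::max_align_t) - 1)
    / alignof(std::max_align_t) * alignof(std::max_align_t);

// Allocates size bytes and remembers where the request came from.
inline void* TaggedAlloc(std::size_t size, const char* file,
                         const char* func, int line) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kHeaderSize + size));
    if (!raw) return nullptr;
    new (raw) TaggedHeader{file, func, line};
    return raw + kHeaderSize;
}

// Recovers the call-site header from a pointer returned by TaggedAlloc.
inline const TaggedHeader* HeaderOf(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<const TaggedHeader*>(
        static_cast<const unsigned char*>(p) - kHeaderSize);
}

inline void TaggedFree(void* p) {
    if (p) std::free(static_cast<unsigned char*>(p) - kHeaderSize);
}

// The macro captures the caller's location at the point of use.
#define TAGGED_ALLOC(size) \
    TaggedAlloc((size), __FILE__, __FUNCTION__, __LINE__)
```

A leak report would then walk the live allocations and print HeaderOf(p)->file, ->line and ->func for each one.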
The slow bit tends to be resolving the addresses into function names. As long as you aren't doing that on the call to new it should be relatively quick.
If you need some example code just look at VLD.
Quote:According to the Documentation, the CONTEXT parameter is not required on x86, which leads me to think that on x86 it's ignored and it'll always get the current context and then stack walk that, which is a problem for me.
As a side note, it's definitely not always ignored on ia32, as I've seen results differ according to whether or not registers are correctly set.
Quote:I can't just grab the top of the stack and dump that, because it'll always be inside my memory manager, making the output pretty useless.
Since you know the number of frames between your StackWalk and the calling function, you can just skip that amount.
Quote:I've never done this, but in theory you could perform a stack walk and only store the program counter for each stack frame. This doesn't require looking up the debug symbol information, and the addresses should still be good to obtain the relevant symbol information later.
Yep, that works well :)
Quote:Why not just pass the function and line number of the caller into the allocation routine? You can easily set up a macro to do it using __LINE__ and __FUNCTION__.
That's fine until you get sick and tired of wrapping each instance of placement new in ugly #include "nommgr.h" / #include "mmgr.h". It also requires more work if you need std::nothrow_t.
Thanks for the replies. I've tried the code on x64, and it fails in the same way as on x86 (although after reading SiCrane's reply, that makes sense).
Quote:Original post by SiCrane
    In a nutshell, a CONTEXT only captures the state of the CPU, which means it contains only minimal information about the stack. Unless you use the CONTEXT to perform a stack walk then and there, it becomes useless since the state of the stack will change if any functions are called or returned from.
    I've never done this, but in theory you could perform a stack walk and only store the program counter for each stack frame. This doesn't require looking up the debug symbol information, and the addresses should still be good to obtain the relevant symbol information later.
Ah, good point; I thought that it'd capture the whole stack, but I suppose that would be overkill...
I'll try doing the stack walk and storing the top 10 frames or something (Well, just EIP / the PC).
Quote:Original post by Jan Wassenberg
    Quote:I can't just grab the top of the stack and dump that, because it'll always be inside my memory manager, making the output pretty useless.
    Since you know the number of frames between your StackWalk and the calling function, you can just skip that amount.
I use my memory manager in a release build sometimes too (Well, release build + debug symbols), so the number of functions from the original caller varies, due to inlining.
Quote:Original post by Jan Wassenberg
    Quote:Why not just pass the function and line number of the caller into the allocation routine? You can easily set up a macro to do it using __LINE__ and __FUNCTION__.
    That's fine until you get sick and tired of wrapping each instance of placement new in ugly #include "nommgr.h" / #include "mmgr.h". It also requires more work if you need std::nothrow_t.
Yup. I used to use mmgr, and had all sorts of issues like this. I like the ability to just drop a header and source file into my app and have complete memory manager functionality.
In my old code, I walked the stack, resolving function names until I hit a function name that didn't start with "PMemory::" or "operator new", and then assumed that was the calling function, which worked fine. I'll try walking the stack and storing (up to) the top 10 frames or so, and then resolve them at leak time, and let you know how that works.
Cheers,
Steve
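The name filter described in this post can be sketched on its own, separately from the symbol lookup. The two prefixes come from the post; StartsWith, FindCallerFrame and the sample frame names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True if name begins with prefix.
inline bool StartsWith(const char* name, const char* prefix) {
    return std::strncmp(name, prefix, std::strlen(prefix)) == 0;
}

// names holds already-resolved function names, innermost frame first.
// Returns the index of the first frame that is not allocator-internal,
// or count if every frame looks internal.
inline std::size_t FindCallerFrame(const char* const* names,
                                   std::size_t count) {
    assert(names != nullptr || count == 0);
    static const char* const kInternalPrefixes[] = {
        "PMemory::", "operator new"
    };
    for (std::size_t i = 0; i < count; ++i) {
        bool internal = false;
        for (const char* prefix : kInternalPrefixes) {
            if (StartsWith(names[i], prefix)) {
                internal = true;
                break;
            }
        }
        if (!internal) return i;
    }
    return count;
}
```

For a trace such as { "PMemory::RecordStackTrace", "PMemory::Allocate", "operator new[]", "LoadBSP", "main" } this picks index 3. Because "operator new" is matched as a prefix, it also covers "operator new[]".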
When I had this same problem I just stored the offset and resolved the names later. This was actually really fast. I used a DEBUG_WALK_DEPTH macro and a DEBUG_WALK_SKIP macro to define how many frames to store and how many to skip before storing. In my program it's DEBUG_WALK_DEPTH = 10 and DEBUG_WALK_SKIP = 2. I only created support for x86 tracing, but it would not be hard to add 64-bit support. Maybe this can help:
void CCallStack::WalkCallStack ( DebugDataStruct *debugStruct )
{
    if (debugStruct == NULL)
        return;

    CONTEXT context;

    // Grab the current context (state of EBP, EIP, ESP registers)
    memset(&context, 0, sizeof(CONTEXT));
    context.ContextFlags = CONTEXT_ALL;
    _asm {
        call x
    x:  pop eax
        mov context.Eip, eax
        mov context.Ebp, ebp
        mov context.Esp, esp
    }
    //RtlCaptureContext(&context);

    STACKFRAME64 stackFrame;
    memset(&stackFrame, 0, sizeof(STACKFRAME64));

    // Stack frame must be set based on architecture
#ifdef _M_IX86
    stackFrame.AddrPC.Offset    = context.Eip;
    stackFrame.AddrPC.Mode      = AddrModeFlat;
    stackFrame.AddrFrame.Offset = context.Ebp;
    stackFrame.AddrFrame.Mode   = AddrModeFlat;
    stackFrame.AddrStack.Offset = context.Esp;
    stackFrame.AddrStack.Mode   = AddrModeFlat;
#else
    #error "Platform not supported!"
#endif

    debugStruct->stackCount = 0;
    HANDLE hThread = GetCurrentThread();
    for (int frameNum = 0; frameNum < (DEBUG_WALK_DEPTH + DEBUG_WALK_SKIP); ++frameNum )
    {
        if (!StackWalk64(IMAGE_FILE_MACHINE_I386, m_hProcess, hThread, &stackFrame, &context,
                         CCallStack::ReadMemoryRoutine, SymFunctionTableAccess64, SymGetModuleBase64, NULL))
            break;

        if (stackFrame.AddrPC.Offset == stackFrame.AddrReturn.Offset)
            break;

        // Valid call stack frame
        if (stackFrame.AddrPC.Offset != 0)
        {
            // Only store frames past the ones that belong to the tracer itself
            if (frameNum >= DEBUG_WALK_SKIP)
            {
                debugStruct->stackOffset[debugStruct->stackCount] = stackFrame.AddrPC.Offset;
                debugStruct->stackCount++;
            }
        }
        else
            break;
    }
}
Using stackFrame.AddrPC.Offset you can later resolve the symbols during output.
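The skip/depth bookkeeping can be looked at in isolation from the Win32 calls. The sketch below is only an assumption about how that part behaves: kWalkDepth and kWalkSkip mirror DEBUG_WALK_DEPTH and DEBUG_WALK_SKIP, while StoredStack and StoreFrames are hypothetical names, and the walked array stands in for the program counters a stack walker would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kWalkDepth = 10;  // mirrors DEBUG_WALK_DEPTH
const std::size_t kWalkSkip  = 2;   // mirrors DEBUG_WALK_SKIP

// Hypothetical fixed-size record kept per allocation.
struct StoredStack {
    std::uint64_t offset[kWalkDepth];
    std::size_t   count;
};

// walked holds program counters, innermost frame first.
// A zero program counter ends the walk early.
inline void StoreFrames(const std::uint64_t* walked, std::size_t walkedCount,
                        StoredStack& out) {
    out.count = 0;
    for (std::size_t i = 0;
         i < walkedCount && i < kWalkDepth + kWalkSkip; ++i) {
        if (walked[i] == 0)
            break;                        // end of a valid call stack
        if (i >= kWalkSkip) {             // drop the tracer's own frames
            assert(out.count < kWalkDepth);
            out.offset[out.count++] = walked[i];
        }
    }
}
```

With a skip of 2, the first two walked frames (the tracer and the allocator entry point, say) never reach the stored record, and at most kWalkDepth addresses are kept.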
Quote:Original post by Evil Steve
    Quote:I've never done this, but in theory you could perform a stack walk and only store the program counter for each stack frame. This doesn't require looking up the debug symbol information, and the addresses should still be good to obtain the relevant symbol information later.
    Ah, good point; I thought that it'd capture the whole stack, but I suppose that would be overkill...
    I'll try doing the stack walk and storing the top 10 frames or something (Well, just EIP / the PC).
That seems to be what DevPartner's Error Checking does. Though it's a configurable number of frames. That manages to not slow it down much, so it should work well for you.
Well, just saving the program counter seems to work great. Loading a BSP file (which makes about 6000 allocations, mostly STL ones) originally took > 30 seconds, and now it takes about 2 seconds with the code that walks the stack but doesn't resolve function names. Without the stack walking at all, it takes about half a second.
I use RtlCaptureContext, which I know doesn't work pre-XP, because I didn't want to mess around with SEH or x64 assembly (which needs to be in a separate asm file, ugh).
If anyone is interested in the code I have:
// Allocation struct (Irrelevant fields removed)
struct AllocHeader
{
#ifdef USE_STACKTRACE
    static const size_t cnMaxStackFrames = 16;
    size_t nPC[cnMaxStackFrames];
#endif
};

// Headers and libs:
#ifdef USE_STACKTRACE
#include <dbghelp.h>
#pragma comment(lib,"dbghelp.lib")
#endif // USE_STACKTRACE

// Memory manager init time (From constructor):
#ifdef USE_STACKTRACE
    SymInitialize(GetCurrentProcess(), NULL, TRUE);
#endif
And then the main code, RecordStackTrace is called for every allocation, and GetCallerForAllocation is called when memory leaks are detected:
void PMemory::RecordStackTrace(AllocHeader* pAllocation)
{
#ifdef USE_STACKTRACE
    // Capture context
    CONTEXT ctx;
    RtlCaptureContext(&ctx);

    // Init the stack frame for this function
    STACKFRAME64 theStackFrame;
    memset(&theStackFrame, 0, sizeof(theStackFrame));
#ifdef _M_IX86
    DWORD dwMachineType = IMAGE_FILE_MACHINE_I386;
    theStackFrame.AddrPC.Offset = ctx.Eip;
    theStackFrame.AddrPC.Mode = AddrModeFlat;
    theStackFrame.AddrFrame.Offset = ctx.Ebp;
    theStackFrame.AddrFrame.Mode = AddrModeFlat;
    theStackFrame.AddrStack.Offset = ctx.Esp;
    theStackFrame.AddrStack.Mode = AddrModeFlat;
#elif _M_X64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_AMD64;
    theStackFrame.AddrPC.Offset = ctx.Rip;
    theStackFrame.AddrPC.Mode = AddrModeFlat;
    theStackFrame.AddrFrame.Offset = ctx.Rsp;
    theStackFrame.AddrFrame.Mode = AddrModeFlat;
    theStackFrame.AddrStack.Offset = ctx.Rsp;
    theStackFrame.AddrStack.Mode = AddrModeFlat;
#elif _M_IA64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_IA64;
    theStackFrame.AddrPC.Offset = ctx.StIIP;
    theStackFrame.AddrPC.Mode = AddrModeFlat;
    theStackFrame.AddrFrame.Offset = ctx.IntSp;
    theStackFrame.AddrFrame.Mode = AddrModeFlat;
    theStackFrame.AddrBStore.Offset = ctx.RsBSP;
    theStackFrame.AddrBStore.Mode = AddrModeFlat;
    theStackFrame.AddrStack.Offset = ctx.IntSp;
    theStackFrame.AddrStack.Mode = AddrModeFlat;
#else
#   error "Platform not supported!"
#endif

    // Walk up the stack, storing one program counter per frame
    memset(pAllocation->nPC, 0, sizeof(pAllocation->nPC));
    for(size_t i=0; i<AllocHeader::cnMaxStackFrames; ++i)
    {
        pAllocation->nPC[i] = (size_t)theStackFrame.AddrPC.Offset;
        if(!StackWalk64(dwMachineType, GetCurrentProcess(), GetCurrentThread(), &theStackFrame,
            &ctx, NULL, SymFunctionTableAccess64, SymGetModuleBase64, NULL))
        {
            break;
        }
    }
#endif
    UNREFERENCED_PARAMETER(pAllocation);
}

const char* PMemory::GetCallerForAllocation(AllocHeader* pAllocation)
{
#ifdef USE_STACKTRACE
    const size_t cnBufferSize = 512;
    char szFile[cnBufferSize];
    char szFunc[cnBufferSize];
    unsigned int nLine;
    static char szBuff[cnBufferSize*3];

    // Initialise allocation source
    strcpy(szFile, "??");
    strcpy(szFunc, "??");
    nLine = 0;

    // Resolve PC to function names
    size_t nPC = 0;
    for(size_t i=0; i<AllocHeader::cnMaxStackFrames; ++i)
    {
        // Check for end of stack walk
        nPC = pAllocation->nPC[i];
        if(nPC == 0)
            break;

        // Get function name
        unsigned char byBuffer[sizeof(IMAGEHLP_SYMBOL64) + cnBufferSize];
        IMAGEHLP_SYMBOL64* pSymbol = (IMAGEHLP_SYMBOL64*)byBuffer;
        DWORD64 dwDisplacement;
        memset(pSymbol, 0, sizeof(IMAGEHLP_SYMBOL64) + cnBufferSize);
        pSymbol->SizeOfStruct = sizeof(IMAGEHLP_SYMBOL64);
        pSymbol->MaxNameLength = cnBufferSize;
        if(!SymGetSymFromAddr64(GetCurrentProcess(), nPC, &dwDisplacement, pSymbol))
            strcpy(szFunc, "??");
        else
        {
            pSymbol->Name[cnBufferSize-1] = '\0';

            // See if we need to go further up the stack
            if(strncmp(pSymbol->Name, "PMemory::", 9) == 0)
            {
                // In PMemory, keep going...
            }
            else if(strncmp(pSymbol->Name, "operator new", 12) == 0)
            {
                // In operator new or new[], keep going...
            }
            else if(strncmp(pSymbol->Name, "std::", 5) == 0)
            {
                // In STL code, keep going...
            }
            else
            {
                // Found the allocator (Or near to it)
                strcpy(szFunc, pSymbol->Name);
                break;
            }
        }
    }

    // Get file/line number
    if(nPC != 0)
    {
        IMAGEHLP_LINE64 theLine;
        DWORD dwDisplacement;
        memset(&theLine, 0, sizeof(theLine));
        theLine.SizeOfStruct = sizeof(theLine);
        if(!SymGetLineFromAddr64(GetCurrentProcess(), nPC, &dwDisplacement, &theLine))
        {
            strcpy(szFile, "??");
            nLine = 0;
        }
        else
        {
            // Keep only the file name, not the full path
            const char* pszFile = strrchr(theLine.FileName, '\\');
            if(!pszFile) pszFile = theLine.FileName;
            else ++pszFile;
            strncpy(szFile, pszFile, cnBufferSize);
            szFile[cnBufferSize-1] = '\0';
            nLine = theLine.LineNumber;
        }
    }

    // Format into buffer and return
    sprintf(szBuff, "%s:%u (%s)", szFile, nLine, szFunc);
    return szBuff;
#else
    UNREFERENCED_PARAMETER(pAllocation);
    return "Stack trace unavailable";
#endif // USE_STACKTRACE
}
Thanks again,
Steve
Quote:I use my memory manager in a release build sometimes too (Well, release build + debug symbols), so the number of functions from the original caller varies, due to inlining.
Yes, but the number is either under your control (-> __declspec(noinline) or inline) or known to you (simply count them in debug/release for each compiler you support).
This is kind of hacky, but no less hacky than a max. number of stack frames (exact same problem, just now up for the library user to handle)
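A small sketch of the "keep the frame count under your control" idea, assuming a two-function allocator path. MM_NOINLINE, kAllocatorFrames, RecordTrace and Allocate are hypothetical names. __declspec(noinline) is the MSVC spelling mentioned above, and __attribute__((noinline)) is the GCC/Clang equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(_MSC_VER)
#  define MM_NOINLINE __declspec(noinline)
#else
#  define MM_NOINLINE __attribute__((noinline))  // GCC / Clang spelling
#endif

// Frames that belong to the allocator in this hypothetical layout:
// RecordTrace and Allocate. With both marked noinline, the count is
// intended to stay the same in debug and release builds.
const std::size_t kAllocatorFrames = 2;

MM_NOINLINE std::size_t RecordTrace() {
    // A real implementation would walk the stack here and discard the
    // first kAllocatorFrames entries before storing anything.
    return kAllocatorFrames;
}

MM_NOINLINE void* Allocate(std::size_t size, std::size_t* framesSkipped) {
    assert(framesSkipped != nullptr);
    *framesSkipped = RecordTrace();
    return std::malloc(size);
}
```

The trade-off is the one described above: the constant has to be kept in sync with the call path by hand whenever another layer is added to the allocator.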
Quote:Yup. I used to use mmgr, and had all sorts of issues like this. I like the ability to just drop a header and source file into my app and have complete memory manager functionality.
Indeed. I am currently evaluating VLD, which appears very nice but causes an evil race condition with our scripting engine's /highest-priority/ GC thread *sigh*