StackWalk is too slow

Started by
4 comments, last by Jan Wassenberg 16 years, 2 months ago
I have the following code:

bool walkStack(CONTEXT& context, bool toInit = false)
{
  static STACKFRAME frame;
  if(toInit)
  {
    ZeroMemory(&frame, sizeof(STACKFRAME));
    frame.AddrPC.Offset = context.Eip;
    frame.AddrStack.Offset = context.Esp;
    frame.AddrFrame.Offset = context.Ebp;
    frame.AddrFrame.Mode = frame.AddrStack.Mode = frame.AddrPC.Mode = AddrModeFlat;
  }

  if( !StackWalk(IMAGE_FILE_MACHINE_I386, GetCurrentProcess(), GetCurrentThread(),
                 &frame, &context, NULL, SymFunctionTableAccess, SymGetModuleBase, NULL) ||
      !frame.AddrFrame.Offset )
    return false;

  return true;
}

void getCallStack()
{
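  CONTEXT context = {0};
  context.ContextFlags = CONTEXT_FULL;
  // Capture the current EIP/EBP/ESP by hand; "call x / pop eax" reads the instruction pointer.
  __asm
  {
     call x
  x: pop eax
     mov context.Eip, eax
     mov context.Ebp, ebp
     mov context.Esp, esp
  }

  // Walk one frame per call until StackWalk fails or the frame pointer is null.
  bool ret = walkStack(context, true);
  while(ret)
    ret = walkStack(context);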
}


It stores the call stack on each memory allocation request (the code here omits the storing itself). It works correctly but VERY slowly. I tried the 64-bit variants of the functions/structures as well, but that didn't improve performance at all. Stack-walking performance was discussed in this thread, but only in terms of resolving stack frame addresses to source code symbols. The great find from that thread for me was Visual Leak Detector. It works very fast! But I didn't find any significant differences between the two stack-walking implementations. Environment: Windows Vista, Visual Studio 2005. Any ideas are welcome!
See Here.

I StackWalk() for 16 frames at allocation time, and then just look up the symbols when I dump the leaks. I haven't seen any noticeable performance problems with that code, but I'm only using it in debug, not release.
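A minimal sketch of that split, storing raw addresses at allocation time and resolving symbols only when leaks are dumped. It is not the poster's code: `LeakTracker`, `CaptureFn`, `ResolveFn` and `kMaxFrames` are hypothetical names, and the capture and resolve steps are passed in as callbacks so the sketch stays independent of any particular stack-walking or symbol API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// All names here are hypothetical, chosen for illustration only.
constexpr std::size_t kMaxFrames = 16;

struct CallStack {
    std::array<std::uintptr_t, kMaxFrames> pc{};  // raw return addresses, unresolved
    std::size_t depth = 0;
};

// Fills 'out' with up to 'max' addresses and returns how many were written.
// On Windows, a StackWalk() loop would live behind this callback.
using CaptureFn = std::function<std::size_t(std::uintptr_t* out, std::size_t max)>;
// Turns one address into a readable name; only needed at dump time.
using ResolveFn = std::function<std::string(std::uintptr_t)>;

class LeakTracker {
public:
    explicit LeakTracker(CaptureFn capture) : capture_(std::move(capture)) {}

    // Hot path: capture addresses only, no symbol lookup.
    void onAlloc(const void* p) {
        CallStack cs;
        cs.depth = capture_(cs.pc.data(), kMaxFrames);
        live_[p] = cs;
    }

    void onFree(const void* p) { live_.erase(p); }

    std::size_t liveCount() const { return live_.size(); }

    // Cold path: resolve symbols only for allocations that were never freed.
    std::vector<std::string> dumpLeaks(const ResolveFn& resolve) const {
        std::vector<std::string> lines;
        for (const auto& entry : live_) {
            const CallStack& cs = entry.second;
            std::string line;
            for (std::size_t i = 0; i < cs.depth; ++i) {
                if (i != 0) line += " <- ";
                line += resolve(cs.pc[i]);
            }
            lines.push_back(line);
        }
        return lines;
    }

private:
    CaptureFn capture_;
    std::unordered_map<const void*, CallStack> live_;
};
```

If something like this were called from inside an allocator hook, the map's own allocations would have to bypass the hook to avoid recursion; the sketch ignores that.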
There are about 3.5 million requests for memory allocation during just start and close of my app. And performance degrades approx. proportionally to the max stack depth I walk :(
Walking the stack is only for debugging, so it should not matter if it is slow. In the final product you will remove all of that since it isn't needed any more.

But...

Quote:Original post by diamant
There are about 3.5 million requests for memory allocation during just start and close of my app.


... how on earth?
Quote:Original post by diamant
There are about 3.5 million requests for memory allocation during just start and close of my app. And performance degrades approx. proportionally to the max stack depth I walk :(
3.5 million allocations!? What on earth are you doing?
StackWalk isn't the fastest function out there, but it's not that slow either. My engine takes approximately 3 seconds to start up (and makes around 6000 allocations to load a Quake I .bsp file with textures, shaders, etc.) with my memory manager turned on, around 1.5 seconds with my memory manager turned off, and around 2 seconds with the memory manager turned on but the stack walk code turned off.

However, as Zahlman said, it's a debug build, performance shouldn't be that much of an issue.
In this case it may be advisable to walk the stack manually when running on IA-32 with frame pointers enabled. I vaguely recall VLD doing this as well.
All that needs to be done is grab the stack frame pointed to by EBP, read the return address, and 'disassemble' the code immediately before it. The latter reduces to divining whether the instruction bytes are one of the 9 forms of CALL.
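A rough sketch of that manual walk, under stated assumptions: it is not VLD's code, it assumes the classic IA-32 frame layout (saved EBP at `[EBP]`, return address at `[EBP+4]`), and `Frame`, `walkFrames`, `indirectCallLength` and `precededByCall` are hypothetical names. The CALL check recognizes the near relative `E8` form and the near indirect `FF /2` encodings with 32-bit addressing; treat it as a heuristic, since data bytes can match by accident.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed frame layout with frame pointers enabled.
struct Frame {
    const Frame* prev;             // caller's saved frame pointer
    std::uintptr_t returnAddress;  // address pushed by the CALL
};

// Follows the frame-pointer chain, staying inside [stackLo, stackHi).
// Returns the number of return addresses written to 'out'.
std::size_t walkFrames(const Frame* fp, std::uintptr_t stackLo, std::uintptr_t stackHi,
                       std::uintptr_t* out, std::size_t maxFrames) {
    std::size_t n = 0;
    while (n < maxFrames) {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fp);
        if (addr < stackLo || addr + sizeof(Frame) > stackHi) break;  // outside the stack
        if (addr % alignof(Frame) != 0) break;                        // misaligned: not a frame
        out[n++] = fp->returnAddress;
        // The stack grows down, so a caller's frame must sit at a higher address.
        if (reinterpret_cast<std::uintptr_t>(fp->prev) <= addr) break;
        fp = fp->prev;
    }
    return n;
}

// Length of an FF /2 (near indirect CALL) starting at p, or 0 if p is not one.
std::size_t indirectCallLength(const std::uint8_t* p, std::size_t avail) {
    if (avail < 2 || p[0] != 0xFF) return 0;
    const unsigned mod = p[1] >> 6, reg = (p[1] >> 3) & 7, rm = p[1] & 7;
    if (reg != 2) return 0;
    std::size_t len = 2;
    if (mod != 3 && rm == 4) {  // a SIB byte follows the ModRM byte
        if (avail < 3) return 0;
        len += 1;
        if (mod == 0 && (p[2] & 7) == 5) len += 4;  // SIB with no base: disp32
    }
    if (mod == 0 && rm == 5) len += 4;  // [disp32]
    else if (mod == 1) len += 1;        // disp8
    else if (mod == 2) len += 4;        // disp32
    return len;
}

// True if the bytes ending right before code[retOffset] decode as a CALL.
bool precededByCall(const std::uint8_t* code, std::size_t retOffset) {
    if (retOffset >= 5 && code[retOffset - 5] == 0xE8) return true;  // CALL rel32
    for (std::size_t len = 2; len <= 7 && len <= retOffset; ++len) {
        if (indirectCallLength(code + retOffset - len, len) == len) return true;
    }
    return false;
}
```

In a real walker the starting `fp` would come from EBP and the bounds from the thread's stack limits; a frame whose return address fails the CALL check would end the walk.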

> ... how on earth?
> 3.5 million allocations!? What on earth are you doing?
heh, never measured the number of allocations in large C++ apps? XML parsing and inefficient string processing are particularly swell for reaching large allocation counts.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

This topic is closed to new replies.
