diamant

Unity StackWalk is too slow


I have the following code:
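#include <windows.h>
#include <dbghelp.h>   // StackWalk, STACKFRAME; link with dbghelp.lib

// Walks one frame per call. Pass toInit = true on the first call to seed
// the static frame from the captured context.
bool walkStack(CONTEXT& context, bool toInit = false)
{
  static STACKFRAME frame;
  if(toInit)
  {
    ZeroMemory(&frame, sizeof(STACKFRAME));
    frame.AddrPC.Offset = context.Eip;
    frame.AddrStack.Offset = context.Esp;
    frame.AddrFrame.Offset = context.Ebp;
    frame.AddrFrame.Mode = frame.AddrStack.Mode = frame.AddrPC.Mode = AddrModeFlat;
  }

  // Stop when StackWalk fails or the frame pointer becomes zero.
  if( !StackWalk(IMAGE_FILE_MACHINE_I386, GetCurrentProcess(), GetCurrentThread(),
                 &frame, &context, NULL, SymFunctionTableAccess, SymGetModuleBase, NULL) ||
      !frame.AddrFrame.Offset )
    return false;

  return true;
}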

void getCallStack()
{
  CONTEXT context = {0};
  context.ContextFlags = CONTEXT_FULL;
  __asm
  {
     call x
  x: pop eax
     mov context.Eip, eax
     mov context.Ebp, ebp
     mov context.Esp, esp
  }

  bool ret = walkStack(context, true);
  for( ; ret; )
    ret = walkStack(context);
}


It's used to store the call stack on each memory allocation request (the code here omits the storing itself). It works correctly but VERY slowly. I tried the 64-bit variants of the functions/structures as well, but that didn't improve performance at all.

Stack-walking performance was discussed in this thread, but only in terms of resolving stack frame addresses to source code symbols. The great find for me in that thread was Visual Leak Detector. It works very fast! But I didn't find any significant differences between the two stack-walking implementations.

Environment: Windows Vista, Visual Studio 2005. Any ideas are welcome!

See Here.

I StackWalk() for 16 frames at allocation time, and then just look up the symbols when I dump the leaks. I haven't seen any noticeable performance problems with that code, but I'm only using it in debug, not release.
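
A rough sketch of that split, for illustration only: raw return addresses are copied at allocation time, and symbol lookup is deferred until the leaks are dumped. This is not the code linked above. LeakTracker, AllocRecord, kMaxFrames and the resolve callback are all made-up names, and the addresses are assumed to come from whatever stack walker is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// All names here are hypothetical, chosen for the example.
const std::size_t kMaxFrames = 16;

struct AllocRecord {
    std::size_t size;
    std::size_t frameCount;
    std::uintptr_t frames[kMaxFrames];  // raw return addresses, not yet resolved
};

class LeakTracker {
public:
    // Allocation time: copy the raw addresses only, no symbol lookup.
    void onAlloc(const void* p, std::size_t size,
                 const std::uintptr_t* frames, std::size_t count) {
        assert(p != nullptr);
        AllocRecord rec{};
        rec.size = size;
        rec.frameCount = count < kMaxFrames ? count : kMaxFrames;
        for (std::size_t i = 0; i < rec.frameCount; ++i)
            rec.frames[i] = frames[i];
        live_[p] = rec;
    }

    void onFree(const void* p) { live_.erase(p); }

    std::size_t liveCount() const { return live_.size(); }

    // Dump time: resolve symbols only for blocks that were never freed.
    // `resolve` stands in for a symbol lookup routine of your choice.
    std::vector<std::string> dumpLeaks(
        const std::function<std::string(std::uintptr_t)>& resolve) const {
        std::vector<std::string> out;
        for (const auto& kv : live_) {
            const AllocRecord& rec = kv.second;
            std::string line = std::to_string(rec.size) + " bytes:";
            for (std::size_t i = 0; i < rec.frameCount; ++i)
                line += " " + resolve(rec.frames[i]);
            out.push_back(line);
        }
        return out;
    }

private:
    // Note: inside a real allocator hook this map must not allocate through
    // the hooked allocator itself, or the tracker will recurse.
    std::unordered_map<const void*, AllocRecord> live_;
};
```

With this layout the per-allocation cost is the walk plus a small copy, and the expensive symbol lookup runs once per leaked block rather than once per allocation.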

There are about 3.5 million requests for memory allocation during just start and close of my app. And performance degrades approx. proportionally to the max stack depth I walk :(

Walking the stack is only for debugging, so it should not matter if it is slow. In the final product you will remove all of that since it isn't needed any more.

But...

Quote:
Original post by diamant
There are about 3.5 million requests for memory allocation during just start and close of my app.


... how on earth?

Quote:
Original post by diamant
There are about 3.5 million requests for memory allocation during just start and close of my app. And performance degrades approx. proportionally to the max stack depth I walk :(

3.5 million allocations!? What on earth are you doing?

StackWalk isn't the fastest function out there, but it's not that slow either. My engine takes approximately 3 seconds to start up (and makes around 6000 allocations to load a Quake I .bsp file with textures, shaders, etc.) with my memory manager turned on, around 1.5 seconds with my memory manager turned off, and around 2 seconds with the memory manager turned on but the stack walk code turned off.

However, as Zahlman said, it's a debug build, performance shouldn't be that much of an issue.

In this case it may be advisable to walk the stack manually when running on IA-32 with frame pointers enabled. I vaguely recall VLD doing this as well.
All that needs to be done is grab the stack frame pointed to by EBP, read the return address, and 'disassemble' the code immediately before it. The latter reduces to divining whether the instruction bytes are one of the 9 forms of CALL.
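
To make the idea concrete, here is a minimal sketch of such a manual walk. It assumes every function on the stack keeps a standard frame (push ebp / mov ebp, esp), so that [ebp] holds the caller's EBP and [ebp+4] the return address. Frame, walkFrames and looksLikeReturnAddress are made-up names, the stack bounds are assumed to be supplied by the caller, and the CALL check covers only three common encodings rather than all of them. It is not claimed to be how VLD does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed frame layout with frame pointers enabled:
// [ebp] = saved EBP of the caller, [ebp+4] = return address.
struct Frame {
    const Frame* next;
    const void*  returnAddr;
};

// Hypothetical helper: follows the saved-EBP chain starting at fp and stores
// up to maxFrames return addresses in out. stackLow/stackHigh bound the
// memory that may be read; the walk stops at the first suspicious frame.
std::size_t walkFrames(const Frame* fp, const void* stackLow,
                       const void* stackHigh, const void** out,
                       std::size_t maxFrames) {
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(stackLow);
    const std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(stackHigh);
    std::size_t n = 0;
    while (n < maxFrames) {
        const std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(fp);
        if (cur < lo || cur + sizeof(Frame) > hi) break;  // outside the stack
        if (fp->returnAddr == nullptr) break;
        out[n++] = fp->returnAddr;
        // The stack grows downward, so a caller's frame must sit at a higher
        // address; anything else means a corrupt or finished chain.
        if (reinterpret_cast<std::uintptr_t>(fp->next) <= cur) break;
        fp = fp->next;
    }
    return n;
}

// Partial heuristic: is ret preceded by a CALL instruction? Only three
// encodings are checked; the remaining forms are omitted. The caller must
// ensure the 6 bytes before ret are readable.
bool looksLikeReturnAddress(const unsigned char* ret) {
    if (ret[-5] == 0xE8) return true;                     // call rel32
    if (ret[-6] == 0xFF && ret[-5] == 0x15) return true;  // call [disp32]
    if (ret[-2] == 0xFF) {                                // 2-byte FF /2
        const unsigned modrm = ret[-1];
        const unsigned mod = modrm >> 6;
        const unsigned reg = (modrm >> 3) & 7;
        const unsigned rm  = modrm & 7;
        // call reg, or call [reg] without SIB byte or displacement
        if (reg == 2 && (mod == 3 || (mod == 0 && rm != 4 && rm != 5)))
            return true;
    }
    return false;
}
```

On IA-32 the starting fp would be the current EBP (for example read with inline asm, as in the first post), and the stack bounds would come from the thread's stack limits. Because only some CALL encodings are checked, a false result from the heuristic does not prove the address is bogus.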

> ... how on earth?
> 3.5 million allocations!? What on earth are you doing?
Heh, never measured the number of allocations in large C++ apps? XML parsing and inefficient string processing are particularly swell for reaching large allocation counts.
