Is it right to increase stack size for better performance

Started by
8 comments, last by PolarWolf 6 years, 6 months ago

We know stack is faster than heap,so if we increase stack size, and move some calculations into stack from heap,will the performance be better? I googled but find no clear answer to what's the max stack size limitation on windows, I set my program's stack size to 50M,all are ok,but when set to 100M,my program runs slowly,and acted weird, such as GetOpenFileNameA has no effect, no open dialog was opened and no error occured. My question is,is it right or good to increase stack size for better performance,what stack size should be set for window programs?

Advertisement
1 hour ago, PolarWolf said:

We know stack is faster than heap

Allocating memory on the stack is faster than allocating memory on the heap... because it's a much simpler operation, and the downside of that is that... well, it's a stack. When you exit the current scope (return from a function), all the stuff you created gets popped off the stack.

To manage more long term allocations you need a different allocation strategy which is pretty much required to be more expensive, because it's going to be more complicated.

Normally when you run out of stack space, it's called a "stack overflow" and it crashes your program... so increasing stack size will only do something if you're already crashing due to stack overflow... and if you're not already crashing, then it will do nothing at all.

FWIW though, on many of the older console games that I've worked on, when implementing our own memory allocators, we used a stack-based mark-and-release algorithm. It uses a stack of bytes, but not "the stack", and is still as fast as allocating on "the stack"... but that's a super advanced topic given the context...

The heap is not that slow. Malloc is not that slow. You can do thousands of mallocs per frame and not care. Sure, allocating a thousand integers on the stack is free -- it literally results in no instructions at all, because it's not doing anything -- so allocating integers from the heap is infinitely more expensive! Infinity is a lot!!! But it still doesn't matter. 

Profile your code. Time how long things take. :D Freak out when things take several milliseconds. Don't do cargo cult dogma without collecting data first (and afterwards for comparison!).

So stack is faster only in allocations ? I thought stack is faster in both allocations  and manipulation(write and read), if it's only faster in allocation, then there is little point in moving calculations from heap into stack,thanks for infromming me this.

43 minutes ago, PolarWolf said:

and manipulation(write and read)

Depending on lots of factors and context, you can have faster read access due to exploiting locality by caching (versus multiple scattered allocations on the heap).

🧙

53 minutes ago, PolarWolf said:

I thought stack is faster in both allocations  and manipulation(write and read)

RAM is RAM. All RAM is slow. RAM is incredibly slow. Unless the cache happens to have a copy of the bit of RAM that you need, then it's fast because you're talking to the cache instead of the (incredibly slow) RAM.

Anything that you've touched recently will very likely be present in the cache. Anything that you've not touched for a while probably won't be present in the cache.

Stack memory will usually be fast because objects in it are short lived, so you've usually accessed them very recently...

Heap memory will be extremely slow if you randomly access different memory addresses and constantly access different objects that haven't been used in a while.
Heap memory will be extremely fast if you predictably access different memory addresses and constantly access the same small group of objects.

If you put all of your objects on the stack, then no, it won't be fast any more. Only the parts that you've used recently will be fast, and the parts of the stack that you haven't used for a while will be slow.

Advanced side note that doesn't really matter -- on some CPUs, a small amount of stack memory might actually be mapped to CPU registers instead of RAM, in which case, it's as if it's always in cache (i.e. very fast). This is typically used to implement function arguments, etc... but it's the same principle -- if you've accessed sometime recently, it's probably fast / if you're accessing data that hasn't been touched for a while, it's probably going to be slow (until those initial accesses are complete, at which point it becomes cached / fast).

See also: https://gist.github.com/jboner/2841832

What matters is cache, this refreshes my knowledge about stack and heap.The latency in the url is very useful for optimization,thanks a lot.

2 hours ago, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap.

No, it refreshes your knowledge of the memory subsystem, and what matters is access pattern.  Stack and heap are software constructs that like previously mentioned only differ in allocation method and access pattern.  If you really want to know about memory read this document: http://futuretech.blinkenlights.nl/misc/cpumemory.pdf  It might go into a little to much depth but it covers all the hardware aspects of memory.  If you want to learn about the stack and heap google should help you out alot.

cpumemory.pdf  In case the link goes dead.

edit - you should also read about virtual memory and how it works.  Also it should be allocation/deallocation.

-potential energy is easily made kinetic-

On 10/10/2017 at 4:55 AM, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap.The latency in the url is very useful for optimization,thanks a lot.

That's a bit of an oversimplification. Practically speaking there isn't a difference between stack and heap memory, they are both just RAM. The major advantage to the stack is that it is designed to grow in a contiguous block of memory up to a predetermined size, meaning it is both fast to add and remove from it, and it tends to keep related data you might work with in one cache line.

The heap is slow because of how it is allocated(which you can control somewhat). If you say, allocate three different game objects with new, and then you create an array of pointers to each of those game objects, when you allocated each object they were probably placed arbitrarily far apart in memory. So now if you want to iterate over the array and interact with each object, you get a cache miss every time it has to jump out to the objects.

  • The heap is slow to allocate because it has to locate a suitable block of memory that is large enough for what you are asking for.
  • The heap could be slower to access than the stack, or about the same, depending on if you had it allocate a large block of memory and your code is working in that single block, vs having to jump out to separate allocations that could be all over the place. This is where cache misses become a problem.
  • Freeing memory on the heap can still be more expensive both because of locality, and the fact that the heap essentially has to be thread safe.

A final point that was mentioned a little in my last bullet is that every program thread has its own stack, and doesn't have to be locked for multi-threading concerns. The heap on the other hand, is generally shared and operations may have to lock in order to allocate or de-allocate memory. In general you can see the point that the heap will basically never be BETTER than the stack but it can be similar in performance while also allowing arbitrarily sized allocations.

cpumemory.pdf is really useful, as it's title says, every programmer should read it, it's a shame i hadn't found it after programming for so many years. And thanks Satharis ,your summary is concise and comprehensive,it's useful to me and others who read this topic.

 

This topic is closed to new replies.

Advertisement