
C++ Is it right to increase stack size for better performance


We know the stack is faster than the heap, so if we increase the stack size and move some calculations from the heap onto the stack, will performance be better? I googled but found no clear answer on the maximum stack size on Windows. I set my program's stack size to 50M and everything was fine, but when I set it to 100M my program ran slowly and behaved strangely: for example, GetOpenFileNameA had no effect, no open dialog appeared, and no error occurred. My question is: is it right, or a good idea, to increase the stack size for better performance, and what stack size should be set for Windows programs?

Edited by PolarWolf


So the stack is faster only for allocation? I thought the stack was faster for both allocation and manipulation (reads and writes). If it's only faster for allocation, then there is little point in moving calculations from the heap to the stack. Thanks for informing me of this.

53 minutes ago, PolarWolf said:

I thought the stack was faster for both allocation and manipulation (reads and writes)

RAM is RAM. All RAM is slow. RAM is incredibly slow. Unless the cache happens to have a copy of the bit of RAM that you need, then it's fast because you're talking to the cache instead of the (incredibly slow) RAM.

Anything that you've touched recently will very likely be present in the cache. Anything that you've not touched for a while probably won't be present in the cache.

Stack memory will usually be fast because objects in it are short lived, so you've usually accessed them very recently...

Heap memory will be extremely slow if you randomly access different memory addresses and constantly access different objects that haven't been used in a while.
Heap memory will be extremely fast if you predictably access different memory addresses and constantly access the same small group of objects.

If you put all of your objects on the stack, then no, it won't be fast any more. Only the parts that you've used recently will be fast, and the parts of the stack that you haven't used for a while will be slow.
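A minimal sketch of the access-pattern point, not a rigorous benchmark. It sums the same heap-allocated array twice, once in index order and once in a shuffled order. The names `AccessTiming`, `sum_in_order` and `compare_access_patterns` are made up for this illustration, and the fixed seed is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Results of one comparison run (hypothetical helper type).
struct AccessTiming {
    std::uint64_t sequential_sum;
    std::uint64_t shuffled_sum;
    double sequential_ms;
    double shuffled_ms;
};

// Sum `data`, visiting its elements in the order listed in `order`.
inline std::uint64_t sum_in_order(const std::vector<std::uint32_t>& data,
                                  const std::vector<std::uint32_t>& order) {
    assert(order.size() == data.size());
    std::uint64_t total = 0;
    for (std::uint32_t index : order) total += data[index];
    return total;
}

// Time the same heap memory under two access patterns.
inline AccessTiming compare_access_patterns(std::size_t count) {
    using steady = std::chrono::steady_clock;

    std::vector<std::uint32_t> data(count, 1u);

    // Predictable pattern: 0, 1, 2, ...
    std::vector<std::uint32_t> sequential(count);
    std::iota(sequential.begin(), sequential.end(), 0u);

    // Unpredictable pattern: the same indices in shuffled order.
    std::vector<std::uint32_t> shuffled = sequential;
    std::mt19937 rng(12345u);  // arbitrary fixed seed
    std::shuffle(shuffled.begin(), shuffled.end(), rng);

    AccessTiming result{};
    const auto t0 = steady::now();
    result.sequential_sum = sum_in_order(data, sequential);
    const auto t1 = steady::now();
    result.shuffled_sum = sum_in_order(data, shuffled);
    const auto t2 = steady::now();

    result.sequential_ms =
        std::chrono::duration<double, std::milli>(t1 - t0).count();
    result.shuffled_ms =
        std::chrono::duration<double, std::milli>(t2 - t1).count();
    return result;
}
```

Both passes read the same heap memory and produce the same sum; only the order of access differs. With a `count` large enough that the array no longer fits in the CPU caches, the shuffled pass would typically be noticeably slower, though the actual numbers depend on the machine and compiler settings.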

Advanced side note that doesn't really matter -- on some CPUs, a small amount of stack memory might actually be mapped to CPU registers instead of RAM, in which case it's as if it's always in cache (i.e. very fast). This is typically used to implement function arguments, etc... but it's the same principle -- if you've accessed it recently, it's probably fast / if you're accessing data that hasn't been touched for a while, it's probably going to be slow (until those initial accesses are complete, at which point it becomes cached / fast).

See also: https://gist.github.com/jboner/2841832

2 hours ago, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap.

No, it refreshes your knowledge of the memory subsystem, and what matters is the access pattern. Stack and heap are software constructs that, as previously mentioned, differ only in allocation method and access pattern. If you really want to know about memory, read this document: http://futuretech.blinkenlights.nl/misc/cpumemory.pdf It might go into a little too much depth, but it covers all the hardware aspects of memory. If you want to learn about the stack and heap, Google should help you out a lot.

Attached: cpumemory.pdf, in case the link goes dead.

Edit: you should also read about virtual memory and how it works. Also, it should be allocation/deallocation.

Edited by Infinisearch

On 10/10/2017 at 4:55 AM, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap. The latency figures at that URL are very useful for optimization, thanks a lot.

That's a bit of an oversimplification. Practically speaking, there isn't a difference between stack and heap memory; they are both just RAM. The major advantage of the stack is that it is designed to grow in a contiguous block of memory up to a predetermined size, meaning it is fast both to add to and to remove from, and it tends to keep related data you are working with close together, often in the same cache lines.

The heap is slow because of how it is allocated (which you can control somewhat). If you, say, allocate three different game objects with new and then create an array of pointers to those game objects, the objects were probably placed arbitrarily far apart in memory when they were allocated. So now, if you want to iterate over the array and interact with each object, you can get a cache miss every time it has to jump out to an object.
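The two layouts described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: `GameObject`, `make_scattered`, `update_scattered` and `update_contiguous` are hypothetical names, and the object holds just enough state to have something to update.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical game object with a trivial update step.
struct GameObject {
    float x = 0.0f;
    float velocity = 1.0f;
    void update(float dt) { x += velocity * dt; }
};

// Layout A: every object is its own heap allocation (one `new` each,
// here via std::make_unique) and the container only stores pointers.
inline std::vector<std::unique_ptr<GameObject>> make_scattered(std::size_t count) {
    std::vector<std::unique_ptr<GameObject>> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        objects.push_back(std::make_unique<GameObject>());
    }
    return objects;
}

// Iterating layout A follows one pointer per element, so each object
// may live at an unrelated address.
inline float update_scattered(std::vector<std::unique_ptr<GameObject>>& objects,
                              float dt) {
    float total = 0.0f;
    for (auto& object : objects) {
        assert(object != nullptr);
        object->update(dt);
        total += object->x;
    }
    return total;
}

// Layout B: objects are stored by value in one contiguous heap block,
// so iteration walks memory in address order.
inline float update_contiguous(std::vector<GameObject>& objects, float dt) {
    float total = 0.0f;
    for (GameObject& object : objects) {
        object.update(dt);
        total += object.x;
    }
    return total;
}
```

Both layouts live on the heap and compute the same result; only the memory layout differs. In a small, freshly started program the allocator may happen to place the separate allocations next to each other, so a measurable difference tends to show up only with many objects or after the heap has become fragmented.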

  • The heap is slow to allocate because it has to locate a suitable block of memory that is large enough for what you are asking for.
  • The heap could be slower to access than the stack, or about the same, depending on if you had it allocate a large block of memory and your code is working in that single block, vs having to jump out to separate allocations that could be all over the place. This is where cache misses become a problem.
  • Freeing memory on the heap can still be more expensive both because of locality, and the fact that the heap essentially has to be thread safe.
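One way to picture the "single large block" case from the bullets above is a bump allocator: request one block from the heap up front, then satisfy each allocation by advancing an offset. The `Arena` class below is a hypothetical sketch for illustration only. It is not thread safe, never runs destructors, and can only free everything at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical bump ("arena") allocator: one heap allocation up front,
// after which allocating is just pointer arithmetic.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    // Returns `size` bytes aligned to `alignment` (must be a power of two),
    // or nullptr if the block does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        // Round the next free address up to the requested alignment.
        const std::uintptr_t aligned = (base + used_ + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > capacity_ || size > capacity_ - offset) return nullptr;
        used_ = offset + size;
        return buffer_.get() + offset;
    }

    // Releases every allocation at once; there is no per-object free.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Consecutive allocations from such an arena end up adjacent in memory, which gives heap memory the same locality and cheap allocation that the stack gets by design. The trade-off is that individual objects cannot be freed, and sharing one arena between threads would need its own locking.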

A final point, touched on in my last bullet, is that every program thread has its own stack, which therefore doesn't have to be locked for multi-threading concerns. The heap, on the other hand, is generally shared, and operations may have to lock in order to allocate or deallocate memory. In general, the point is that the heap will basically never be BETTER than the stack, but it can be similar in performance while also allowing arbitrarily sized allocations.

Edited by Satharis


cpumemory.pdf is really useful. As its title says, every programmer should read it; it's a shame I hadn't found it after programming for so many years. And thanks, Satharis, your summary is concise and comprehensive. It's useful to me and to others who read this topic.

 

