PolarWolf

C++ Is it right to increase stack size for better performance


We know the stack is faster than the heap, so if we increase the stack size and move some calculations from the heap onto the stack, will performance improve? I googled but found no clear answer on the maximum stack size on Windows. I set my program's stack size to 50 MB and everything was fine, but at 100 MB the program ran slowly and behaved strangely: for example, GetOpenFileNameA had no effect, no open dialog appeared, and no error occurred. My question is: is it right, or a good idea, to increase the stack size for better performance, and what stack size should be set for Windows programs?

Edited by PolarWolf


So the stack is faster only for allocation? I thought the stack was faster for both allocation and manipulation (reads and writes). If it's only faster for allocation, then there is little point in moving calculations from the heap onto the stack. Thanks for informing me of this.
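If allocation is where the heap costs more, one common way to avoid that cost without touching the stack size is to allocate a heap buffer once and reuse it. Here is a minimal sketch of that idea; `run_frame`, `run_all`, and the workload itself are hypothetical names made up for illustration, not anything from the original program.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-frame work: fill the scratch buffer with
// frame, frame + 1, ... and return the sum of its elements.
long long run_frame(std::vector<int>& scratch, int frame) {
    std::iota(scratch.begin(), scratch.end(), frame);
    return std::accumulate(scratch.begin(), scratch.end(), 0LL);
}

// Allocates the scratch buffer on the heap once, then reuses it for
// every frame, so the allocation cost is not paid inside the loop.
long long run_all(std::size_t scratch_size, int frames) {
    std::vector<int> scratch(scratch_size);  // single heap allocation
    long long total = 0;
    for (int f = 0; f < frames; ++f) {
        total += run_frame(scratch, f);
    }
    return total;
}
```

With this shape the buffer can be tens of megabytes, as in the question, while the thread's stack stays at its normal size.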

53 minutes ago, PolarWolf said:

I thought stack is faster in both allocations and manipulation (write and read)

RAM is RAM. All RAM is slow. RAM is incredibly slow. Unless the cache happens to have a copy of the bit of RAM that you need, then it's fast because you're talking to the cache instead of the (incredibly slow) RAM.

Anything that you've touched recently will very likely be present in the cache. Anything that you've not touched for a while probably won't be present in the cache.

Stack memory will usually be fast because objects in it are short lived, so you've usually accessed them very recently...

Heap memory will be extremely slow if you randomly access different memory addresses and constantly access different objects that haven't been used in a while.
Heap memory will be extremely fast if you predictably access different memory addresses and constantly access the same small group of objects.

If you put all of your objects on the stack, then no, it won't be fast any more. Only the parts that you've used recently will be fast, and the parts of the stack that you haven't used for a while will be slow.
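To see the access-pattern effect for yourself, here is a rough sketch that sums the same heap buffer twice: once in index order and once in a shuffled order. All function names are made up for this example, and the timing helper is only a crude wall-clock measurement, not a proper benchmark.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sums `data` by visiting its elements in the order given by `order`.
std::uint64_t sum_in_order(const std::vector<std::uint32_t>& data,
                           const std::vector<std::size_t>& order) {
    std::uint64_t total = 0;
    for (std::size_t i : order) total += data[i];
    return total;
}

// Visiting order 0, 1, 2, ...: predictable, so the cache and
// prefetcher can keep up.
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// The same indices in shuffled order: each access may land on a
// cache line that has not been touched recently.
std::vector<std::size_t> shuffled_order(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order = sequential_order(n);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Wall-clock time of one call to `f`, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Both orders read exactly the same memory and produce the same sum. Timing `sum_in_order` with each order on a buffer much larger than the CPU cache (store the result in a `volatile` so the call is not optimized away) should show the difference described above, though the size of the gap depends on the machine.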

Advanced side note that doesn't really matter -- on some CPUs, a small amount of stack memory might actually be mapped to CPU registers instead of RAM, in which case it's as if it's always in cache (i.e. very fast). This is typically used to implement function arguments, etc... but it's the same principle -- if you've accessed something recently, it's probably fast / if you're accessing data that hasn't been touched for a while, it's probably going to be slow (until those initial accesses are complete, at which point it becomes cached / fast).

See also: https://gist.github.com/jboner/2841832

2 hours ago, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap.

No, it refreshes your knowledge of the memory subsystem, and what matters is access pattern. Stack and heap are software constructs that, as previously mentioned, differ only in allocation method and access pattern. If you really want to know about memory, read this document: http://futuretech.blinkenlights.nl/misc/cpumemory.pdf It might go into a little too much depth, but it covers all the hardware aspects of memory. If you want to learn about the stack and heap, Google should help you out a lot.

cpumemory.pdf, in case the link goes dead.

Edit: you should also read about virtual memory and how it works. Also, that should be allocation/deallocation.

Edited by Infinisearch

On 10/10/2017 at 4:55 AM, PolarWolf said:

What matters is cache, this refreshes my knowledge about stack and heap. The latency in the URL is very useful for optimization, thanks a lot.

That's a bit of an oversimplification. Practically speaking, there isn't a difference between stack and heap memory; they are both just RAM. The major advantage of the stack is that it is designed to grow as a contiguous block of memory up to a predetermined size, meaning it is fast to add to and remove from, and it tends to keep the related data you are working with close together, often in the same or neighboring cache lines.

The heap is slow because of how it is allocated (which you can control somewhat). If you, say, allocate three different game objects with new and then create an array of pointers to those objects, each object may have been placed arbitrarily far from the others in memory. So when you iterate over the array and interact with each object, you can get a cache miss every time the code has to jump out to an object.

  • The heap is slow to allocate because it has to locate a suitable block of memory that is large enough for what you are asking for.
  • The heap could be slower to access than the stack, or about the same, depending on whether you allocated one large block of memory and your code works within that single block, or whether it has to jump out to separate allocations that could be all over the place. This is where cache misses become a problem.
  • Freeing memory on the heap can also be more expensive, both because of locality and because the heap essentially has to be thread safe.
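The array-of-pointers case versus the single-block case can be sketched like this. `GameObject` and the helper names are hypothetical stand-ins; the point is only the difference in memory layout.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical game object used only for this illustration.
struct GameObject {
    float x = 0.0f;   // position
    float vx = 1.0f;  // velocity
};

// One heap allocation per object: the allocator decides where each
// one lives, so neighbors in the vector may be far apart in memory.
std::vector<std::unique_ptr<GameObject>> make_scattered(std::size_t n) {
    std::vector<std::unique_ptr<GameObject>> objects;
    objects.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        objects.push_back(std::make_unique<GameObject>());
    }
    return objects;
}

// One heap allocation for all objects: they sit back to back.
std::vector<GameObject> make_contiguous(std::size_t n) {
    return std::vector<GameObject>(n);
}

// Follows a pointer for every object before touching its data.
void update(std::vector<std::unique_ptr<GameObject>>& objects, float dt) {
    for (auto& obj : objects) obj->x += obj->vx * dt;
}

// Walks straight through one block of memory.
void update(std::vector<GameObject>& objects, float dt) {
    for (auto& obj : objects) obj.x += obj.vx * dt;
}
```

Both versions live on the heap and do the same work. Whether the scattered version is noticeably slower depends on the allocator and on how many objects there are, so treat this as an illustration of the layout rather than a guaranteed result.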

A final point, touched on in my last bullet, is that every program thread has its own stack, which doesn't have to be locked for multi-threading concerns. The heap, on the other hand, is generally shared, and operations may have to lock in order to allocate or deallocate memory. In general, the heap will basically never be BETTER than the stack, but it can be similar in performance while also allowing arbitrarily sized allocations.
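One way to get stack-like allocation out of heap memory is a simple bump (arena) allocator: grab one block up front, then hand out pieces by advancing an offset. The `Arena` class below is a hypothetical minimal sketch, assuming C++17, power-of-two alignments no larger than the default `new` alignment, and one arena per thread so no locking is needed.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity arena: a single heap allocation up
// front, then allocation is just an offset bump, much like moving
// the stack pointer.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity) {}

    // Returns `size` bytes at an offset that is a multiple of
    // `alignment` (assumed to be a power of two), or nullptr if the
    // request does not fit in the remaining space.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return nullptr;
        }
        offset_ = aligned + size;
        return buffer_.get() + aligned;
    }

    // Releases everything at once, similar to returning from a
    // function. No destructors are run for objects placed inside.
    void reset() { offset_ = 0; }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

This class is not thread safe on purpose: giving each thread its own arena mirrors how each thread has its own stack, while the backing memory still comes from the heap and can be as large as you need.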

Edited by Satharis


cpumemory.pdf is really useful. As its title says, every programmer should read it; it's a shame I hadn't found it after programming for so many years. And thanks, Satharis, your summary is concise and comprehensive. It's useful to me and to others who read this topic.

 

