
C++: Is it right to increase stack size for better performance?

Recommended Posts

We know the stack is faster than the heap, so if we increase the stack size and move some calculations from the heap onto the stack, will performance improve? I googled but found no clear answer on the maximum stack size limit on Windows. I set my program's stack size to 50 MB and everything was fine, but when I set it to 100 MB, my program ran slowly and acted weird: for example, GetOpenFileNameA had no effect, no open dialog appeared, and no error occurred. My question is: is it right or good to increase the stack size for better performance, and what stack size should be set for Windows programs?
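As a concrete illustration of the trade-off being asked about, here is a minimal sketch (the function names and the 4096-byte size are made up; this shows the allocation difference, it is not a benchmark). For context, on Windows the default stack reservation is typically 1 MB, configurable with the linker's /STACK option or per thread via CreateThread's dwStackSize parameter.

```cpp
#include <cstring>
#include <memory>

// Fill a scratch buffer and sum it. The stack version uses a local
// array, so "allocation" is just reserving space in the stack frame;
// the heap version pays for operator new/delete on every call.
int sum_with_stack_buffer() {
    char buf[4096];                  // lives on the stack, freed for free on return
    std::memset(buf, 1, sizeof buf);
    int total = 0;
    for (char c : buf) total += c;
    return total;
}

int sum_with_heap_buffer() {
    auto buf = std::make_unique<char[]>(4096);   // heap allocation each call
    std::memset(buf.get(), 1, 4096);
    int total = 0;
    for (int i = 0; i < 4096; ++i) total += buf[i];
    return total;
}
```

The stack version's allocation is effectively free, while the heap version pays for a heap allocation and deallocation on every call; the actual reads and writes, however, go through the same cache hierarchy either way.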

Edited by PolarWolf


So the stack is faster only for allocations? I thought the stack was faster for both allocation and manipulation (reads and writes). If it's only faster for allocation, then there is little point in moving calculations from the heap to the stack. Thanks for informing me of this.

43 minutes ago, PolarWolf said:

and manipulation (reads and writes)

Depending on lots of factors and context, you can have faster read access due to exploiting locality by caching (versus multiple scattered allocations on the heap).

53 minutes ago, PolarWolf said:

I thought the stack was faster for both allocation and manipulation (reads and writes)

RAM is RAM. All RAM is slow. RAM is incredibly slow, unless the cache happens to have a copy of the bit of RAM that you need; then it's fast, because you're talking to the cache instead of the (incredibly slow) RAM.

Anything that you've touched recently will very likely be present in the cache. Anything that you've not touched for a while probably won't be present in the cache.

Stack memory will usually be fast because objects in it are short lived, so you've usually accessed them very recently...

Heap memory will be extremely slow if you randomly access different memory addresses and constantly access different objects that haven't been used in a while.
Heap memory will be extremely fast if you predictably access different memory addresses and constantly access the same small group of objects.

If you put all of your objects on the stack, then no, it won't be fast any more. Only the parts that you've used recently will be fast, and the parts of the stack that you haven't used for a while will be slow.

Advanced side note that doesn't really matter: on some CPUs, a small amount of stack memory might actually be mapped to CPU registers instead of RAM, in which case it's as if it's always in cache (i.e. very fast). This is typically used to implement function arguments, etc... but it's the same principle: if you've accessed something recently, it's probably fast; if you're accessing data that hasn't been touched for a while, it's probably going to be slow (until those initial accesses complete, at which point it becomes cached/fast).

See also: https://gist.github.com/jboner/2841832
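The access-pattern effect described above can be sketched independently of stack vs heap (a minimal illustration; the function names are made up, and both functions compute the same total; only the memory walk differs):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Same data, two access patterns. Sequential access lets the hardware
// prefetcher and cache lines work for you; the shuffled walk touches a
// different cache line on almost every step.
long long sum_sequential(const std::vector<int>& data) {
    long long total = 0;
    for (int v : data) total += v;            // predictable, cache-friendly
    return total;
}

long long sum_shuffled(const std::vector<int>& data,
                       const std::vector<std::size_t>& order) {
    long long total = 0;
    for (std::size_t i : order) total += data[i];  // scattered, cache-hostile
    return total;
}

std::vector<std::size_t> make_shuffled_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(42);                     // fixed seed for reproducibility
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

Timing the two loops over a buffer much larger than the last-level cache is a simple way to see the difference the posts above describe.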


What matters is the cache; this refreshes my knowledge about the stack and heap. The latency numbers at that URL are very useful for optimization, thanks a lot.

2 hours ago, PolarWolf said:

What matters is the cache; this refreshes my knowledge about the stack and heap.

No, it refreshes your knowledge of the memory subsystem, and what matters is the access pattern. The stack and heap are software constructs that, as previously mentioned, differ only in allocation method and access pattern. If you really want to know about memory, read this document: http://futuretech.blinkenlights.nl/misc/cpumemory.pdf It might go into a little too much depth, but it covers all the hardware aspects of memory. If you want to learn about the stack and heap, Google should help you out a lot.

cpumemory.pdf (attached, in case the link goes dead)

Edit: you should also read about virtual memory and how it works. Also, it should be allocation/deallocation.

Edited by Infinisearch

On 10/10/2017 at 4:55 AM, PolarWolf said:

What matters is the cache; this refreshes my knowledge about the stack and heap. The latency numbers at that URL are very useful for optimization, thanks a lot.

That's a bit of an oversimplification. Practically speaking, there isn't a difference between stack and heap memory; they are both just RAM. The major advantage of the stack is that it is designed to grow in a contiguous block of memory up to a predetermined size, meaning it is both fast to add to and remove from, and it tends to keep related data you might work with in the same cache line.

The heap is slow because of how it is allocated (which you can control somewhat). If you, say, allocate three different game objects with new, and then create an array of pointers to each of those objects, the objects were probably placed arbitrarily far apart in memory. So now if you want to iterate over the array and interact with each object, you get a cache miss every time it has to jump out to one of the objects.

  • The heap is slow to allocate because it has to locate a suitable block of memory that is large enough for what you are asking for.
  • The heap could be slower to access than the stack, or about the same, depending on whether you allocated one large block and your code works within that single block, versus having to jump between separate allocations that could be scattered all over the place. This is where cache misses become a problem.
  • Freeing memory on the heap can also be more expensive, both because of locality and because the heap essentially has to be thread-safe.

A final point, mentioned briefly in my last bullet, is that every program thread has its own stack, which doesn't have to be locked for multi-threading concerns. The heap, on the other hand, is generally shared, and operations may have to lock in order to allocate or deallocate memory. In general, you can see that the heap will basically never be BETTER than the stack, but it can be similar in performance while also allowing arbitrarily sized allocations.
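The array-of-pointers scenario above can be sketched like this (GameObject, its fields, and the function names are hypothetical; both functions compute the same result, but the contiguous layout is the cache-friendly one):

```cpp
#include <memory>
#include <vector>

struct GameObject { float x, y, z; };

// Contiguous storage: objects sit back to back, so iteration walks
// memory linearly and several objects share each cache line.
float sum_x_contiguous(const std::vector<GameObject>& objs) {
    float total = 0.0f;
    for (const GameObject& o : objs) total += o.x;
    return total;
}

// Pointer storage: each object came from its own heap allocation, so
// every iteration step may jump to an unrelated address (a likely cache miss).
float sum_x_pointers(const std::vector<std::unique_ptr<GameObject>>& objs) {
    float total = 0.0f;
    for (const auto& p : objs) total += p->x;
    return total;
}
```

Keeping hot objects in a contiguous container (or a pool allocator) rather than behind individual pointers is the usual way to get the heap's access cost close to the stack's.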

Edited by Satharis


cpumemory.pdf is really useful; as its title says, every programmer should read it. It's a shame I didn't find it until after programming for so many years. And thanks, Satharis, your summary is concise and comprehensive; it's useful to me and to others who read this topic.


