So I take it you just have a special 'stack allocator' that really uses the heap but works like a stack under the hood?
Pretty much. I try to avoid all globals in my current engine, which includes "the heap", as it's obviously a global data structure. I create quite a few different stacks (and pools, and heaps) for different purposes. The DICE scope stack allocation presentation gives a good idea of how to implement this while retaining C++ niceties such as RAII and constructors/the object model.
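To make the RAII part concrete, here's a rough sketch in the spirit of that idea. It is not the code from the presentation, and `LinearArena`, `ArenaScope` and `create` are names I made up for illustration. The scope remembers the cursor when it is created, keeps a list of finalizers for the objects it constructs, and on destruction runs the destructors in reverse order and rewinds the cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical bump allocator: a caller-supplied buffer plus a cursor.
struct LinearArena {
    std::uint8_t* begin;
    std::uint8_t* cursor;
    std::uint8_t* end;

    LinearArena(void* buffer, std::size_t size)
        : begin(static_cast<std::uint8_t*>(buffer)), cursor(begin), end(begin + size) {}

    // Returns aligned memory, or nullptr if the buffer is exhausted.
    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0); // power of two
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(cursor);
        std::size_t padding = (align - addr % align) % align;
        std::size_t remaining = static_cast<std::size_t>(end - cursor);
        if (padding > remaining || size > remaining - padding) return nullptr;
        std::uint8_t* result = cursor + padding;
        cursor = result + size;
        return result;
    }
};

// Hypothetical RAII scope over the arena.
class ArenaScope {
public:
    explicit ArenaScope(LinearArena& arena) : arena_(arena), saved_(arena.cursor) {}
    ArenaScope(const ArenaScope&) = delete;
    ArenaScope& operator=(const ArenaScope&) = delete;

    ~ArenaScope() {
        // The list is newest-first, so this destroys in reverse creation order.
        for (Finalizer* f = finalizers_; f != nullptr; f = f->next) f->destroy(f->object);
        arena_.cursor = saved_; // 'free' everything allocated in this scope
    }

    // Constructs a T in the arena and registers its destructor.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        std::uint8_t* rollback = arena_.cursor;
        void* nodeMem = arena_.allocate(sizeof(Finalizer), alignof(Finalizer));
        void* objMem = nodeMem ? arena_.allocate(sizeof(T), alignof(T)) : nullptr;
        if (objMem == nullptr) {
            arena_.cursor = rollback; // out of space: undo the partial allocation
            return nullptr;
        }
        T* object = new (objMem) T(std::forward<Args>(args)...);
        finalizers_ = new (nodeMem) Finalizer{
            [](void* p) { static_cast<T*>(p)->~T(); }, object, finalizers_};
        return object;
    }

private:
    struct Finalizer {
        void (*destroy)(void*);
        void* object;
        Finalizer* next;
    };

    LinearArena& arena_;
    std::uint8_t* saved_;
    Finalizer* finalizers_ = nullptr;
};
```

Usage is just `ArenaScope scope(arena); auto* thing = scope.create<Thing>(args);` and everything created through `scope` goes away when it leaves the block. Scopes on the same arena have to nest strictly for the rewind to make sense.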
Back in the PS2/Wii era, we actually pre-allocated all the RAM and then split it up by geographical regions in the game world! E.g. we'd have 3 "level chunk" sized buffers, each of which had a "cursor" for stack allocations. When streaming in a new chunk of a game level, we'd pre-create all the objects possibly required within that chunk's stack. This let us have 2 chunks loaded at a time, and a 3rd one streaming in / being constructed. If you needed temporary / per-frame memory, you could record the cursor, allocate some more objects on the stack, and then reset the cursor back to your 'recorded' value to 'erase' them. When unloading a chunk of the world, we'd just reset its cursor to the beginning of its buffer, to indicate all those objects were gone. This was back in the "C with classes" style of C++, so we didn't simply call destructors, etc.; instead we had custom shutdown logic where required.
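The cursor trick described above looks roughly like this. This is a minimal sketch, not the code from those games; `StackAllocator`, `mark`, `rewind` and `reset` are names I've picked for illustration. Note that, matching the "C with classes" approach, nothing here calls destructors: rewinding just moves the cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stack allocator over one pre-allocated "level chunk" buffer.
class StackAllocator {
public:
    using Marker = std::uint8_t*;

    StackAllocator(void* buffer, std::size_t size)
        : begin_(static_cast<std::uint8_t*>(buffer)), cursor_(begin_), end_(begin_ + size) {}

    // Bumps the cursor; returns nullptr when the buffer is exhausted.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0); // power of two
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(cursor_);
        std::size_t padding = (align - addr % align) % align;
        std::size_t remaining = static_cast<std::size_t>(end_ - cursor_);
        if (padding > remaining || size > remaining - padding) return nullptr;
        std::uint8_t* result = cursor_ + padding;
        cursor_ = result + size;
        return result;
    }

    // Record the cursor before making temporary / per-frame allocations...
    Marker mark() const { return cursor_; }

    // ...and move it back afterwards to 'erase' them.
    void rewind(Marker m) {
        assert(m >= begin_ && m <= cursor_);
        cursor_ = m;
    }

    // Unloading a chunk: everything in the buffer is considered gone.
    void reset() { cursor_ = begin_; }

    std::size_t used() const { return static_cast<std::size_t>(cursor_ - begin_); }

private:
    std::uint8_t* begin_;
    std::uint8_t* cursor_;
    std::uint8_t* end_;
};
```

With three of these (one per chunk buffer), streaming in a new chunk is a `reset()` on the oldest one followed by allocating the new chunk's objects into it.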
In my case, it was an OS callback that was fired on (among others) malloc() and free(). The options were VLAs or always allocating for the worst case.
Change Swift's quote to start with "In most cases, " then