• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0
fir

runtime cost of stack overflow checking

8 posts in this topic

I often wonder but i do not know this thing: when you do a push 

data on stack it could be stack overflow - on the old machines it was

not checked so you winding the stack then overwriting of old stack

spaces with new walues - stack corruption

On windows system it seem to me that this is quarded it is you got 

an exception, memory acces fault, something like that

 

Does it cost some runtime or this is free? I suspect that maybe it 

is maybe like that that stack acces is not runtime checked but 

after the stack pages you got an empty memory page where writting

wil just cause an exception? is this the case? if you got a large

amounts of user stacks in the system are they are guarded at the end 

by empty pages?

 

If this runtime cost free mechanism is working like that i find accidentaly

that this probably could be used for guarding normal arrays boundary 

acces checking - signaled exception is better than quiet data corruption..

 

Is it working like that?

 

 

 

 

0

Share this post


Link to post
Share on other sites
As far as I know, for non-segmented stacks, a stack overflow manifests as an access violation into the next (or previous) memory sector.

For segmented stacks, there's a run-time check that happens when a function is called to see if there's enough room for all local variables and similar on the stack. It's a simple arithmetic calculation that does not access memory.
1

Share this post


Link to post
Share on other sites

As far as I know, for non-segmented stacks, a stack overflow manifests as an access violation into the next (or previous) memory sector.

For segmented stacks, there's a run-time check that happens when a function is called to see if there's enough room for all local variables and similar on the stack. It's a simple arithmetic calculation that does not access memory.

 

could you tell maybe a bit more what do you mean by segmented stacks? does windows use that? (i checked something about that 

in google but i do not know it is used or this is just some artificial

contstruct made in user programs)

 

as to the main question i suspect i answered to myslelf (i suspect that 

it is guarded at the end (or att the bottom as stack grows opposite)

by non writable pages - though i am not sure

 

also yet one unclear cuestion - people say sometimes that every thread has its own stack and they say that by default the stack is usually 1MB size - Checking system info gives me the information of 

600 threads running - do this really means 600MB of reserved ram

all for thread stacks ? :-/

Edited by fir
0

Share this post


Link to post
Share on other sites
It's a compiler option. Gcc calls it split stack. Segmented stack is the clang term. Not sure if VC has a similar feature. MinGW should, however; it's not OS integrated.

A large number of threads is exactly the motivation for split stacks, because it allows smaller stacks for most threads. So an application with 600 threads is more likely to be using split stacks.

But without split stacks, yeah, each thread needs to allocate a lot of memory for the stack. A saving grace is that allocating memory and not using it doesn't have a real performance impact; it only uses up your application's virtual memory pool. Also, the side of a non-split stack is configurable.

Edit: It seems to be poorly documented, but present in clang/llvm. Edited by King Mir
1

Share this post


Link to post
Share on other sites

 

 

 

The reason nobody uses this to automatically do bounds checking on standard data arrays is because the minimum page size is 4K.  This means your array would need to be a multiple of 4K in size, and you would need to isolate the array in address space such that no other allocated pages are nearby.  It would waste a lot of address space to do this.

 

I think it wouldnt be a lot (I am still not sure if the stacks are guarded like that but maybe, and i think it would be okay - bounding checks checks for free) 4k is not so much, you usually got no millions arrays

in the system (especially i thing guarded could be only the bigger ones) i think this would be 'number of guarded arrays' x 4kb (x2 if both ways) ;/ (+ maybe the unused space of last array page till the 4k boundary) especially on x64 you got a lot of adress space

 

but i heard something that now pages are bigger tham 4k ?

 

as to wasting logical space : are the previous c static arrays 

putted each in the new pages or just tighten with the old ones in their old pages?

0

Share this post


Link to post
Share on other sites

Dealing with stack is the responsibility of whoever uses it by the rules of whoever manages the stack to the extent of the OS getting out of the way. In regards to Windows, that means: you can override every aspect of the stack behavior completely - just not as efficiently as the OS could.

 

Slight corrections to what has been said already:

By default on windows a typical stack uses a guard page to trigger stack expansion with the additional requirement that if a function uses more that one page worth of stack then it must call a special function to ensure the guard page is not skipped (this special function call is added automatically by the compiler as part of calling convention / exception handling - in function prolog: "__alloca_probe").

 

So, besides the overhead of the function call when more than a page is needed - there is no overhead as no checks are done.

 

Stack underflow is not guarded (just not worth it). IF you are lucky then the memory there is inaccessible and hence causes an immediate access violation crash.

 

----------------------

Something i use to check the state of address space (VC++):

void dump() {
    // dump memory map
    intp at = 0;
    MEMORY_BASIC_INFORMATION info;
    while(true) {
        if(VirtualQuery((void*)at, &info, sizeof(info)) != sizeof(info)) break;
        LOG(L"%c %p - %p %c%c%c%c%c%c ( %7d KB  %s ) %s"
            , info.AllocationBase == info.BaseAddress || info.AllocationBase == 0 ? L'>' : L' '
            , intp(info.BaseAddress)
            , intp(info.BaseAddress) + intp(info.RegionSize) - 1
            , info.Protect == 0 ? L'?' : (info.Protect & 0x10 ? L'!' : L' ')
            , info.Protect & 0xF0 ? L'x' : '-'
            , info.Protect & 0x66 ? L'r' : '-'
            , info.Protect & 0xCC ? L'w' : '-'
            , info.Protect & 0x88 ? L'c' : '-'
            , info.Protect & 0x100 ? L'G' : '-'
            , int32u(info.RegionSize >> 10)
            , intp(info.BaseAddress) & 65535 ? L"a:???" : L"a:64K"
            , info.State == MEM_FREE ? L"¤" : (info.State == MEM_COMMIT ? L"USED" : L"RESERVED")
        );
        at += info.RegionSize;
    }
}
 
// some quickly made dirty Logger of dubious quality
void logging(wchar_t const *format, ...) {
    va_list argptr; va_start(argptr, format);
    wchar_t buf[1024];
    int wrote = vswprintf(&buf[0], 1024, format, argptr);
    if(wrote<0) wrote = 0;
    va_end(argptr);
    if(wrote) OutputDebugStringW(buf);
}
#define LOG(...) logging(L"\n" __VA_ARGS__);
// intp = unsigned integer fit for pointer (ie. uintptr_t), int32u = unsigned 32 bit integer

which, in my case, gives (some excerpts):

> 00000000 - 0000FFFF  ----- (      64 KB  a:64K ) ¤ // a page granularity worth of a general no-go area
  ...
> 00020000 - 0002EFFF  -rw-- (      60 KB  a:64K ) USED // a guarded cyclic buffer of mine - just something the program used for the dump() apparently uses
  0002F000 - 0002FFFF  -r--G (       4 KB  a:??? ) USED
  ...
> 00090000 - 00090FFF  -r--- (       4 KB  a:64K ) USED // my program (non fixed image address)
  00091000 - 000A0FFF  xrw-- (      64 KB  a:??? ) USED // ... "x" AND "w" !? ... wut!? What is this!?
  000A1000 - 000B5FFF  xr--- (      84 KB  a:??? ) USED // ... program code
  000B6000 - 000B9FFF  -r--- (      16 KB  a:??? ) USED // ... data
  000BA000 - 00168FFF  -rw-- (     700 KB  a:??? ) USED
  00169000 - 0016BFFF  -r--- (      12 KB  a:??? ) USED
  ...
> 00250000 - 00288FFF ?----- (     228 KB  a:64K ) RESERVED // some stack (should not be mine, directly - dunno)
  00289000 - 0028BFFF  -rw-G (      12 KB  a:??? ) USED
  0028C000 - 0028FFFF  -rw-- (      16 KB  a:??? ) USED
> 00290000 - 0031FFFF  ----- (     576 KB  a:64K ) ¤ // memory following stack is not reserved and is free for grab
  ...
> 00440000 - 0053BFFF ?----- (    1008 KB  a:64K ) RESERVED // my stack (1MB stack is the the default setting in VC++)
  0053C000 - 0053CFFF  -rw-G (       4 KB  a:??? ) USED
  0053D000 - 0053FFFF  -rw-- (      12 KB  a:??? ) USED
> 00540000 - 005CFFFF  ----- (     576 KB  a:64K ) ¤ // again - free for grab
  ...
> 005D0000 - 005D7FFF  -rw-- (      32 KB  a:64K ) USED // heap perhaps?
  005D8000 - 006CFFFF ?----- (     992 KB  a:??? ) RESERVED
  ...
> 007D0000 - 207CFFFF  -rw-- (  524288 KB  a:64K ) USED // half a GB used for my own - non heap
  ...
> 207D0000 - 607CFFFF  ----- ( 1048576 KB  a:64K ) ¤ // first big chunk of unused address space
> 607D0000 - 607D0FFF  -r--- (       4 KB  a:64K ) USED // dll/system land begins
  607D1000 - 60958FFF  xr--- (    1568 KB  a:??? ) USED
  60959000 - 6095EFFF  -rw-- (      24 KB  a:??? ) USED
  6095F000 - 60970FFF  -r--- (      72 KB  a:??? ) USED
> 60971000 - 7533FFFF  ----- (  337724 KB  a:??? ) ¤ // a stray block of free address space cut short by some dll/system shenanigans
  ...
  dll/system heaven
  ...
> 7FFF0000 - FFFAFFFF  ----- ( 2096896 KB  a:64K ) ¤ // some more address space - the benefits of "large address aware" on 64bit windows.

-----------------------------------

"but i heard something that now pages are bigger than 4k ?"

 

Bigger pages have always been possible (i386+ for the whole x86 line) - just that it is rarely ever used on desktop machines.

 

Note also: on some hardware (non x86) - 4KB pages are not even an option and 8KB is the smallest possible for example.

Edited by tanzanite7
1

Share this post


Link to post
Share on other sites

Gcc calls it split stack. Segmented stack is the clang term. Not sure if VC has a similar feature. MinGW should, however; it's not OS integrated.

This is a GNU/Linux only feature at the present time. MinGW doesn't support it.

 

 

 

Long explanation of guard pages

This, exactly.

 

 

 

600 threads running - do this really means 600MB of reserved ram

600 threads running means you're doing something seriously wrong. An application will very rarely need more than a dozen threads. Usually you will want approximately one work-intensive thread per CPU core (or fewer, if some cores are "hyperthreaded" cores), and one or two mostly-idle threads for having "other stuff" such as lengthy input/output run asynchronously without blocking the main thread.

 

But yes, given default stack sizes, 600 threads would mean 600MiB of reserved address space (and 2400kiB [?2.3MiB] of committed RAM). Note that having 600 threads running on the system as a whole (not in one single application) isn't an issue at all since they're in different address spaces.

 

Large pages are something that works "almost-automatically" under recent versions of Linux if configured right, and with some limitations. Large pages are quite unwieldy under Windows, and come with serious limitations. In either case, it is not something that 99% of all programmers have to worry about. The default works fine.

The only advantage of large pages is that you save a few TLB entries, and unless you're writing something like a relational database of the kind that e.g. Oracle is selling, there is absolutely no measurable difference for you.

Edited by samoth
1

Share this post


Link to post
Share on other sites

Windows can automatically create guard pages after every allocation for any application that uses HeapAlloc(), if you ask it to. See http://msdn.microsoft.com/en-us/library/windows/hardware/ff549561%28v=vs.85%29.aspx

 

It's generally only used for debugging, but you could easily implement it yourself by using VirtualAlloc() and marking the page before and after the array as PAGE_NO_ACCESS using VirtualProtect(). It will obviously only reliably detect overruns in one direction - either off the start or off the end of the array, unless the array size is an exact multiple of 4k.

 

In addition to the extra address space it eats up (which isn't an issue on 64-bit), it adds extra overhead to allocation and deletion. You may also see a performance hit from extra cache and TLB misses.

1

Share this post


Link to post
Share on other sites

 


Gcc calls it split stack. Segmented stack is the clang term. Not sure if VC has a similar feature. MinGW should, however; it's not OS integrated.

This is a GNU/Linux only feature at the present time. MinGW doesn't support it.

 

 

 

Long explanation of guard pages

This, exactly.

 

 

 

600 threads running - do this really means 600MB of reserved ram

600 threads running means you're doing something seriously wrong. An application will very rarely need more than a dozen threads. Usually you will want approximately one work-intensive thread per CPU core (or fewer, if some cores are "hyperthreaded" cores), and one or two mostly-idle threads for having "other stuff" such as lengthy input/output run asynchronously without blocking the main thread.

 

But yes, given default stack sizes, 600 threads would mean 600MiB of reserved address space (and 2400kiB [?2.3MiB] of committed RAM). Note that having 600 threads running on the system as a whole (not in one single application) isn't an issue at all since they're in different address spaces.

 

Large pages are something that works "almost-automatically" under recent versions of Linux if configured right, and with some limitations. Large pages are quite unwieldy under Windows, and come with serious limitations. In either case, it is not something that 99% of all programmers have to worry about. The default works fine.

The only advantage of large pages is that you save a few TLB entries, and unless you're writing something like a relational database of the kind that e.g. Oracle is selling, there is absolutely no measurable difference for you.

 

 

600 threads might be a lot on a x86 desktop PC, but not all computers are 4-8 core PCs these days. (There are several embedded CPU architectures that feature far higher core counts)

0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0