fir

Using static initialisation for parallelization?

27 posts in this topic

Months ago I found something like a 'bug' in my code: when turning

static DWORD processID = 0;
processID = GetCurrentProcessId();

into

static DWORD processID = GetCurrentProcessId();

my exe size jumped from 25k to 75k, and I also noticed a slowdown of my game frame (from 3 ms to 4 ms). I still do not know how that is possible (this static initialization should only run once, not every frame) or how to explain it; maybe it is some cache effect appearing when the exe grows in size, I have no idea.

 

There was an answer from samoth:

"One notable difference of whether you compile C or C++ is for example that C++ (at least C++11, but GCC since ca. version 4.6 also does that for C++03) mandates initialization of function-local static variables being thread-safe. I'm mentioning that since you use that feature (maybe without being aware)."

If so, I understand that the C++ language did this so it can do static initializations with multithreading. So I mean: can code like this

static int one = f1();    // potentially referencing and updating other data
static int two = f2();
static int three = f3();
static int four = f4();

int main() { printf("\n done."); }

execute on many cores in parallel? Will main then wait until the last of the initializing functions finishes its work? So could this potentially be used as a cheap form of multithreading?

 

 

 

No. It did that so that if you are initializing function-local statics in a multi-threaded environment, it will have defined results.

It certainly does not affect namespace-level static initialization, nor does it imply anything about the introduction of a threaded environment during application initialization.

I do not quite understand (partly because of my weak English).

So would this maybe work?

int main()
{
    static int one = f1();    // potentially referencing and updating other data
    static int two = f2();
    static int three = f3();
    static int four = f4();

    printf("\n done.");
}

If this is not done to allow some parallel work, what is it for? And by what mechanism did it slow my program down and grow it by 50 KB?


 

If this is not done to allow some parallel work, what is it for? And by what mechanism did it slow my program down and grow it by 50 KB?


No, that will not do anything in parallel either.
It is there so that when you call functions from threads, the static data is set up in a thread-safe manner.
In order to ensure this, the runtime has to add some code/mutex locks to make sure two threads can't try to initialise it at the same time.

[source]
void foo()
{
    static int one = f1();    // potentially referencing and updating other data
    static int two = f2();
    static int three = f3();
    static int four = f4();
}

int main()
{
    parallel_invoke(foo, foo); // some function which will run the supplied functions on multiple threads
    printf("\n done.");
}
[/source]

So, in the above code 'foo' is invoked on two threads via the helper function. In C++03 the static initialisation of the function's variables would not be safe, as both threads might end up running the setup functions. In C++11 only one thread would.
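For comparison, here is a minimal sketch of spelling the same guarantee out explicitly with std::call_once (the f1..f4 bodies below are placeholders, not from the thread); a compiler-generated guard for a dynamically initialised local static behaves much like this:

[source]
#include <mutex>
#include <cstdio>

// placeholder init functions standing in for the thread's hypothetical f1..f4
static int f1() { return 1; }
static int f2() { return 2; }
static int f3() { return 3; }
static int f4() { return 4; }

static std::once_flag g_init_flag;
static int one, two, three, four;

void foo()
{
    // std::call_once runs the lambda exactly once, even if foo() is entered
    // from several threads at the same time -- the same guarantee C++11 gives
    // a dynamically initialised function-local static. It does NOT run the
    // four calls in parallel; they execute sequentially on whichever thread
    // gets there first, while the other threads wait.
    std::call_once(g_init_flag, [] {
        one = f1();
        two = f2();
        three = f3();
        four = f4();
    });
}

int main()
{
    foo();
    std::printf("\n done.");
}
[/source]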

 

I do not understand: isn't static data initialized at startup (pre-main) time?

If you call it from two threads, these statics are shared, but if they were initialized earlier, in pre-main time, then there cannot be a collision between threads initializing them (since the threads do not initialize it, it is already initialized)?

I'm not quite sure, as I have only done a little multithreading in my life, but I understand that static data is not thread-safe at all, so you need thread-local storage?


 


And by what mechanism did it slow my program down and grow it by 50 KB?

I suspect, without proof, that it pulled in some library code to wrap and serialize the initialization of the function-local statics, and the slowdown is because that code gets executed.  Thread serialization generally requires context switches and pipeline stalls.  Without knowing the code, I suspect that code path needs to be executed every time so it can check to see if the object has been initialized.

 

 

 

I can provide the code if you want

 
#define WIN32_LEAN_AND_MEAN
#define WIN32_EXTRA_LEAN
#include <windows.h>
 
#include <psapi.h>
 
#include "..\allmyheaders\allmyheaders.h" //that was temporary skip it
 

long working_set_size = -1;
long paged_pool_size = -1;
long nonpaged_pool_size = -1;
 
static PROCESS_MEMORY_COUNTERS process_memory_counters;
 
long GetMemoryInfo()
{
 
    static HANDLE hProcess;
 
    static int initialised = 0;
    if(!initialised)
    {
     static DWORD processID = GetCurrentProcessId(); // <---THIS CRITICAL LINE
 
     hProcess = OpenProcess(  PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, processID );
 
     if (NULL == hProcess)  ERROR_EXIT("cant open process for get memory info");
 
     initialised = 1;
 
 
    }
 
 
    if(GetProcessMemoryInfo(hProcess, &process_memory_counters, sizeof(process_memory_counters)))
    {
      working_set_size    = process_memory_counters.WorkingSetSize;
      paged_pool_size     = process_memory_counters.QuotaPagedPoolUsage;
      nonpaged_pool_size  = process_memory_counters.QuotaNonPagedPoolUsage ;
    }
 
    if(0)
      CloseHandle( hProcess );
 
    return working_set_size;
}
 

Damn that bug that cuts off the rest of the post after the /code tag (it ate about 20 lines of text).

This is code for measuring memory consumption in my Windows program. It is called each frame, but the init block runs only once, as you can see (a technique with an 'initialised' flag of my own 'invention', though other people probably use it too, as it works fine).

Understanding the reason for this and what is really going on under the hood would be very welcome, as this is a slowdown and exe-bloat pitfall for me (it is obviously not needed in my case, since, as I said, breaking that line in two makes execution faster and the exe smaller).

Edited by fir

No, function-scope statics are initialized only when the function is executed. Imagine there's a bool that's initialized before main, which is checked in an "if !initialized" on every call of the function.
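Roughly speaking, and as a sketch only (the exact code is compiler-specific; the GCC output further down in the thread shows the real thing), a dynamically initialised function-local static is lowered to something like this, with compute() as a hypothetical initialiser:

[source]
static int compute() { return 42; }   // hypothetical expensive initialiser

int get_value()
{
    static int value = compute();     // what you write
    return value;
}

// ...is conceptually turned by the compiler into:
static bool value_initialized = false;   // the hidden guard flag
static int  value_storage;

int get_value_lowered()
{
    if (!value_initialized)              // checked on every call
    {
        value_storage = compute();       // runs only on the first call
        value_initialized = true;
    }
    return value_storage;
    // In C++11 the check and the assignment are additionally made
    // thread-safe, which is where the extra library code comes from.
}
[/source]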

 

If so, all right; so I understand how a collision would be possible.

Anyway, this is a pitfall trap for me, putting something into my program that slows and bloats it implicitly.

There should be a large text: * * * WARNING: POSSIBLE CODE SLOWDOWN (reason here) * * *

It would be nice to know what is really put there, just some critical section around this function? Why does it slow things down so much if it only runs once?

Does the mingw compiler have more such slowdown pitfalls? (I'm compiling pure C WinAPI code and trying to be very careful about any slowdown in my generated code; it would be very important to me to be sure that I have avoided all such overhead slowdowns.)


 

It would be nice to know what is really put there, just some critical section around this function? Why does it slow things down so much if it only runs once?


The initialisation code is only called once, but the 'is initialised' flag must be checked every time the function is run; depending on the compiler this could involve locking a mutex before the check, which is not a cheap operation.
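To illustrate the difference the original change made, a sketch assuming the posted GetMemoryInfo code is compiled as C++:

[source]
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

DWORD get_pid_guarded()
{
    // Dynamic initialiser: GetCurrentProcessId() must run on the first call
    // only, so the compiler emits a thread-safe guard check on every call
    // (on GCC/MinGW this is what pulls in __cxa_guard_acquire/release).
    static DWORD processID = GetCurrentProcessId();
    return processID;
}

DWORD get_pid_unguarded()
{
    // Constant (zero) initialiser: done at load time, so no guard is emitted.
    // The assignment below is ordinary code; in the posted GetMemoryInfo it
    // sits inside the if(!initialised) block, so it still runs only once.
    static DWORD processID = 0;
    if (processID == 0)
        processID = GetCurrentProcessId();
    return processID;
}
[/source]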

 

But I use tens of these is-initialised flags in every program and I get no slowdown from them. Does the C++ standard mean that if I just use any static variable, it will guard it all with locks? Hell no, I hope.


 


And by what mechanism did it slow my program down and grow it by 50 KB?

I suspect, without proof, that it pulled in some library code to wrap and serialize the initialization of the function-local statics, and the slowdown is because that code gets executed.  Thread serialization generally requires context switches and pipeline stalls.  Without knowing the code, I suspect that code path needs to be executed every time so it can check to see if the object has been initialized.

 

 

I had never heard the word 'serialization' used in this sense (serialization usually meant saving some kind of data to disk), though this meaning is quite usable.

If so, doesn't that mean that WinAPI functions are "serialised" (are the WinAPI functions serialised?)? I use a couple of WinAPI function calls in each frame. I even measured the time for some of them and they were quick; I don't remember exactly which, but something like GetDC and the like, and they were all quick (microseconds, maybe 100 microseconds at most, that scale).

Edited by fir

Anyway, this is a pitfall trap for me, putting something into my program that slows and bloats it implicitly.

There should be a large text: * * * WARNING: POSSIBLE CODE SLOWDOWN (reason here) * * *


Yes, it sort of goes against the C++ philosophy of "pay only for what you use." It could be argued, however, that you're using function-local static variables, so you're paying the price. That argument is getting kind of sketchy, though, because it can be countered with "but I'm not using multiple threads, so why should I pay the price?"

Beware of letting a committee near anything, even for a minute.


I had never heard the word 'serialization' used in this sense (serialization usually meant saving some kind of data to disk), though this meaning is quite usable.

Yes, I've run into that before. A lot of people use 'serialization' to mean streaming data, a synonym for 'marshalling'. I understand Java used that in its docs and it took off from there. Perhaps it originated from the act of sending data over a serial port (RS-232C), although we always used the term 'transmit' for that (and 'write to disk' for saving to disk, maybe 'save in text format' to be more explicit).

I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If anyone can suggest a better term, I'm open to suggestions.
I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If anyone can suggest a better term, I'm open to suggestions.

"Synchronization" is the term I am most familiar with for that, along with the associated notions of synchronous and asynchronous execution.


"Synchronization" is the term I am most familiar with for that, along with the associated notions of synchronous and asynchronous execution.

To my ear, serialisation implies one-at-a-time execution, while synchronisation does not exclude batches of N-at-a-time.

 

No. It did that so that if you are initializing function-local statics in a multi-threaded environment, it will have defined results.

It certainly does not affect namespace-level static initialization, nor does it imply anything about the introduction of a threaded environment during application initialization.

I do not quite understand (partly because of my weak English).

So would this maybe work?

int main()
{
    static int one = f1();    // potentially referencing and updating other data
    static int two = f2();
    static int three = f3();
    static int four = f4();

    printf("\n done.");
}

 

 

No, the so-called "magic statics" in C++11 don't cause a new thread to be spawned; they just wrap the initialization in a mutex or similar, so that two threads that simultaneously call the same function don't clobber each other.

 

And in any case, the code in f1, f2, f3, and f4 would still need to be written in a thread-safe way, so that they aren't clobbering each other.

 

No free lunch here.
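If the goal really is to run the hypothetical f1..f4 on several cores, a minimal sketch of doing that explicitly with std::async rather than relying on static initialisation (assuming the functions are independent and thread-safe; the bodies here are placeholders):

[source]
#include <future>
#include <cstdio>

// placeholder, independent, thread-safe init functions
static int f1() { return 1; }
static int f2() { return 2; }
static int f3() { return 3; }
static int f4() { return 4; }

int main()
{
    // launch each initialiser as its own task; std::launch::async requests
    // a separate thread rather than deferred execution
    auto a = std::async(std::launch::async, f1);
    auto b = std::async(std::launch::async, f2);
    auto c = std::async(std::launch::async, f3);
    auto d = std::async(std::launch::async, f4);

    // get() blocks until each result is ready, so main waits for all four
    int one = a.get(), two = b.get(), three = c.get(), four = d.get();

    std::printf("\n done. %d %d %d %d", one, two, three, four);
}
[/source]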


Yes, I've run into that before. A lot of people use 'serialization' to mean streaming data, a synonym for 'marshalling'. I understand Java used that in its docs and it took off from there. Perhaps it originated from the act of sending data over a serial port (RS-232C), although we always used the term 'transmit' for that (and 'write to disk' for saving to disk, maybe 'save in text format' to be more explicit).

I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If anyone can suggest a better term, I'm open to suggestions.


Serialisation (for making possibly colliding calls serial) is quite a good term.

Serial and parallel are somewhat orthogonal terms and it fits nicely; IMO it should be used more.



Serial and parallel are somewhat orthogonal terms and it fits nicely; IMO it should be used more.


Actually they are opposite terms (antonyms of each other). Orthogonal would mean they are unrelated (in some sense, that their meanings go in completely different directions).

There's also e.g. "sequential" and "concurrent", though some people may use them with subtly different meanings, especially the latter, depending on the task that is actually being done. Anyway, it's pretty clear from context what is meant in general.


On GCC 4.9 with full optimizations, this produces: 

foo():
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
	mov	eax, DWORD PTR foo()::bar[rip]
	ret
.L2:
	sub	rsp, 24
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	call	__cxa_guard_acquire
	test	eax, eax
	jne	.L4
	mov	eax, DWORD PTR foo()::bar[rip]
	add	rsp, 24
	ret
.L4:
	call	rand
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	mov	DWORD PTR [rsp+12], eax
	mov	DWORD PTR foo()::bar[rip], eax
	call	__cxa_guard_release
	mov	eax, DWORD PTR [rsp+12]
	add	rsp, 24
	ret
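(Judging from the symbol names in the listing, the function being compiled is presumably something along the lines of:)

[source]
#include <cstdlib>

int foo()
{
    static int bar = std::rand();   // dynamically initialised local static
    return bar;
}
[/source]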
It won't take a lock every single time, but it does check a global boolean. The gist is something like: 
if not initialized
  lock
  if not initialized
    set initial value
    initialized = true
  end if
  unlock
end if

I don't mean to second guess the GCC authors here, but isn't that the "double checked locking" anti-pattern?

What if the CPU reorders the first two reads, as it is allowed to do...? [edit] my mistake - x86 isn't allowed to reorder reads with respect to each other [/edit]

Second:
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
First:
	mov	eax, DWORD PTR foo()::bar[rip]
Edited by Hodgman

What if the CPU reorders the first two reads, as it is allowed to do...?


Is an x86 processor allowed to reorder reads?
IIRC, the x86 memory model is strongly ordered, so machine code generated like that for x86 is safe, I believe. For weakly-ordered architectures like ARM, equivalent machine code would not be safe without adding memory fences.

In high-level code, double-checked locking is an anti-pattern because, I believe, it's non-portable for this very reason.

On x86 you can even (ab)use 'volatile' to achieve a kind of threading synchronization. This works fine on strongly-ordered systems, but again fails for weakly-ordered ones. Tons of Windows (1st and 3rd party) code relied on this, to the extent that the Microsoft compiler has a few switches to toggle how volatile behaves after C++11, because their historical implementation intersected with strongly-ordered systems in a way that was stricter than necessary, and to aid porting of legacy Windows code to run on ARM. Edited by Ravyne
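For illustration only, a sketch of the legacy idiom being described: publishing data through a volatile flag. Under MSVC's historical /volatile:ms semantics on strongly-ordered x86 this behaves like an acquire/release pair; under ISO C++ semantics (or on a weakly-ordered CPU) it is a data race, which is exactly why portable code uses std::atomic instead:

[source]
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>

static int g_payload = 0;
static volatile LONG g_ready = 0;

static DWORD WINAPI producer(LPVOID param)
{
    (void)param;
    g_payload = 42;   // write the data first...
    g_ready = 1;      // ...then publish it through the volatile flag
    return 0;
}

int main(void)
{
    HANDLE h = CreateThread(NULL, 0, producer, NULL, 0, NULL);
    while (!g_ready) { /* spin until the flag is set */ }
    printf("payload = %d\n", g_payload);  // expected to print 42 under /volatile:ms on x86
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}
[/source]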

I don't mean to second guess the GCC authors here, but isn't that the "double checked locking" anti-pattern?


In high-level code, double-checked locking is an anti-pattern because, I believe, it's non-portable for this very reason.


After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11.

Edited by Chris_F

After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11.


Right, if you *assume* that your high-level double-checked locking code will never be compiled for a weakly-ordered system, it should work. But of course the trouble with high-level code is that any fool can unknowingly do just that, and then be subjected to strange and intermittent bugs. That's why it's an anti-pattern.

 

After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11.


Right, if you *assume* that your high-level double-checked locking code will never be compiled for a weakly-ordered system, it should work. But of course the trouble with high-level code is that any fool can unknowingly do just that, and then be subjected to strange and intermittent bugs. That's why it's an anti-pattern.

 

 

No, C++11 offers portable high-level double-checked locking as described here: http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
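For reference, a minimal sketch of the pattern that article describes: an atomic pointer with acquire/release ordering for the fast path plus a mutex for the slow path (Widget here is just a placeholder type):

[source]
#include <atomic>
#include <mutex>

struct Widget { int value = 42; };   // placeholder payload type

static std::atomic<Widget*> g_instance{nullptr};
static std::mutex g_instance_mutex;

Widget* getInstance()
{
    // fast path: acquire-load so a non-null pointer also makes the
    // Widget's contents visible to this thread
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr)
    {
        std::lock_guard<std::mutex> lock(g_instance_mutex);
        // check again under the lock in case another thread won the race
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr)
        {
            p = new Widget;
            // release-store publishes the fully constructed Widget
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
[/source]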

