Using static initialisation for parallelization?

Started by
26 comments, last by Hodgman 9 years, 10 months ago

It would be nice to know what really gets put there. Just some critical
section around this function? Why does it slow things down so much if the initialisation is only run once?


The initialisation code is only called once, but the 'is initialised' flag must be checked every time the function is run; depending on the compiler, this could involve locking a mutex before the check, which is not a cheap operation.

But I use tens of these 'is initialised' flags in every program and I get no slowdown from them.. does the C++ standard really mean that if I use any static variable it will guard it all with locks? Hell no, I hope



What is the mechanism by which it slowed my program down and grew it by 50 KB?

I suspect, without proof, that it pulled in some library code to wrap and serialize the initialization of the function-local statics, and the slowdown is because that code gets executed. Thread serialization generally requires context switches and pipeline stalls. Without knowing the code, I suspect that code path needs to be executed every time so it can check to see if the object has been initialized.

I had never heard the word 'serialization' used in that sense (serialisation usually meant saving some kind of data to disk), though this meaning is quite usable

If so, doesn't that mean that WinAPI functions are "serialised" (are they?). I use a couple of WinAPI function calls in each frame; I even measured the time for some of them and they were quick. I don't remember exactly which, but something like GetDC and similar, and they were all quick (microseconds, maybe 100 microseconds at most, on that scale)

But I use tens of these 'is initialised' flags in every program and I get no slowdown from them.. does the C++ standard really mean that if I use any static variable it will guard it all with locks? Hell no, I hope


C++11 changed the required behavior here. Some compilers support much of C++11 but not this feature. Other compilers have compile options to turn it off.

Instead of guessing what the compiler is doing, _look at the assembly output_. I can't stress this enough. Real engineers delve into how the boxes they build off of are constructed.

Consider:

#include <stdlib.h>

int foo() {
  static int bar = rand();
  return bar;
}
On GCC 4.9 with full optimizations, this produces:

foo():
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
	mov	eax, DWORD PTR foo()::bar[rip]
	ret
.L2:
	sub	rsp, 24
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	call	__cxa_guard_acquire
	test	eax, eax
	jne	.L4
	mov	eax, DWORD PTR foo()::bar[rip]
	add	rsp, 24
	ret
.L4:
	call	rand
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	mov	DWORD PTR [rsp+12], eax
	mov	DWORD PTR foo()::bar[rip], eax
	call	__cxa_guard_release
	mov	eax, DWORD PTR [rsp+12]
	add	rsp, 24
	ret
It won't take a lock every single time, but it does check a global boolean. The gist is something like:

if not initialized
  lock
  if not initialized
    set initial value
    initialized = true
  end if
  unlock
end if
C++11 only requires that function-scope static initialization is thread-safe, so different compilers or different runtimes may implement this less efficiently.

Note that this only applies to initialization of function-local static variables (to non-zero values). The following bit of code can have the lock optimized away with no non-standard effects:

#include <stdlib.h>

bool foo() {
  static bool bar = false;
  if (!bar)
    bar = rand() == 0;
  return bar;
}
Compiles to:

foo():
	movzx	eax, BYTE PTR foo()::bar[rip]
	test	al, al
	je	.L7
	ret
.L7:
	sub	rsp, 8
	call	rand
	test	eax, eax
	sete	al
	mov	BYTE PTR foo()::bar[rip], al
	add	rsp, 8
	ret

Sean Middleditch – Game Systems Engineer – Join my team!

Anyway, this is a pitfall trap for me: something slowing down and bloating my program implicitly.

There should be a large warning: * * * WARNING: POSSIBLE CODE SLOWDOWN (reason here) * * *


Yes, it sort of goes against the C++ philosophy of "pay only for what you use." Could be argued, however, that you're using function-local static variables so you're paying the price. That argument is getting kind of sketchy, though, because it can be countered with "but I'm not using multiple threads, so why should I pay the price?"

Beware of letting a committee near anything, even for a minute.
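One mitigation worth noting (my own sketch, function names invented): the guard is only emitted for *dynamic* initializers. If the initializer is a compile-time constant, the static undergoes constant initialization before main() runs, no check is generated, and you really do pay nothing:

```cpp
#include <cstdlib>

// Constant initializer: the value is baked in at compile time
// (constant initialization), so the compiler emits no guard at all.
int cheap() {
    static const int table_size = 64;
    return table_size;
}

// Dynamic initializer: rand() must run at first call, so the
// thread-safe guard check is required on entry.
int costly() {
    static const int seed = std::rand();
    return seed;
}
```

So "pay only for what you use" still mostly holds: only statics whose initial value can't be computed at compile time carry the runtime cost.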

I had never heard the word 'serialization' used in that sense (serialisation usually meant saving some kind of data to disk), though this meaning is quite usable

Yes, I've run into that before. A lot of people use 'serialization' to mean streaming data, a synonym for 'marshalling'. I understand Java used that in its docs and it took off from there. Perhaps it originated from the act of sending data over a serial port (RS-232C) although we always used the term 'transmit' for that (and 'write to disk' for saving to disk, maybe 'save in text format' to be more explicit).

I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If any can suggest a better term, I'm open to suggestions.

Stephen M. Webb
Professional Free Software Developer

I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If any can suggest a better term, I'm open to suggestions.

"Synchronization" is the term I am most familiar with for that, along with the associated notions of synchronous and asynchronous execution.

"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke

"Synchronization" is the term I am most familiar with for that, along with the associated notions of synchronous and asynchronous execution.

To my ear, serialisation implies one-at-a-time execution, while synchronisation does not exclude batches of N-at-a-time.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

No. It did it so that if you are initializing function-local statics in a multi-threaded environment, you will get defined results.

It certainly does not affect namespace-level static initialization, nor does it imply anything about the introduction of a threaded environment during application initialization.

I don't understand (partly due to my weak English).

So, will this maybe work?

int main()
{
  static int one = f1();    // potentially referencing and updating other data
  static int two = f2();
  static int three = f3();
  static int four = f4();

  printf("\n done.");
}

No, the so-called "magic statics" in C++11 don't cause a new thread to be spawned; they just wrap the initialization in a mutex or similar so that threads that simultaneously call the same function don't clobber each other.

And in any case, the code in f1, f2, f3, and f4 would still need to be written in a thread-safe way, so that they aren't clobbering each other.

No free lunch here.

throw table_exception("(? ???)? ? ???");

Yes, I've run into that before. A lot of people use 'serialization' to mean streaming data, a synonym for 'marshalling'. I understand Java used that in its docs and it took off from there. Perhaps it originated from the act of sending data over a serial port (RS-232C) although we always used the term 'transmit' for that (and 'write to disk' for saving to disk, maybe 'save in text format' to be more explicit).


I'm using 'serialization' in its original meaning: enforce the serial operation of something that could potentially be performed in parallel or in simultaneous order. The usage predates the Java language and so do I. I apologize for the confusion. If any can suggest a better term, I'm open to suggestions.

Serialisation (for making possibly colliding calls serial) is quite a good term.

serial and parallel are somewhat orthogonal terms and it fits nicely; imo it should be used more


serial and parallel are somewhat orthogonal terms and it fits nicely; imo it should be used more

Actually they are opposite terms (antonyms of each other). Orthogonal would mean they are unrelated (in some sense, that their meanings go in completely different directions). :)

There's also e.g. "sequential" and "concurrent" though some people may use them with subtly different meanings, especially the latter, depending on the task that is actually being done. Anyway it's pretty clear from context what is meant in general.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

On GCC 4.9 with full optimizations, this produces:


foo():
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
	mov	eax, DWORD PTR foo()::bar[rip]
	ret
.L2:
	sub	rsp, 24
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	call	__cxa_guard_acquire
	test	eax, eax
	jne	.L4
	mov	eax, DWORD PTR foo()::bar[rip]
	add	rsp, 24
	ret
.L4:
	call	rand
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	mov	DWORD PTR [rsp+12], eax
	mov	DWORD PTR foo()::bar[rip], eax
	call	__cxa_guard_release
	mov	eax, DWORD PTR [rsp+12]
	add	rsp, 24
	ret
It won't take a lock every single time, but it does check a global boolean. The gist is something like:

if not initialized
  lock
  if not initialized
    set initial value
    initialized = true
  end if
  unlock
end if

I don't mean to second guess the GCC authors here, but isn't that the "double checked locking" anti-pattern?

What if the CPU reorders the first two reads, as it is allowed to do...? [edit] my mistake - x86 isn't allowed to reorder reads with respect to each other [/edit]


second:
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
first:
	mov	eax, DWORD PTR foo()::bar[rip]
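For reference, here is a sketch of how double-checked locking can be written correctly in portable C++11 using acquire/release atomics instead of plain loads; this is roughly the contract a conforming guard implementation must meet on more weakly ordered CPUs than x86 (the names `initialized`, `init_mutex`, and `get` are illustrative, not from the thread):

```cpp
#include <atomic>
#include <mutex>

static std::atomic<bool> initialized{false};
static std::mutex init_mutex;
static int value;

int get() {
    // Acquire load: if we observe 'true', we are guaranteed to also
    // observe the earlier store to 'value' -- no read reordering hazard.
    if (!initialized.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(init_mutex);
        if (!initialized.load(std::memory_order_relaxed)) {
            value = 42;  // stand-in for the expensive one-time initialization
            // Release store: publishes 'value' before the flag flips to true.
            initialized.store(true, std::memory_order_release);
        }
    }
    return value;
}
```

The acquire/release pairing is what makes the pattern safe in general; on x86 the plain loads happen to be acquire-strength already, which is why GCC's output gets away with them there.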

