using static initialisation for parallelisation?

Started by
26 comments, last by Hodgman 9 years, 10 months ago

What if the CPU reorders the first two reads, as it is allowed to do...?


Is an x86 processor allowed to reorder reads?
IIRC, the x86 memory model is strongly ordered, so machine code generated as such for x86 is safe, I believe. For weakly-ordered architectures like ARM, equivalent machine code would not be safe without adding memory fences.

In high-level code, double-checked locking is an anti-pattern because, I believe, it's non-portable for this very reason.

On x86 you can even (ab)use 'volatile' to achieve a kind of threading synchronization. This works fine on strongly-ordered systems, but again fails on weakly-ordered ones. Tons of Windows (1st- and 3rd-party) code relied on this, to the extent that the Microsoft compiler gained switches to toggle how volatile behaves after C++11 -- their historical implementation intersected with strongly-ordered systems in a way that was stricter than necessary, and the switches aid porting of legacy Windows code to ARM.
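A minimal sketch of that legacy pattern (the names g_ready/g_data are illustrative, not from any real codebase): a volatile bool abused as a ready flag. It happens to behave on x86 under MSVC's historical /volatile:ms semantics, but standard C++ gives volatile no inter-thread ordering guarantees at all.

```cpp
#include <cstdio>

// Legacy Windows-style pattern: 'volatile' (ab)used as a synchronization flag.
// Only "works" on strongly-ordered x86 with MSVC's old volatile semantics;
// standard C++ makes this a data race with no ordering guarantees.
volatile bool g_ready = false;
int g_data = 0;

void producer() {
    g_data = 42;      // plain write
    g_ready = true;   // volatile write: NOT a release barrier in standard C++
}

int consumer() {
    while (!g_ready)  // volatile read: NOT an acquire barrier in standard C++
        ;
    return g_data;    // may legally observe a stale value on weak hardware
}
```

The portable replacement is std::atomic<bool> with release/acquire ordering, which expresses the same intent without relying on the hardware's memory model.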

throw table_exception("(? ???)? ? ???");


Don't mean to second-guess the GCC authors here, but isn't that the "double-checked locking" anti-pattern?


In high-level code, double-checked locking is an anti-pattern because, I believe, it's non-portable for this very reason.

After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11.

After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11


Right, if you *assume* that your high-level double-checked locking code will never be compiled for a weakly-ordered system, it should work. But of course the trouble with high-level code is that any fool can unknowingly do just that, and then be subjected to strange and intermittent bugs. That's why it's an anti-pattern.

throw table_exception("(? ???)? ? ???");

After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11


Right, if you *assume* that your high-level double-checked locking code will never be compiled for a weakly-ordered system, it should work. But of course the trouble with high-level code is that any fool can unknowingly do just that, and then be subjected to strange and intermittent bugs. That's why it's an anti-pattern.

No, C++11 offers portable high-level double-checked locking as described here: http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
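For reference, a sketch of the portable C++11 pattern from the linked article (Widget and the globals are stand-ins for whatever is being lazily created): an atomic pointer carries the acquire/release ordering, and a mutex serializes the one-time construction.

```cpp
#include <atomic>
#include <mutex>

// Sketch of C++11-portable double-checked locking; names are illustrative.
struct Widget { int value = 7; };

std::atomic<Widget*> g_widget{nullptr};
std::mutex g_widget_mutex;

Widget* getWidget() {
    Widget* p = g_widget.load(std::memory_order_acquire);   // first check, lock-free
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_widget_mutex);
        p = g_widget.load(std::memory_order_relaxed);       // second check, under the lock
        if (p == nullptr) {
            p = new Widget();
            g_widget.store(p, std::memory_order_release);   // publish the fully-constructed object
        }
    }
    return p;
}
```

The acquire load pairs with the release store, so a thread that sees a non-null pointer is also guaranteed to see the constructed Widget behind it -- on any architecture, not just x86.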

Ah, right. I see. Despite best efforts, I still sometimes live in a hazy world of what C++ was, what C++11 is supposed to be, and what the various compilers actually are today. It's hard to keep it all straight.

I stand corrected.

However, for the sake of the thread, it's worth clarifying that the old, non-portable pattern that happened to work on strongly-ordered systems (that is, the naive 'it looks right if you're unaware of weakly-ordered systems' pattern) is still broken -- you *can* do portable double-checked locking in C++11, but you have to do it right.
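To make that concrete, here is a sketch of the naive pre-C++11 pattern being warned about (Config and the globals are illustrative). It compiles and even appears to work on x86, which is exactly what makes it dangerous:

```cpp
#include <mutex>

// The naive, NON-portable double-checked locking pattern -- shown only to
// illustrate what is broken. Do not use this.
struct Config { int retries = 3; };

Config* g_naive = nullptr;   // plain pointer: the bug
std::mutex g_naive_mutex;

Config* getConfigBroken() {
    if (g_naive == nullptr) {                        // unsynchronized read: a data race
        std::lock_guard<std::mutex> lock(g_naive_mutex);
        if (g_naive == nullptr)
            g_naive = new Config();                  // nothing stops the pointer store from
    }                                                // being published before Config's fields
    return g_naive;                                  // are visible on a weakly-ordered CPU
}
```

The first read of g_naive is not synchronized with the write inside the lock, so in standard C++ this is undefined behavior regardless of the target architecture.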

throw table_exception("(? ???)? ? ???");

But I use tens of these is_initialised flags in every program and I get no slowdown from them... Does the C++ standard mean that if I use any static variable, it will guard it all with locks? Hell no, I hope.


C++11 changed the required behavior here. Some compilers support much of C++11 but not this feature. Other compilers have compile options to turn it off.

Instead of guessing what the compiler is doing, _look at the assembly output_. I can't stress this enough. Real engineers delve into how the boxes they build off of are constructed.

Consider:


#include <stdlib.h>

int foo() {
  static int bar = rand();
  return bar;
}
On GCC 4.9 with full optimizations, this produces:


foo():
	cmp	BYTE PTR guard variable for foo()::bar[rip], 0
	je	.L2
	mov	eax, DWORD PTR foo()::bar[rip]
	ret
.L2:
	sub	rsp, 24
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	call	__cxa_guard_acquire
	test	eax, eax
	jne	.L4
	mov	eax, DWORD PTR foo()::bar[rip]
	add	rsp, 24
	ret
.L4:
	call	rand
	mov	edi, OFFSET FLAT:guard variable for foo()::bar
	mov	DWORD PTR [rsp+12], eax
	mov	DWORD PTR foo()::bar[rip], eax
	call	__cxa_guard_release
	mov	eax, DWORD PTR [rsp+12]
	add	rsp, 24
	ret
It won't take a lock every single time, but it does check a global boolean. The gist is something like:


if not initialized
  lock
  if not initialized
    set initial value
    initialized = true
  end if
  unlock
end if
C++11 only requires that function-scope static initialization is thread-safe, so different compilers or different runtimes may implement this less efficiently.

Note that this only applies to initialization of function-local static variables (to non-zero values). The following bit of code can have the lock optimized away with no non-standard effects:


#include <stdlib.h>

bool foo() {
  static bool bar = false;
  if (!bar)
    bar = rand() == 0;
  return bar;
}
Compiles to:


foo():
	movzx	eax, BYTE PTR foo()::bar[rip]
	test	al, al
	je	.L7
	ret
.L7:
	sub	rsp, 8
	call	rand
	test	eax, eax
	sete	al
	mov	BYTE PTR foo()::bar[rip], al
	add	rsp, 8
	ret
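The reason no guard appears there is that initialization to a constant (including zero) is performed at load time, before any thread runs, so there is nothing left to synchronize. A small illustrative sketch:

```cpp
// Constant initialization: 'n' lives in zero-initialized static storage,
// set up before main() starts, so the compiler emits no __cxa_guard_* calls.
int next_id() {
    static int n = 0;   // constant-initialized: no runtime guard needed
    return ++n;         // note: the increment itself is still NOT thread-safe
}
```

Only statics with a runtime (non-constant) initializer pay for the guard check; any later mutation of the variable is the programmer's problem, as with bar above.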

Alright, I seem to understand, though one thing still gives me a little trouble.

I understand it just means that these static initializers are a handy method for 'called-once' functions -- the same thing I often simulate by hand in my code.

The unpleasant thing is that it is serialized implicitly (at least by default). I would prefer a keyword like serialize or something:

void foo()

{

serialize static int f = f();

}

to control it by hand. (C++ goes the wrong way here, though that is no news; as I said, I have been working for years on my own C2 dialect that would mend some things.)

After all, it is still not clear what makes 50 KB of bloat in my app when

turning "static int f = 0; f = f();" into "static int f = f();", when at runtime

this lock should be touched only once.

Is it possible that when the compiler finds this line, it switches the whole application into some multithreaded compilation mode and puts more

locks over other parts of my code, or what?

Or does this bloat come from compiling in some code for this MT support in the background of my binary, with the slowdown coming indirectly from the bloat?

What if the CPU reorders the first two reads, as it is allowed to do...?

Is an x86 processor allowed to reorder reads?

x86 includes an LFENCE instruction, which tells the CPU explicitly NOT to reorder reads past other reads, so I assumed so...

But... the spec says "Reads are not reordered with other reads"... So I guess the point of LFENCE is just to ensure that a read is not moved earlier such that it might occur out of order with some particular write (which itself might be constrained from being moved too, with an SFENCE)?
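In portable C++ you would not issue LFENCE/SFENCE directly anyway; the same constraints are expressed with std::atomic_thread_fence, and the compiler picks whatever instructions (possibly none, on x86) the target needs. A sketch with illustrative names:

```cpp
#include <atomic>

// Release/acquire fences: the portable way to express the ordering that
// SFENCE/LFENCE-style reasoning is reaching for.
std::atomic<bool> g_ready{false};
int g_payload = 0;

void producer() {
    g_payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // payload write can't sink below
    g_ready.store(true, std::memory_order_relaxed);
}

bool consumer(int* out) {
    if (!g_ready.load(std::memory_order_relaxed))
        return false;                                     // not published yet
    std::atomic_thread_fence(std::memory_order_acquire);  // payload read can't hoist above
    *out = g_payload;                                     // guaranteed to see 42
    return true;
}
```

On x86 both fences typically compile to nothing (ordinary loads and stores are already ordered strongly enough); on ARM they become real barrier instructions -- which is the whole portability point of this thread.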

This topic is closed to new replies.
