What if the CPU reorders the first two reads, as it is allowed to do...?
Is an x86 processor allowed to reorder reads?
I don't mean to second-guess the GCC authors here, but isn't that the "double-checked locking" anti-pattern?
In high-level code, double-checked locking is an anti-pattern because, I believe, it's non-portable for this very reason.
After a quick search online I'm left with the impression that double-checked locking is not an issue for x86 or x86-64 and that it can be implemented safely (at a high level) in C++11.
Right, if you *assume* that your high-level double-checked locking code will never be compiled for a weakly-ordered system, it should work. But of course the trouble with high-level code is that any fool can unknowingly do just that, and then be subjected to strange and intermittent bugs. That's why it's an anti-pattern.
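For concreteness, the classic broken pattern looks something like this. This is a sketch with illustrative names (Widget, getInstance), not code from any particular project, and the comments mark where reordering bites:

```cpp
#include <mutex>

struct Widget { int value; Widget() : value(42) {} };

Widget* g_instance = nullptr;  // plain pointer: no ordering guarantees
std::mutex g_mutex;

// The classic (broken) double-checked locking pattern. On a weakly-ordered
// CPU -- or after compiler reordering -- the store to g_instance can become
// visible before the Widget's fields are written, so another thread can
// observe a non-null pointer to a half-constructed object.
Widget* getInstance() {
    if (g_instance == nullptr) {              // first check, unsynchronized
        std::lock_guard<std::mutex> lock(g_mutex);
        if (g_instance == nullptr)            // second check, under the lock
            g_instance = new Widget;          // store may be reordered!
    }
    return g_instance;
}
```

Single-threaded it behaves fine, which is exactly what makes the bug intermittent and hard to find.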
No, C++11 offers portable high-level double-checked locking as described here: http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
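The portable C++11 version described at that link relies on std::atomic with acquire/release ordering. A minimal sketch, with illustrative names (Widget, getInstance) rather than the article's exact code:

```cpp
#include <atomic>
#include <mutex>

struct Widget { int value; Widget() : value(42) {} };

std::atomic<Widget*> g_instance{nullptr};
std::mutex g_mutex;

Widget* getInstance() {
    // First check: an acquire load guarantees that if we see a non-null
    // pointer, we also see the fully constructed Widget it points to.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_mutex);
        // Second check under the lock: another thread may have won the race.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget;
            // Release store: publishes the completed construction, so the
            // acquire load above can never observe a half-built object.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

On x86 the acquire load and release store compile to plain moves, so this costs nothing extra there; on weakly-ordered hardware the compiler emits the needed barriers.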
But I use tens of these isInitialised flags in every program and I get no slowdown from them... Does the C++ standard mean that if I just use any static variable, it will guard it all with locks? Hell no, I hope.
C++11 changed the required behavior here. Some compilers support much of C++11 but not this feature. Other compilers have compile options to turn it off.
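For compilers that do implement it, the guarantee is easy to observe: the initializer runs exactly once no matter how many threads race into the function. A minimal sketch (g_initCount and expensiveInit are illustrative names, not standard API):

```cpp
#include <atomic>

std::atomic<int> g_initCount{0};

int expensiveInit() {
    g_initCount.fetch_add(1);  // count how many times initialization runs
    return 123;
}

int value() {
    // C++11 requires that this initializer runs exactly once, even if
    // several threads call value() concurrently; the compiler emits the
    // guard-variable check (and the locking slow path) to enforce it.
    static int v = expensiveInit();
    return v;
}
```

After any number of calls to value(), from any number of threads, g_initCount is 1.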
Instead of guessing what the compiler is doing, _look at the assembly output_. I can't stress this enough. Real engineers delve into how the boxes they build off of are constructed.
Consider:
#include <stdlib.h>

int foo() {
    static int bar = rand();
    return bar;
}

On GCC 4.9 with full optimizations, this produces:

foo():
        cmp     BYTE PTR guard variable for foo()::bar[rip], 0
        je      .L2
        mov     eax, DWORD PTR foo()::bar[rip]
        ret
.L2:
        sub     rsp, 24
        mov     edi, OFFSET FLAT:guard variable for foo()::bar
        call    __cxa_guard_acquire
        test    eax, eax
        jne     .L4
        mov     eax, DWORD PTR foo()::bar[rip]
        add     rsp, 24
        ret
.L4:
        call    rand
        mov     edi, OFFSET FLAT:guard variable for foo()::bar
        mov     DWORD PTR [rsp+12], eax
        mov     DWORD PTR foo()::bar[rip], eax
        call    __cxa_guard_release
        mov     eax, DWORD PTR [rsp+12]
        add     rsp, 24
        ret

It won't take a lock every single time, but it does check a global boolean. The gist is something like:

if not initialized
    lock
    if not initialized
        set initial value
        initialized = true
    end if
    unlock
end if

C++11 only requires that function-scope static initialization is thread-safe, so different compilers or different runtimes may implement this less efficiently.
Note that this only applies to initialization of function-local static variables (to non-zero values). The following bit of code can have the lock optimized away with no non-standard effects:
#include <stdlib.h>

bool foo() {
    static bool bar = false;
    if (!bar)
        bar = rand() == 0;
    return bar;
}

Compiles to:
foo():
        movzx   eax, BYTE PTR foo()::bar[rip]
        test    al, al
        je      .L7
        ret
.L7:
        sub     rsp, 8
        call    rand
        test    eax, eax
        sete    al
        mov     BYTE PTR foo()::bar[rip], al
        add     rsp, 8
        ret
Alright, I seem to understand, though I have a little trouble with this.
I understand it just means that these static initializations are a handy method
for "call-once" functions, the same thing I often simulate by hand in my code.
Yet the unpleasant thing is that it is serialized implicitly (at least by default).
I would prefer a keyword serialize or something:
void foo()
{
serialize static int f = f();
}
to hand-control it. (C++ goes the wrong way, though that is no news; as I said, I have been working for years on my own C2 dialect that would mend some things.)
After all, it is still not clear what makes 50 KB of bloat in my app when
turning "static int f = 0; f = f();" into "static int f = f();", where at runtime
this lock should be touched only once.
Is it possible that when the compiler finds this line it switches compilation into some multithreading mode and puts more
locks over other parts of my code, or what?
Or is this bloat from compiling in some code for this MT support in the background of my binary, and the slowdown comes indirectly from the bloat?
Is an x86 processor allowed to reorder reads?
What if the CPU reorders the first two reads, as it is allowed to do...?
x86 includes an LFENCE instruction, which tells the CPU explicitly NOT to reorder reads past other reads, so I assumed so...
But... the spec says "Reads are not reordered with other reads"... So I guess the point of LFENCE is just to ensure that a read is not moved earlier such that it might occur out of order with respect to some particular write (which itself might be constrained from being moved, too, with an SFENCE)?
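In portable C++ terms, the guarantees being discussed (reads not reordered with reads, writes not reordered with writes) are what acquire/release atomics express. A minimal sketch of the flag-publishing pattern these rules make safe, with hypothetical names (producer, consumer):

```cpp
#include <atomic>

std::atomic<int>  g_data{0};
std::atomic<bool> g_ready{false};

// Writer: the release store orders the data write before the flag write.
// On x86 both compile to plain stores, because stores are not reordered
// with other stores.
void producer() {
    g_data.store(42, std::memory_order_relaxed);
    g_ready.store(true, std::memory_order_release);
}

// Reader: the acquire load orders the flag read before the data read.
// On x86 both compile to plain loads, because reads are not reordered
// with other reads -- no LFENCE needed for this pattern.
int consumer() {
    while (!g_ready.load(std::memory_order_acquire)) { /* spin */ }
    return g_data.load(std::memory_order_relaxed);
}
```

On weakly-ordered architectures the same source emits real barrier instructions, which is the portability point being argued above.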