initialize std::thread

Started by
24 comments, last by frob 6 years, 3 months ago

That is not going to happen, sorry


Small note on this.

Quote

but shouldn't listen_and_accept_finished (and Port) be volatile, since they will be modified outside the thread

That is correct, but FWIW, with C++11+ I've used std::atomic for situations where multiple threads are accessing simple data types. You can use the std::memory_order_* operations for further control.
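For example, a minimal sketch of that pattern, using the flag name from the quoted question (the surrounding networking code is assumed):

#include <atomic>
#include <thread>

// Flag written by the worker thread, read by the spawning thread.
std::atomic<bool> listen_and_accept_finished{false};

void listen_and_accept()
{
    // ... accept connections ...
    // Release store: everything written before this is visible to a
    // thread that later observes the flag as true with an acquire load.
    listen_and_accept_finished.store(true, std::memory_order_release);
}

int main()
{
    std::thread worker{listen_and_accept};
    while (!listen_and_accept_finished.load(std::memory_order_acquire))
    {
        // do other work, or std::this_thread::yield()
    }
    worker.join();
}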

 

Admin for GameDev.net.

Correct about std::atomic.  Volatile doesn't do what most people think it does.  The volatile keyword does drastically different things in different languages. 

Java and C# use "volatile" to mark fields that get visibility and ordering guarantees between threads, very similar to std::atomic<> in modern C++.

C and C++ use "volatile" to indicate that the optimizer must not make assumptions about accesses to the object; reads and writes have to happen exactly as the language's abstract machine specifies.

In C and C++ volatile specifically means "do not optimize accesses to this value". On certain hardware, with certain compilers and certain data types, this can mean that memory barriers, cache coherency, and other rules are followed, but that is not guaranteed and not universal. Note in particular that using volatile like this eliminates many optimizations. The compiler is required to load from and store to memory instead of keeping the value in a register. This can mean that instead of an access costing a fraction of a nanosecond, or being removed completely, every access has to go through memory, which can take tens or hundreds of nanoseconds.

 

Because so many programs on Windows incorrectly rely on that behavior, most compilers on the platform will perform the extra work if you're using a hardware-atomic type like an int or char or long, but they won't automatically do more. That doesn't make the behavior right; it means the bug of writing "volatile" when you mean "atomic" is so pervasive that they added additional performance-harming behavior around the buggy usage.

The C and C++ version of volatile is sometimes erroneously mentioned regarding multithreading because of history. Without the keyword the compiler can see that no code in the program writes to the memory, so it assumes the value never changes; it can then treat the value as a compile-time constant without ever reading or writing memory. With the keyword the compiler is required to read or write the value directly from memory every time it is encountered. That made volatile perfect for specialized hardware where memory was shared between hardware and software systems. Since early multiprocessing systems shared memory this way, programmers used volatile as the quick-and-dirty way to manipulate shared memory, casting between volatile and non-volatile versions in ways that happened to work correctly on that specific system.
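For contrast, this is the kind of code volatile is actually meant for; the register address below is made up for the example:

#include <cstdint>

// Hypothetical memory-mapped hardware status register.
// volatile forces a real load on every access; without it the compiler
// could hoist the read out of the loop and spin forever on a stale value.
volatile std::uint32_t* const status_reg =
    reinterpret_cast<volatile std::uint32_t*>(0x40000000u);

void wait_for_device_ready()
{
    while ((*status_reg & 0x1u) == 0)
    {
        // spin: each iteration re-reads the hardware register
    }
}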

On the PC that environment hasn't existed for over two decades.  Use the proper atomic values because the compiler can optimize them in amazing ways. Don't use volatile for multithreading, since it is almost certainly not doing what you expect.

14 hours ago, frob said:

Volatile doesn't do what most people think it does.

Well, it does what I intended: not caching the value. Though you're right that atomics will result in far superior performance. For non-primitive types, (mutex or even spin) locking is probably faster than volatile as well. :)

🧙

On 2/6/2018 at 1:17 PM, matt77hias said:

Well, it does what I intended: not caching the value. Though you're right that atomics will result in far superior performance. For non-primitive types, (mutex or even spin) locking is probably faster than volatile as well. :)

Volatile is both too much and not enough for multi-threaded synchronization.

Volatile prevents the compiler from putting a value in a register, and it makes the compiler treat reads and writes as I/O for ordering purposes. But what it does not do is provide memory fences to prevent the processor from reordering memory operations around the synchronization. Therefore, using atomics (with sequentially consistent memory order) will actually generate different assembly than using volatile. On weakly ordered processor architectures the difference is more pronounced, but even x64 now has non-temporal SSE instructions, which need extra fences.
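A small sketch of the classic store/load case where that matters (variable names made up): with sequentially consistent atomics the outcome r1 == 0 && r2 == 0 is forbidden, whereas with plain volatile int the processor is free to reorder each store past the following load and both threads can read 0.

#include <atomic>

std::atomic<int> x{0};
std::atomic<int> y{0};
int r1 = 0;
int r2 = 0;

void thread_a()
{
    x.store(1);      // seq_cst store: emits the required fence/locked op
    r1 = y.load();   // seq_cst load
}

void thread_b()
{
    y.store(1);
    r2 = x.load();
}

// After running thread_a and thread_b concurrently and joining them,
// at least one of r1, r2 is guaranteed to be 1.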

And technically, any program with shared data that is not synchronized by threading primitives (which volatile isn't) has undefined behavior.

2 hours ago, King Mir said:

... and not enough for multi-threaded synchronization.

This is why Microsoft has these InterlockedX functions (e.g. InterlockedAdd), isn't it?

Though these functions are said to be atomic, the values they operate on are still declared volatile. Doesn't the latter always impose a bottleneck?

🧙

Yes, it does look like InterlockedAdd would get you the rest of the way, based on a quick look. But why wouldn't you just use std::atomic<T>? Especially if using std::thread.
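For reference, the portable equivalent of an interlocked add with std::atomic looks roughly like this (the counter name is assumed):

#include <atomic>

std::atomic<long> counter{0};

long add_one()
{
    // Atomic read-modify-write with sequentially consistent ordering by
    // default; fetch_add returns the old value, so add 1 to get the new
    // value the way InterlockedAdd reports it.
    return counter.fetch_add(1) + 1;
}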

Also, a complication here is that Microsoft's compiler by default implements volatile as atomic, adding those fences that the language does not require it to.

My hunch is that their compiler is smart enough to omit those fences when the value is used in an actually atomic operation like InterlockedAdd... But I don't have the wherewithal to prove that :)

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

3 hours ago, swiftcoder said:

My hunch is that their compiler is smart enough to omit those fences when the value is used in an actually atomic operation like InterlockedAdd... But I don't have the wherewithal to prove that :)

That's backwards. The entire point of using atomic operations is so the compiler adds those fences.

I can't figure out from the linked documentation whether InterlockedAdd has sequentially consistent semantics or merely acquire/release semantics, and therefore whether it requires a locked instruction on x86. A locked instruction would be sufficient for sequentially consistent semantics on x86, so you wouldn't need a separate fence.

8 hours ago, King Mir said:

I can't figure out from the linked documentation whether InterlockedAdd has sequentially consistent semantics or merely acquire/release semantics, and therefore whether it requires a locked instruction on x86. A locked instruction would be sufficient for sequentially consistent semantics on x86, so you wouldn't need a separate fence.

Unless, of course, the compiler decides to reorder the instructions in your code, or the CPU is superscalar and performs out-of-order and/or speculative execution, and you end up using the result of your interlocked add before actually executing that instruction: a locked processor instruction does not prevent that, but a fence does. Then again, you're using this in a multi-threaded application, so you're going to get pre-empted right after your InterlockedAdd, the value gets changed by the other thread after you've obtained it, and you're left scratching your head wondering why things don't work, sometimes, maybe.
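If the code genuinely needs to act on the value it just read, the usual answer to that last problem is a single atomic read-modify-write, or a compare-exchange loop, rather than a separate read followed by a write (a sketch with assumed names):

#include <atomic>

std::atomic<int> shared_value{0};

void double_shared_value()
{
    int observed = shared_value.load();
    // If another thread changed the value between our load and the
    // exchange, compare_exchange_weak fails, reloads observed with the
    // current value, and we simply retry.
    while (!shared_value.compare_exchange_weak(observed, observed * 2))
    {
        // retry with the freshly observed value
    }
}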

 

Stephen M. Webb
Professional Free Software Developer

This topic is closed to new replies.
