initialize std::thread

Started by
24 comments, last by frob 6 years, 3 months ago

That is not going to happen, sorry


Small note on this.

Quote

but shouldn't listen_and_accept_finished (and Port) be volatile, since they will be modified outside the thread

That is correct, but FWIW, with C++11+ I've used std::atomic for situations where multiple threads are accessing simple data types. You can use the std::memory_order_* operations for further control.
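For example, a minimal sketch of that pattern, using the flag name from the quoted question (the surrounding networking code is assumed):

#include <atomic>
#include <thread>

// Flag written by the worker thread, read by the spawning thread.
std::atomic<bool> listen_and_accept_finished{false};

void listen_and_accept()
{
    // ... accept connections ...
    // Release store: everything written before this is visible to a
    // thread that later observes the flag as true with an acquire load.
    listen_and_accept_finished.store(true, std::memory_order_release);
}

int main()
{
    std::thread worker{listen_and_accept};
    while (!listen_and_accept_finished.load(std::memory_order_acquire))
    {
        // do other work, or std::this_thread::yield()
    }
    worker.join();
}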

 

Admin for GameDev.net.

Correct about std::atomic.  Volatile doesn't do what most people think it does.  The volatile keyword does drastically different things in different languages. 

Java and C# use "volatile" to mark fields that get visibility and ordering guarantees between threads, very similar to std::atomic<> in modern C++.

C and C++ use "volatile" to indicate that the optimizer must not make assumptions about accesses to the object; reads and writes have to happen exactly as the language's abstract machine specifies.

In C and C++ volatile specifically means "do not optimize accesses to this value". On certain hardware, with certain compilers and certain data types, this can mean that memory barriers, cache coherency, and other rules are followed, but that is not guaranteed and not universal. Note in particular that using volatile like this eliminates many optimizations. The compiler is required to load from and store to memory instead of keeping the value in a register. This can mean that instead of an access costing a fraction of a nanosecond, or being removed completely, every access has to go through memory, which can take tens or hundreds of nanoseconds.

 

Because so many programs on Windows incorrectly rely on that behavior, most compilers on the platform will perform the extra work if you're using a hardware-atomic type like an int or char or long, but they won't automatically do more. That doesn't make the behavior right; it means the bug of writing "volatile" when you mean "atomic" is so pervasive that they added additional performance-harming behavior around the buggy usage.

The C and C++ version of volatile is sometimes erroneously mentioned regarding multithreading because of history. Without the keyword the compiler can see that no code in the program writes to the memory, so it assumes the value never changes; it can then treat the value as a compile-time constant without ever reading or writing memory. With the keyword the compiler is required to read or write the value directly from memory every time it is encountered. That made volatile perfect for specialized hardware where memory was shared between hardware and software systems. Since early multiprocessing systems shared memory this way, programmers used volatile as the quick-and-dirty way to manipulate shared memory, casting between volatile and non-volatile versions in ways that happened to work correctly on that specific system.
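For contrast, this is the kind of code volatile is actually meant for; the register address below is made up for the example:

#include <cstdint>

// Hypothetical memory-mapped hardware status register.
// volatile forces a real load on every access; without it the compiler
// could hoist the read out of the loop and spin forever on a stale value.
volatile std::uint32_t* const status_reg =
    reinterpret_cast<volatile std::uint32_t*>(0x40000000u);

void wait_for_device_ready()
{
    while ((*status_reg & 0x1u) == 0)
    {
        // spin: each iteration re-reads the hardware register
    }
}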

On the PC that environment hasn't existed for over two decades.  Use the proper atomic values because the compiler can optimize them in amazing ways. Don't use volatile for multithreading, since it is almost certainly not doing what you expect.

14 hours ago, frob said:

Volatile doesn't do what most people think it does.

Well, it does what I intended: not caching the value. Though you're right that atomics will result in far superior performance. For non-primitive types, (mutex or even spin) locking is probably faster than volatile as well. :)

🧙

On 2/6/2018 at 1:17 PM, matt77hias said:

Well, it does what I intended: not caching the value. Though you're right that atomics will result in far superior performance. For non-primitive types, (mutex or even spin) locking is probably faster than volatile as well. :)

Volatile is both too much and not enough for multi-threaded synchronization.

Volatile prevents the compiler from putting a value in a register, and it makes the compiler treat reads and writes as I/O for ordering purposes. But what it does not do is provide memory fences to prevent the processor from reordering memory operations around the synchronization. Therefore, using atomics (with sequentially consistent memory order) will actually generate different assembly than using volatile. On weakly ordered processor architectures the difference is more pronounced, but even x64 now has non-temporal SSE instructions, which need extra fences.
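A small sketch of the classic store/load case where that matters (variable names made up): with sequentially consistent atomics the outcome r1 == 0 && r2 == 0 is forbidden, whereas with plain volatile int the processor is free to reorder each store past the following load and both threads can read 0.

#include <atomic>

std::atomic<int> x{0};
std::atomic<int> y{0};
int r1 = 0;
int r2 = 0;

void thread_a()
{
    x.store(1);      // seq_cst store: emits the required fence/locked op
    r1 = y.load();   // seq_cst load
}

void thread_b()
{
    y.store(1);
    r2 = x.load();
}

// After running thread_a and thread_b concurrently and joining them,
// at least one of r1, r2 is guaranteed to be 1.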

And technically, any program with shared data that is not synchronized by threading primitives (which volatile isn't) has undefined behavior.

2 hours ago, King Mir said:

... and not enough for multi-threaded synchronization.

This is why Microsoft has these InterlockedX functions (e.g. InterlockedAdd), isn't it?

Though these functions are said to be atomic, the values they operate on are still declared volatile. Doesn't the latter always impose a bottleneck?

🧙

Yes, it does look like InterlockedAdd would get you the rest of the way, based on a quick look. But why wouldn't you just use std::atomic<T>? Especially if using std::thread.
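For reference, the portable equivalent of an interlocked add with std::atomic looks roughly like this (the counter name is assumed):

#include <atomic>

std::atomic<long> counter{0};

long add_one()
{
    // Atomic read-modify-write with sequentially consistent ordering by
    // default; fetch_add returns the old value, so add 1 to get the new
    // value the way InterlockedAdd reports it.
    return counter.fetch_add(1) + 1;
}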

Also, a complication here is that Microsoft's compiler by default implements volatile as atomic, adding those fences that the language does not require it to.

My hunch is that their compiler is smart enough to omit those fences when the value is used in an actually atomic operation like InterlockedAdd... But I don't have the wherewithal to prove that :)

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

3 hours ago, swiftcoder said:

My hunch is that their compiler is smart enough to omit those fences when the value is used in an actually atomic operation like InterlockedAdd... But I don't have the wherewithal to prove that :)

That's backwards. The entire point of using atomic operations is so the compiler adds those fences.

I can't figure out from the linked documentation whether InterlockedAdd has sequentially consistent semantics or merely acquire/release semantics, and therefore whether it requires a locked instruction on x86. A locked instruction would be sufficient for sequentially consistent semantics on x86, so you wouldn't need a separate fence.

8 hours ago, King Mir said:

I can't figure out from the linked documentation whether InterlockedAdd has sequentially consistent semantics or merely acquire/release semantics, and therefore whether it requires a locked instruction on x86. A locked instruction would be sufficient for sequentially consistent semantics on x86, so you wouldn't need a separate fence.

Unless, of course, the compiler decides to reorder the instructions in your code, or the CPU is superscalar and performs out-of-order and/or speculative execution, and you end up using the result of your interlocked add before actually executing that instruction: a locked processor instruction does not prevent that, but a fence does. Then again, you're using this in a multi-threaded application, so you're going to get pre-empted right after your InterlockedAdd, the value gets changed by the other thread after you've obtained it, and you're left scratching your head wondering why things don't work, sometimes, maybe.
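If the code genuinely needs to act on the value it just read, the usual answer to that last problem is a single atomic read-modify-write, or a compare-exchange loop, rather than a separate read followed by a write (a sketch with assumed names):

#include <atomic>

std::atomic<int> shared_value{0};

void double_shared_value()
{
    int observed = shared_value.load();
    // If another thread changed the value between our load and the
    // exchange, compare_exchange_weak fails, reloads observed with the
    // current value, and we simply retry.
    while (!shared_value.compare_exchange_weak(observed, observed * 2))
    {
        // retry with the freshly observed value
    }
}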

 

Stephen M. Webb
Professional Free Software Developer

This topic is closed to new replies.
