I think one key point is being glossed over regarding the "when" portion of the question... When do you use concurrency? "Only when you need it to ship!" I have shipped many games where I added a threading engine but only bothered to port bits and pieces of the code to the parallel model, just enough to hit a solid 60 FPS with a little headroom. I just wanted to point this out, as it seemed to be getting lost among the 'shiny' reasons to do concurrency.. :)
when to use concurrency in video games
If it's merely thread-safe, it's correct (no deadlocks, crashes, or corrupted data), but it isn't parallel; it's as if you threw a traffic light onto a place where many lanes gather.
My guess is that to use parallelism to the max, the software solution needs to be designed with parallelism in mind, not only made thread-safe: make computations as independent as possible, avoid shared data, and avoid deadlocks by design rather than with critical sections and such.
That isn't necessarily true, granted that using it everywhere, or for everything shared between two or more threads, would be morally reprehensible.
"If you use volatile for multi-threading, you're doing it wrong."
If all you need to do is read some memory, and you don't truly care if SNAFU is the word of the day, then feel free to use volatile.
I used volatile on certain data members I wanted to draw to screen as text: health, ammo, score. Single values which get updated *maybe* once a frame, so if a couple of frames got fubar'd I didn't care. In practice, every single frame was A-OK. Perhaps my loads weren't heavy enough.
Volatile does absolutely* nothing for multi-threading. You could leave those values as non-volatile and it would act the same way.
*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix, which acts as an SMP memory fence, which forces them to occur in order [edit]specified as having acquire/release semantics[/edit] - but that compiler behavior is, frankly, wrong :P
In general, volatile does not ensure that reads/writes are atomic, or that reads/writes occur in order with respect to the rest of the code. If you see that keyword in multi-threaded code (perhaps except for inside the implementation details of std::atomic), then you have a bug.
See also: https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
Interesting. I did my implementations using VS2008 and VS2013; I wonder if I had gotten a suggestion from the compiler. Knowing myself, I'd have tried compiling without any qualifier to see if I could. After seeing your post, though, I looked up volatile again, on the internet this time (a book previously), and I was surprised to find that volatile is actually intended to prevent compiler optimizations involving the variables it is applied to. One reference even states that its behavior has the effect of: "This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution."
You think you know something from ages ago, then you go and find out you don't. Go figure.
I'd like to note, though, that I didn't have any such expectations of volatile. I thought it was letting me read memory that *may* be in the midst of being written to.
*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix
MS doesn't use the LOCK instruction for volatile reads and writes. LOCK would provide sequential consistency, but MS volatile only guarantees acquire/release. On x86, reads naturally have acquire semantics and writes naturally have release semantics (assuming they're not non-temporal). The MS volatile just ensures that the compiler doesn't re-order or optimize out instructions in a way that would violate the acquire/release semantics.
Yeah, you're right - I've struck that bit out in my above post. From memory I thought that they'd gone as far as basically using their Interlocked* intrinsics silently on volatile integers, but it's a lot weaker than that. I even just gave it a go in my compiler and couldn't get it to emit a LOCK prefix except when calling InterlockedCompareExchange/InterlockedIncrement manually :)
This means that even with MS's stricter form of volatile, it would be very hard to use them to write correct inter-thread synchronization (i.e. you should still only see them deep in the guts of synchronization primitives, and not in user code).
As a general note involving volatiles, I also went and ran a test for fun. I took the scheduler for my distribution system and added a single volatile qualifier to the head index of the lazy ring buffer. I changed nothing else; I'm still using explicit atomic loads/stores to access it. It slowed the loop down by about 10%, which is quite a bit worse than my worst guess. This was on a dual Xeon, compiled with Clang; I'd be terrified to see what happens with the MS hackery on volatiles. As a note: I believe there is now an option in VC2015 (/volatile:iso) to disable the MS-specific behavior, so it may not be any worse than Clang with that set.
As to volatiles and threading in general, I don't believe I use the keyword volatile anywhere in my code, at home or at work, and it is all fully multi-core from the ground up. Unlike what I called out above, I'm not using concurrency just to ship; it is a fundamental design goal of the overall architecture.
Here's a nice presentation from Naughty Dog about how they used multithreading to achieve 60 FPS in The Last of Us Remastered.
http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine