when to use concurrency in video games


I think one key point is being glossed over with regard to the "when" portion of the question... When do you use concurrency? "Only when you need it to ship!" I have shipped many games where I've added threading engines but only bothered to port bits and pieces of code to the parallel models to hit a solid 60FPS with a little headroom. I just wanted to point this out, as it seemed to be getting glossed over among the 'shiny' reasons for doing concurrency. :)

Correctness in this context means thread safety: avoiding deadlocks, and avoiding shared data whose result depends on the order in which several threads process it. If it's safe, it's correct (no deadlocks, crashes, or corrupted data), but it isn't necessarily parallel; a lock is like throwing a traffic light onto a spot where many lanes gather.

My guess is that to use parallelism to the max, the software needs to be designed with parallelism in mind, not just made thread safe: make computations as independent as possible, avoid shared data, and avoid deadlocks by design rather than with critical sections everywhere. A sketch of that idea follows.
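A minimal sketch of that design, assuming a made-up Entity type and update function: each thread owns a disjoint slice of the data, so no locks are needed and no deadlock is possible by construction.

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-entity state and update, with no shared mutable data.
struct Entity { float x = 0.0f, vx = 1.0f; };
void update(Entity& e, float dt) { e.x += e.vx * dt; }

// Each thread gets a disjoint slice of the array: no locks, no shared
// writes, and the only synchronization is a single join at the end.
void parallel_update(std::vector<Entity>& entities, float dt, unsigned num_threads)
{
    std::vector<std::thread> pool;
    const std::size_t chunk = (entities.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(entities.size(), begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&entities, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i)
                update(entities[i], dt);
        });
    }
    for (auto& th : pool) th.join();  // synchronize once, at the end
}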

If you use volatile for multi-threading, you're doing it wrong.

That isn't necessarily true; granted, using it everywhere, or for everything shared between two (or more) threads, would be morally reprehensible.
If all you need to do is read some memory, and you truly don't care if SNAFU is the word of the day, then feel free to use volatile.

I used volatile on certain data members I wanted to draw to screen as text: health, ammo, score. Single values which get updated *maybe* once a frame, so if a couple of frames got fubar'd I didn't care. In practice, every single frame was A-OK. Perhaps my loads weren't heavy enough.

Volatile does absolutely* nothing for multi-threading. You could leave those values as non-volatile and it would act the same way.

*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix, which acts as an SMP memory fence, which forces them to occur in order [edit: actually, they're specified as having acquire/release semantics] - but that compiler behavior is, frankly, wrong :P

In general, volatile does not ensure that reads/writes are atomic, or that reads/writes occur in order with respect to the rest of the code. If you see that keyword in multi-threaded code (perhaps except for inside the implementation details of std::atomic), then you have a bug.

See also: https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
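For the HUD-style counters mentioned earlier, a relaxed std::atomic gives the intended "latest-ish value" behaviour portably. A minimal sketch, with made-up field names:

#include <atomic>

// Hypothetical HUD fields. std::atomic with relaxed ordering gives the
// "just show roughly the latest value" behaviour described above, but
// with loads/stores that are guaranteed tear-free, unlike volatile ints.
struct HudState {
    std::atomic<int> health{100};
    std::atomic<int> ammo{50};
    std::atomic<int> score{0};
};

// Game thread:
void on_hit(HudState& hud, int damage) {
    hud.health.fetch_sub(damage, std::memory_order_relaxed);
}

// Render thread: a value one frame stale is fine; a torn value is not.
int health_for_display(const HudState& hud) {
    return hud.health.load(std::memory_order_relaxed);
}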

Interesting. I did implement this using VS2008 and 2013; I wonder if I'd gotten a suggestion from the compiler. Knowing myself, I'd have tried compiling without any qualifier to see if I could. After seeing your post, though, I looked up volatile again, from the internet this time (a book previously), and I was surprised to find that volatile is actually intended to prevent compiler optimizations involving the variables marked with it. One reference actually states that its behaviour has the effect of: "This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution."

You think you know something from ages ago, then you go and find out you don't. Go figure.

I'd like to note though, I didn't have any such expectations of volatile. I thought it was letting me read memory that *may* be in the midst of being written to.

*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix

MS doesn't use the LOCK instruction for volatile reads and writes. LOCK would provide sequential consistency, but MS volatile only guarantees acquire/release. On x86, reads naturally have acquire semantics and writes naturally have release semantics (assuming they're not non-temporal). The MS volatile just ensures that the compiler doesn't re-order or optimize out instructions in a way that would violate the acquire/release semantics.
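A minimal sketch of the publish/consume pattern those acquire/release semantics enable, written with portable std::atomic rather than MS volatile (the names 'payload' and 'ready' are illustrative). On x86, both the release store and the acquire load compile to plain MOV instructions; no LOCK prefix is required:

#include <atomic>

int payload = 0;
std::atomic<bool> ready{false};

// Producer thread: the release store guarantees the payload write is
// visible to any thread that observes ready == true.
void publish() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer thread: the acquire load pairs with the release store.
int consume() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    return payload;  // guaranteed to read 42
}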

Yeah, you're right - I've struck that bit out in my above post. From memory I thought that they'd gone as far as basically using their Interlocked* intrinsics silently on volatile integers, but it's a lot weaker than that. I even just gave it a go in my compiler and couldn't get it to emit a LOCK prefix except when calling InterlockedCompareExchange/InterlockedIncrement manually :)

This means that even with MS's stricter form of volatile, it would be very hard to use them to write correct inter-thread synchronization (i.e. you should still only see them deep in the guts of synchronization primitives, and not in user code).

As a general note on volatile, I also went and did a test for fun. I took the scheduler for my distribution system and added a single volatile to the head index of the lazy ring buffer. I changed nothing else; I'm still using explicit atomic load/store to access it. It slowed down the loop by about 10%, which is quite a bit worse than my worst guess. This was on a dual Xeon, compiled with Clang; I'd be terrified to see what happens with the MS hackery on volatiles. As a note: I believe VS2015 now has an option (/volatile:iso) to disable the MS-specific behavior, so with that set it may be no worse than Clang.

As to volatile and threading in general: I don't believe I use the keyword anywhere in my code, at home or at work, and it is fully multi-core from the ground up. Unlike what I called out above, I'm not using concurrency just to ship; it is a fundamental design goal of the overall architecture.
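For context, this is the general shape being described above: a single-producer/single-consumer ring buffer whose indices are touched only through explicit atomic loads and stores. This is a minimal sketch of the pattern, not the poster's actual scheduler:

#include <atomic>
#include <cstddef>

// N must be a power of two so indices can wrap with a mask.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    T buf[N];
    std::atomic<std::size_t> head{0};  // written only by the producer
    std::atomic<std::size_t> tail{0};  // written only by the consumer
public:
    bool push(const T& v) {            // producer side
        const std::size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N)
            return false;              // full
        buf[h & (N - 1)] = v;
        head.store(h + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    bool pop(T& out) {                 // consumer side
        const std::size_t t = tail.load(std::memory_order_relaxed);
        if (head.load(std::memory_order_acquire) == t)
            return false;              // empty
        out = buf[t & (N - 1)];
        tail.store(t + 1, std::memory_order_release);  // free the slot
        return true;
    }
};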

Multi-threading is more complicated than it looks at first. You have to take care of synchronization, prevent deadlocks, race conditions, etc. It's not a trivial thing. That's why a simple project may not need it, and if it's used anyway it can be a huge source of bugs that are hard to debug (especially for less experienced programmers). Some games simply don't need this level of complexity because they are too simple to gain any benefit from it.

Today parallel programming is becoming more popular because of the nature of modern hardware. If you have many CPUs, the very last thing you want is for them to sit there doing nothing, so you want to give them as much work as possible. Idle CPUs are just a waste of precious resources.

You asked what could be parallelized in a game engine. Technically, everything; all you need to do is make sure your architecture is suitable for it. You can do it the old-school way and create separate threads for rendering and updating, so that while one frame is being updated, the previous one is being rendered. Or you can go the modern way and build your engine around a task-based architecture, where everything is a task that can be executed in parallel with other tasks from the same group/family, etc. The latter is better for vertical scalability: if your system has 4 cores, all 4 stay occupied until your tasks are done; with 8 cores, same thing, just executed faster. The engine simply scales to the hardware and spawns the proper number of parallel tasks (see the sketch below). It's really up to you what you want to parallelize. It also means taking more care with your data, so that two tasks won't work on the same data at the same time. It requires building the engine very carefully, always keeping the potential pitfalls in mind.
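A toy sketch of the task idea, assuming nothing beyond the standard library. A real engine would use a work-stealing thread pool rather than std::async, but the scaling argument is the same: the runtime spreads independent tasks across however many cores exist.

#include <functional>
#include <future>
#include <vector>

// Submit independent tasks, let the runtime distribute them across
// available cores, then join once at a frame barrier.
void run_frame_tasks(const std::vector<std::function<void()>>& tasks)
{
    std::vector<std::future<void>> pending;
    pending.reserve(tasks.size());
    for (const auto& task : tasks)
        pending.push_back(std::async(std::launch::async, task));
    for (auto& f : pending)
        f.get();  // all tasks finished before the frame continues
}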
I recommend watching the GDC presentation about Destiny and how it utilizes multithreading; it may answer some of your questions :)

Here's a nice presentation from Naughty Dog about how they used multithreading to achieve 60FPS in The Last of Us Remastered.

http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
