About using volatile, I thought that was necessary. I thought it was there to make sure any writes to a variable get written to memory immediately rather than allowing the optimiser to cache intermediate values in a register.
In some cases, it literally does nothing. In other cases, it doesn't do nearly enough. Most of the time, volatile just causes needless de-optimization without solving any real problems.
You need proper memory barriers to guarantee the ordering of reads and writes, since both the compiler and the CPU itself can reorder instructions as well as memory stores and fetches. For some types of values or some CPU architectures, you also need explicit atomic instructions just to ensure that other threads never see a half-written value (not a problem most of the time on x86, but it can be on other architectures).
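To make that concrete, here's a minimal sketch of the usual "write some data, then raise a flag" pattern done with real ordering guarantees instead of volatile. It assumes a C++11 compiler with std::atomic and std::thread; the names payload, ready and publish_and_read are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                  // plain, non-atomic data
static std::atomic<bool> ready(false);   // flag that publishes the payload

static void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it; the write
                                                   //    above cannot be moved
                                                   //    below this store
}

static int consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store, so once we see true we are also guaranteed to see payload == 42.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int publish_and_read() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

With a volatile bool flag instead, neither the compiler nor the CPU is obliged to keep the payload write ahead of the flag write as seen from another thread, and the unsynchronized access is a data race as far as the language is concerned.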
Atomic needs C++11 doesn't it?
If you want to write purely ISO-conforming C++ with absolutely no extensions or libraries, sure. GCC and Clang provide intrinsics and other extensions that give you atomic values portably across different OSes and architectures. Many game libraries provide their own platform-neutral APIs for threading and atomics (and some even offer higher-level abstractions that are very useful). You can very easily use threads and atomics portably across Linux, OS X, Android, Windows+MinGW, Windows+VC++, iOS, etc. using pure-C APIs like SDL2.
https://wiki.libsdl.org/APIByCategory#Threads - SDL threading/atomics support, and I'd guess that you're probably already using SDL (or something equivalent) anyway
https://www.threadingbuildingblocks.org/ - Intel Threading Building Blocks, which is a high-level concurrency library that supports Windows and Linux
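As for the GCC/Clang extensions mentioned above, here's a rough sketch of a shared counter that uses the __atomic builtins and plain pthreads, so nothing from the C++11 library is involved. It assumes GCC 4.7+ or a reasonably recent Clang on a POSIX system (older GCC versions have the similar __sync_* builtins instead); counter, add_many and count_with_builtins are made-up names.

```cpp
#include <pthread.h>

// Hypothetical names, for illustration only.
static long counter = 0;  // plain long, only ever touched through builtins

// Thread entry point: atomically increments the counter *arg times.
static void* add_many(void* arg) {
    const long n = *static_cast<long*>(arg);
    for (long i = 0; i < n; ++i) {
        // Atomic read-modify-write; no increments are lost between threads.
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
    }
    return 0;
}

// Starts up to 16 threads that each add per_thread to the counter,
// waits for all of them, and returns the final total.
long count_with_builtins(int num_threads, long per_thread) {
    __atomic_store_n(&counter, 0, __ATOMIC_RELAXED);
    pthread_t threads[16];
    if (num_threads > 16) num_threads = 16;
    for (int i = 0; i < num_threads; ++i)
        pthread_create(&threads[i], 0, add_many, &per_thread);
    for (int i = 0; i < num_threads; ++i)
        pthread_join(threads[i], 0);
    return __atomic_load_n(&counter, __ATOMIC_ACQUIRE);
}
```

Calling count_with_builtins(4, 10000) should give 40000 every time. A plain ++counter on a volatile long can lose updates, because the increment is still a separate load, add and store. Depending on the toolchain you may need to build with -pthread.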