But as it says on the top of the link you posted "Simple reads and writes to properly-aligned 32-bit variables are atomic operations." And in the link slicer4ever linked to, Microsoft extends the definition of volatile to give it acquire and release semantics. These two things together should be sufficient for multithreading, especially with only two threads involved.
The link in my article refers to all properly-aligned 32-bit variables on x86 machines, not just volatile ones, and it refers to only the actual instruction that performs the read or write. As in, these cases are not possible:
Thread 1 performs “MOV [0xBADF00D], EAX” while thread 2 performs “MOV [0xBADF00D], EDX”: it is not possible for address 0xBADF00D to end up with part of the value from thread 1’s EAX and part of the value from thread 2’s EDX; it will have one or the other.
Thread 1 performs “MOV [0xBADF00D], EAX” while thread 2 performs “MOV EDX, [0xBADF00D]”: the read and the write cannot overlap in a way that lets thread 2 load part of the previous value at 0xBADF00D and part of the EAX value being written to it.
The article I posted refers to this and only this type of atomic operation, and is unrelated to volatile integers.
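To make that guarantee concrete, here is a rough sketch in portable C++ rather than raw assembly. It assumes a C++11 compiler and uses std::atomic with relaxed ordering, which on x86 compiles to plain MOV instructions for an aligned 32-bit value. The names (CountTornReads, kPatternA, kPatternB) are made up for illustration. Two threads keep storing different bit patterns while a third reads; the reader should only ever see one whole pattern or the other, never a mix. Note that this shows only the absence of torn values and says nothing about ordering between different variables.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical bit patterns chosen so a torn value would be easy to spot.
constexpr std::uint32_t kPatternA = 0xAAAAAAAAu;
constexpr std::uint32_t kPatternB = 0xBBBBBBBBu;

// Runs two writers and one reader against a single 32-bit variable and
// returns how many times the reader saw a value that was neither pattern.
int CountTornReads(int iterations) {
    std::atomic<std::uint32_t> shared{kPatternA};
    std::atomic<bool> stop{false};
    int torn = 0;  // written only by the reader thread, read after join()

    std::thread writerA([&] {
        for (int i = 0; i < iterations; ++i)
            shared.store(kPatternA, std::memory_order_relaxed);
    });
    std::thread writerB([&] {
        for (int i = 0; i < iterations; ++i)
            shared.store(kPatternB, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            const std::uint32_t v = shared.load(std::memory_order_relaxed);
            if (v != kPatternA && v != kPatternB)
                ++torn;  // would mean half of one write, half of the other
        }
    });

    writerA.join();
    writerB.join();
    stop.store(true, std::memory_order_relaxed);
    reader.join();
    return torn;
}
```

Calling CountTornReads with any iteration count should report zero torn values, which is exactly and only what the "simple reads and writes are atomic" statement promises.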
With care, you can make his code work without the Interlocked* functions via proper fencing. Fencing is itself somewhat expensive, though it could still be faster than Interlocked*; the trade-off for that performance is that it only works in the most controlled of cases. Introducing yet another thread into the mix may break it, as may an unforeseen case in which packets are received unexpectedly quickly at just the wrong moment (I am not saying I spot that case here, but we had a similar case at work).
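As a minimal sketch of what "proper fencing" can look like, assuming exactly one writer thread and one reader thread, here is a hand-off written with standard C++11 fences rather than the Win32 Interlocked* calls. This is not his code; the names (g_payload, g_ready, Producer, Consumer, RunHandoff) are hypothetical. The writer fills in plain data, issues a release fence, then sets a flag; the reader waits for the flag, issues an acquire fence, then reads the data.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical shared state for one writer and one reader.
static std::uint32_t g_payload = 0;       // ordinary, non-atomic data
static std::atomic<bool> g_ready{false};  // publication flag

// Writer side: the release fence keeps the payload write from being
// reordered after the flag store.
void Producer(std::uint32_t value) {
    g_payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

// Reader side: the acquire fence keeps the payload read from being
// reordered before the flag load that observed 'true'.
std::uint32_t Consumer() {
    while (!g_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_payload;
}

// Runs one hand-off between two threads and returns what the reader saw.
std::uint32_t RunHandoff(std::uint32_t value) {
    g_ready.store(false, std::memory_order_relaxed);
    std::uint32_t received = 0;
    std::thread reader([&received] { received = Consumer(); });
    std::thread writer([value] { Producer(value); });
    writer.join();
    reader.join();
    return received;
}
```

This only holds together because there is a single writer and a single reader. Add a second writer and both may write g_payload at once, which is the kind of breakage described above and the point where an interlocked operation or a lock becomes necessary.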
Also, the guy sitting next to me right now wrote this thorough article on threading, which covers basically everything you could ever want to know: Multithreading