That applies to many architectures, but it doesn't apply to x86. In x86 any MOV from memory by itself has acquire semantics. Any MOV into memory has release semantics. You don't need a fence.
So you're correct from the standpoint of writing portable C++, but on x86 the code will work as is. Because of the above, x86 will allow code that is technically undefined C++ to work as one would expect. (Let me reiterate that: the code as written is undefined behavior)
EDIT:Strike-through due to point made by Matias Goldberg bellow.