The lock state can be changed by another thread at any time. Right now you use four different states, two in each thread, and your ifs are correct. However, this is really easy to miss two months down the line when you are debugging: add a LOCKED or UNLOCK_REQUESTED check in the other thread and you have introduced a race condition.
If you instead acquire an actual HW barrier and release it once you are done with the section that has to finish before the threads can run in parallel again (e.g. a data copy from one thread to another), you guarantee that the threads cannot end up in the flip situation described above.
I used this template for my data copy between threads when I wrote a physics simulation that ran on a separate thread from the input and rendering: http://pastebin.com/Uxgh6qC8 The class in this example protects access to its internal variable with critical sections, a HW barrier that is slightly easier to use than either a mutex or a semaphore on Windows. Generally, when passing memory from one thread to another, you want to either return a copy of the protected data, as in the example, or write directly to a class variable that is protected by a HW barrier on every access, to avoid stomping on data. The reason for returning a copy is that if you release the CS and then return m_data itself, another thread could already have acquired the CS and be midway through writing to the variable you are returning, so what you read is garbage. The copy avoids this altogether.
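As a rough illustration of the copy-return idea, here is a minimal sketch. It is not the pastebin template: std::mutex stands in for the Windows critical section so the code stays portable, and the names ProtectedData, PhysicsState and count_torn_reads are hypothetical, made up for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the pastebin template. std::mutex plays the
// role of the Windows critical section (assumption, chosen for portability).
template <typename T>
class ProtectedData {
public:
    // Every write happens while the lock is held.
    void set(const T& value) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_data = value;
    }

    // Returns by value: the copy is made while the lock is still held,
    // so the caller never sees a half-written m_data.
    T get() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_data;
    }

private:
    mutable std::mutex m_lock;
    T m_data{};
};

// Hypothetical payload with the invariant x == y, so a torn read is visible.
struct PhysicsState {
    long x;
    long y;
};

// One thread writes states, another reads copies and counts how many of
// them break the invariant. With the lock in place this should be zero.
int count_torn_reads(int iterations) {
    assert(iterations >= 0);
    ProtectedData<PhysicsState> shared;
    int torn = 0;  // only touched by the reader thread, read after join()

    std::thread writer([&] {
        for (int i = 1; i <= iterations; ++i) {
            shared.set(PhysicsState{i, i});
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            PhysicsState copy = shared.get();
            if (copy.x != copy.y) {
                ++torn;
            }
        }
    });

    writer.join();
    reader.join();
    return torn;
}
```

If get() returned a reference to m_data instead, the lock would already be released by the time the caller looked at the value, which is exactly the garbage-read case described above.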
As stated above, volatile doesn't guarantee a lock on the memory. All it guarantees is that the variable is actually read from or written to memory on each access, so you know a stale copy with a different value isn't being kept in a register. It is sometimes useful to flag non-critical operations from one thread to another this way, but even then you can end up with a race condition.
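To make the volatile point concrete: an increment is a read, a modify and a write, and volatile does nothing to stop another thread from getting in between those steps. The sketch below only shows the locked version, since the unlocked one has no defined result to show. It again uses std::mutex as a stand-in for a critical section, and count_with_lock is a hypothetical helper name.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two threads increment the same counter. Marking the counter volatile
// would only force each access to go to memory; the read-modify-write of
// ++counter could still interleave between threads. Holding a lock around
// the increment makes the whole step exclusive.
long count_with_lock(int per_thread) {
    assert(per_thread >= 0);
    long counter = 0;
    std::mutex lock;  // stand-in for a critical section (assumption)

    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> guard(lock);
            ++counter;
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;  // always 2 * per_thread, because no increment is lost
}
```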