cache friendly lock free concept?

Started by
40 comments, last by King Mir 10 years, 6 months ago

I mark lockstate as volatile, which ensures it is usable for memory locking in this example.

This is a wrong assumption. From the many links out there, I picked one: http://en.wikipedia.org/wiki/Volatile_variable

Quote:

Operations on volatile variables are not atomic, nor do they establish a proper happens-before relationship for threading. This is according to the relevant standards (C, C++, POSIX, WIN32),[2] and this is the matter of fact for the vast majority of current implementations. Thus, the usage of volatile keyword as a portable synchronization mechanism is discouraged by many C/C++ groups.[3][4][5]


I mark lockstate as volatile, which ensures it is usable for memory locking in this example.

This is a wrong assumption. From the many links out there, I picked one: http://en.wikipedia.org/wiki/Volatile_variable

Quote:

Operations on volatile variables are not atomic, nor do they establish a proper happens-before relationship for threading. This is according to the relevant standards (C, C++, POSIX, WIN32),[2] and this is the matter of fact for the vast majority of current implementations. Thus, the usage of volatile keyword as a portable synchronization mechanism is discouraged by many C/C++ groups.[3][4][5]

http://msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.90%29.aspx


This allows volatile objects to be used for memory locks and releases in multithreaded applications.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

I mark lockstate as volatile, which ensures it is usable for memory locking in this example.

@King Mir, I was thinking that the swap might be susceptible to the pointers sitting on their own cache lines: when I do the swap and update the lock, the other thread will likely not have the swapped pointers sitting in its cache. Hmm, this is a potentially serious problem.

It needs to be atomic. You can use acquire-release semantics instead of sequential consistency, but volatile is insufficient.

Quite the opposite: you want the data on separate cache lines. All 4 variables need to be on their own cache line; otherwise you could have false sharing and cache thrashing. It's not a correctness problem, but it is a performance problem.

I mark lockstate as volatile, which ensures it is usable for memory locking in this example.

This is a wrong assumption. From the many links out there, I picked one: http://en.wikipedia.org/wiki/Volatile_variable

Quote:
Operations on volatile variables are not atomic, nor do they establish a proper happens-before relationship for threading. This is according to the relevant standards (C, C++, POSIX, WIN32),[2] and this is the matter of fact for the vast majority of current implementations. Thus, the usage of volatile keyword as a portable synchronization mechanism is discouraged by many C/C++ groups.[3][4][5]

http://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.90).aspx


This allows volatile objects to be used for memory locks and releases in multithreaded applications.

Why are you using Microsoft specific compiler extensions when a standard solution is readily available?

I mark lockstate as volatile, which ensures it is usable for memory locking in this example.

This is a wrong assumption. From the many links out there, I picked one: http://en.wikipedia.org/wiki/Volatile_variable

Quote:
Operations on volatile variables are not atomic, nor do they establish a proper happens-before relationship for threading. This is according to the relevant standards (C, C++, POSIX, WIN32),[2] and this is the matter of fact for the vast majority of current implementations. Thus, the usage of volatile keyword as a portable synchronization mechanism is discouraged by many C/C++ groups.[3][4][5]

http://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.90).aspx

Haha, that's cute. No.

The Release and Acquire semantics of volatile described in that MSDN article refer to the compiler's order of operations when the assembly gets generated; they're compiler memory barriers. They do not guarantee that the proper hardware memory barriers are generated.
A quick look at the generated assembly makes this evident. The reason it's "Microsoft Specific" is that the standard doesn't guarantee it. For non-volatile variables, the optimizing compiler is allowed to move reads/writes before or after the volatile read/write, which is still something important when dealing with concurrent code; but in all cases volatile does nothing in terms of HW memory barriers, nor does it guarantee atomicity.

The code you posted is simply a race condition waiting to happen. Even if volatile actually generated hardware memory barriers; your code is still broken because "if( LockState == UNLOCKED ){ LockState = LOCKED }" is still not atomic.
The second thread can enter the 'if' block because the first one didn't set LockState to LOCKED yet.
You need a hardware-locked CAS to do that, or a mutex.

This allows volatile objects to be used for memory locks and releases in multithreaded applications.

It still needs to be atomically accessed via the Interlocked* family of functions (http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx).
Being volatile means the compiler can’t cache the variable in a register or otherwise optimize accesses to it, on the assumption that it may be accessed across multiple threads.
Being volatile is necessary for multi-threading but it is only one part. Notice how those functions require volatile parameters.

Most likely the only reason you have never noticed a race condition is because incoming packets are fairly rare compared to game loops, and when they do happen they would have to do so at exactly the right time such that both threads are accessing that variable.
And because of how rare incoming packets are, your concerns on cache-friendliness are misplaced.

As long as you fix the race condition you shouldn’t worry too much more about this area.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

The code you posted is simply a race condition waiting to happen. Even if volatile actually generated hardware memory barriers; your code is still broken because "if( LockState == UNLOCKED ){ LockState = LOCKED }" is still not atomic.

The second thread can enter the 'if' block because the first one didn't set LockState to LOCKED yet.
You need a hardware-locked CAS to do that, or a mutex.

Elaborate on this please: what "if" block is being entered in the second thread, if LockState has not changed its state yet?


LockState can change due to another thread. You are currently using 4 different states, two in each thread, and the ifs are correct. However, it is really easy to miss this 2 months down the line when you are debugging this and add a LOCKED check or UNLOCK_REQUESTED check in the other thread; that will cause you a race condition.

By acquiring an actual HW barrier and releasing it when you are done with the section that has to complete before the threads can run in parallel again (e.g. a data copy from one thread to another), you guarantee that they will not create a flip situation like the one described above.

I used this template as my data copy between threads when I wrote a physics simulation, which ran on a separate thread from the input and rendering: http://pastebin.com/Uxgh6qC8

The class in this example protects the memory access to its internal variable with critical sections; this is a HW barrier that is slightly easier to use than either a mutex or a semaphore on Windows. Generally, when doing memory operations from one thread to another, you want to make sure that you either return a copy of the protected data, like in the example, or write directly to a class variable that is protected through a HW barrier on each access, to avoid stomping on data.

The reason for the copy return is that if you release the CS and then return the m_data var, another thread could already have acquired the CS and be midway through writing to the variable you are now returning, so you would get garbage; the copy avoids this altogether.

As stated above, volatile doesn't guarantee a lock on the memory; all it guarantees is that the variable will be fetched from, or written to, memory, so you know the CPU doesn't have a register copy of this variable with a different value. It is sometimes useful to mark non-critical operations from one thread to another in this way, but even then you can end up with a race condition.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

This allows volatile objects to be used for memory locks and releases in multithreaded applications.

It still needs to be atomically accessed via the Interlocked* family of functions (http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx).
Being volatile means the compiler can’t cache the variable in a register or otherwise optimize accesses to it, on the assumption that it may be accessed across multiple threads.
Being volatile is necessary for multi-threading but it is only one part. Notice how those functions require volatile parameters.

But as it says on the top of the link you posted "Simple reads and writes to properly-aligned 32-bit variables are atomic operations." And in the link slicer4ever linked to, Microsoft extends the definition of volatile to give it acquire and release semantics. These two things together should be sufficient for multithreading, especially with only two threads involved.

Of course, I don't recommend doing this. Just use C++11 std::atomic<int>.


-----

I also agree with NightCreature83 that the code will only work with the two threads. However, I think EnterCriticalSection is blocking, so that would make this no longer lock free. But actually you might want to do that anyway, at least on the network side.

But as it says on the top of the link you posted "Simple reads and writes to properly-aligned 32-bit variables are atomic operations." And in the link slicer4ever linked to, Microsoft extends the definition of volatile to give it acquire and release semantics. These two things together should be sufficient for multithreading, especially with only two threads involved.

See http://www.gamedev.net/topic/649586-cache-friendly-lock-free-concept/#entry5106076

The link in my article refers to all properly-aligned 32-bit variables on x86 machines, not just volatile ones, and it refers to only the actual instruction that performs the read or write. As in, these cases are not possible:
* Thread 1 performs “MOV [0xBADF00D], EAX” while thread 2 performs “MOV [0xBADF00D], EDX”—it is not possible for address 0xBADF00D to have part of the value from thread 1’s EAX and part of the value from thread 2’s EDX; it will have one or the other.
* Thread 1 performs “MOV [0xBADF00D], EAX” while thread 2 performs “MOV EDX, [0xBADF00D]”—the reads and writes will not happen concurrently such that it is possible for thread 2 to load part of the previous value at 0xBADF00D and part of the EAX value that was being written to it at the same time.


The article I posted refers to this and only this type of atomic operation, and is unrelated to volatile integers.
With care you can make his code work without the Interlocked* functions via proper fencing. That in itself is a bit expensive but could be faster than Interlocked*, but the trade-off in performance is that it only works in the most controlled of cases. Introducing yet another thread into the mix may break it, or an unforeseen case in which packets are received unexpectedly quickly at just the wrong moment (not saying I spot that case here but we had a similar case at work).

Also, the guy sitting next to me wrote this thorough article on threading, which covers basically everything you could ever want to know.

Multithreading

L. Spiro


