• Create Account

# cache friendly lock free concept?

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

41 replies to this topic

### #21King Mir  Members   -  Reputation: 2050

Like
1Likes
Like

Posted 31 October 2013 - 09:40 PM

So I should point out that the volatile extension link show and example of a volatile variable being used to implement mutual exclusion, without any threading primitives.

You don't need any fencing to implement acquire-release semantics on x86. You do generally for sequential consistency, but here there are only two threads, so acquire-release is equivalent to sequential consistency.

So the OP's code, and Microsoft's example should work on x86. It shouldn't work on ARM, and Microsoft compiles for ARM, so maybe they should clarify that. (Unless they actually do add a memory fence on ARM).

In any case, nobody do this.

----

Also I disagree that "if( LockState == UNLOCKED ){ LockState = LOCKED }" needs to be atomic. It's the same logic as all the other branches, except that the body is empty.

Edited by King Mir, 31 October 2013 - 09:45 PM.

### #22slicer4ever  Crossbones+   -  Reputation: 3982

Like
0Likes
Like

Posted 01 November 2013 - 05:51 PM

Thanks for the input so far. @L. Spiro, i'm reading over the articles's you linked to. I however still don't understand the logic that Matias Goldberg put forth:

The code you posted is simply a race condition waiting to happen. Even if volatile actually generated hardware memory barriers; your code is still broken because "if( LockState == UNLOCKED ){ LockState = LOCKED }" is still not atomic.
The second thread can enter the 'if' block because the first one didn't set LockState to LOCKED yet.

I'd still like an explanation about what you mean by "The second thread can enter the 'if' block" when the second thread relies on LockState being set to LOCKED, how can it somehow enter that if branch then? could you please seriously elaborate on this.

Edited by slicer4ever, 01 November 2013 - 06:07 PM.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

### #23Paradigm Shifter  Crossbones+   -  Reputation: 5435

Like
0Likes
Like

Posted 01 November 2013 - 06:43 PM

You can get task switched to another thread after any instruction (an if and assignment can be several instructions long), so you need to do the check and the set of the new value in a single instruction.

Like this

InterlockedCompareExchange function

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.85%29.aspx

Edited by Paradigm Shifter, 01 November 2013 - 06:45 PM.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

### #24King Mir  Members   -  Reputation: 2050

Like
0Likes
Like

Posted 01 November 2013 - 07:54 PM

Using compare exchange there would only make the program slower. There's no need for the operation to be atomic here.

### #25Matias Goldberg  Crossbones+   -  Reputation: 3697

Like
-2Likes
Like

Posted 02 November 2013 - 04:30 PM

Thanks for the input so far. @L. Spiro, i'm reading over the articles's you linked to. I however still don't understand the logic that Matias Goldberg put forth:

The code you posted is simply a race condition waiting to happen. Even if volatile actually generated hardware memory barriers; your code is still broken because "if( LockState == UNLOCKED ){ LockState = LOCKED }" is still not atomic.
The second thread can enter the 'if' block because the first one didn't set LockState to LOCKED yet.

I'd still like an explanation about what you mean by "The second thread can enter the 'if' block" when the second thread relies on LockState being set to LOCKED, how can it somehow enter that if branch then? could you please seriously elaborate on this.

When you have the following code running on two threads at the same time:
if( LockState == UNLOCKED )
{
LockState = LOCKED;
doSomething();
}
The following can potentially happen:
1a) thread 1 reads LockState. Its value is UNLOCKED. Enters the if block.
1b) thread 2 reads LockState. Its value is UNLOCKED. Enters the if block.
2a) thread 1 sets LockState to LOCKED
2b) thread 2 sets LockState to LOCKED

There you go, two threads entering the if block. LockState is LOCKED.

Another case:
1a) thread 1 reads LockState. Its value is UNLOCKED. Enters the if block.
1b) thread 2 reads LockState. Its value is UNLOCKED. Doesn't enter the if block yet.
2a) thread 1 sets LockState to LOCKED
2b) thread 2 gets rescheduled by the OS
5b) thread 2 compares LockState, which was read in step 1b and was now in a register. It is still UNLOCKED despite in RAM is now set to LOCK. As a result, enters the if block.
6a) thread 1 is still inside doSomething

Again, both threads entered the if block at the same time.
I'm surprised I have to explain this because it's multithreading 101. It's in every tutorial and book out there.

Edited by Matias Goldberg, 02 November 2013 - 04:32 PM.

### #26fastcall22  Crossbones+   -  Reputation: 4456

Like
1Likes
Like

Posted 02 November 2013 - 05:26 PM

Mattias, doesn't that example apply to multiple threads running the same code? Slicer4ever's code has two threads running separate code. Hold on. Not an expert in multithreaded programming, but...
The network thread acts on LOCK_REQUESTED/UNLOCK_REQUESTED and sets to LOCKED/UNLOCKED, whereas the game thread acts on LOCKED/UNLOCKED and sets to UNLOCK_REQUESTED/LOCK_REQUESTED. A thread cannot renter its "critical sections" until the other thread unsets the LockState. Assuming that LockState is executed and committed after at the end of the threads' "critical sections", isn't the worse case scenario then a thread may need to reenter its loop? Or, am I misunderstanding something here?

Edited by fastcall22, 02 November 2013 - 05:31 PM.

c3RhdGljIGNoYXIgeW91cl9tb21bMVVMTCA8PCA2NF07CnNwcmludGYoeW91cl9tb20sICJpcyBmYXQiKTs=

### #27Matias Goldberg  Crossbones+   -  Reputation: 3697

Like
1Likes
Like

Posted 02 November 2013 - 06:04 PM

fastcall22, oohh! I read the code wrong. I apologize
You're right. The access to LockState appears to not contain race conditions (can't say about the rest of the variables because I didn't look those carefully) as long as LockState is only accessed from two threads.

Nonetheless, he should put hardware fences/Memory barriers or otherwise the out of order execution unit from an x86 cpu can cause LockState to be set to a different value sooner than expected.

Furthermore, and as already pointed out, InPackets OutPackets and TempList are likely to be in the same cache line which means a lot of false cache sharing.

Another problem with this routine is that it appears the packets can be delayed several iterations because LockState is still being updated in its 4 state system; which can add noticeable lag.

### #28slicer4ever  Crossbones+   -  Reputation: 3982

Like
0Likes
Like

Posted 02 November 2013 - 09:04 PM

fastcall22, oohh! I read the code wrong. I apologize
You're right. The access to LockState appears to not contain race conditions (can't say about the rest of the variables because I didn't look those carefully) as long as LockState is only accessed from two threads.

Nonetheless, he should put hardware fences/Memory barriers or otherwise the out of order execution unit from an x86 cpu can cause LockState to be set to a different value sooner than expected.

Furthermore, and as already pointed out, InPackets OutPackets and TempList are likely to be in the same cache line which means a lot of false cache sharing.

Another problem with this routine is that it appears the packets can be delayed several iterations because LockState is still being updated in its 4 state system; which can add noticeable lag.

so.....now that we are finally past this=-\, can we get back to the root of the original question:

Furthermore, and as already pointed out, InPackets OutPackets and TempList are likely to be in the same cache line which means a lot of false cache sharing.

This is mostly what i care about, basically i'm exchanging the pointers, so let's say the threads are running on two separate cores, so basically, if i understand what's going on at that level(which i probably don't, hence these questions) ThreadA swaps the pointers, then in order for ThreadB to have these new pointer locations in it's cache, will ThreadA send an update message to ThreadB(doubtful, as in theory ThreadB doesn't know ThreadA also has these pointers), so to get the updated values, it has to read it back from ram(meaning ThreadA would write to ram), or potentially from a higher level cache? can someone explain to me what's going on at this level, and if i should be doing something else here to ensure i don't have more false cache sharing?
Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

### #29King Mir  Members   -  Reputation: 2050

Like
1Likes
Like

Posted 03 November 2013 - 01:16 AM

This is mostly what i care about, basically i'm exchanging the pointers, so let's say the threads are running on two separate cores, so basically, if i understand what's going on at that level(which i probably don't, hence these questions) ThreadA swaps the pointers, then in order for ThreadB to have these new pointer locations in it's cache, will ThreadA send an update message to ThreadB(doubtful, as in theory ThreadB doesn't know ThreadA also has these pointers), so to get the updated values, it has to read it back from ram(meaning ThreadA would write to ram), or potentially from a higher level cache? can someone explain to me what's going on at this level, and if i should be doing something else here to ensure i don't have more false cache sharing?

No that's not what's going on.

A thread on Core A writes one of the 4 variables, which puts it in it's L1 cache. Then a thread on Core B reads one of the variables, but noticing that the cache line of that variable was modified by core A, it must get it from the write buffer of Core A, which depending on the detail of the implementation involve going though the L3 cache, or otherwise has a comparable latency. It might then write to that cache line itself, so that Core A's L1 cached version is invalidated, and it must read from the write buffer of B. Then Core A might write to the cache line, which invalidates Core B's l1 and L2 cache of that line. And back and forth.

That kind of thing is unavoidable on LockState, and to a lesser extent on TempList, but if all 4 variables are on the same cache line, they are all shared as if they were one variable, incurring the performance penalties of sharing on all of them whenever any one of them is accessed. It's like they are one variable to the cache. So the solution is to ensure that all four variables are on a separate cache line.

One way to do this is instead of using pointers and a primitive integer, use a class the size of a cache line. You can even make it your linked list, if you make swapping the list cheep, and put enough padding at the end of it to fill a cache line.

Edited by King Mir, 03 November 2013 - 01:18 AM.

### #30slicer4ever  Crossbones+   -  Reputation: 3982

Like
0Likes
Like

Posted 03 November 2013 - 11:24 AM

This is mostly what i care about, basically i'm exchanging the pointers, so let's say the threads are running on two separate cores, so basically, if i understand what's going on at that level(which i probably don't, hence these questions) ThreadA swaps the pointers, then in order for ThreadB to have these new pointer locations in it's cache, will ThreadA send an update message to ThreadB(doubtful, as in theory ThreadB doesn't know ThreadA also has these pointers), so to get the updated values, it has to read it back from ram(meaning ThreadA would write to ram), or potentially from a higher level cache? can someone explain to me what's going on at this level, and if i should be doing something else here to ensure i don't have more false cache sharing?

No that's not what's going on.

A thread on Core A writes one of the 4 variables, which puts it in it's L1 cache. Then a thread on Core B reads one of the variables, but noticing that the cache line of that variable was modified by core A, it must get it from the write buffer of Core A, which depending on the detail of the implementation involve going though the L3 cache, or otherwise has a comparable latency. It might then write to that cache line itself, so that Core A's L1 cached version is invalidated, and it must read from the write buffer of B. Then Core A might write to the cache line, which invalidates Core B's l1 and L2 cache of that line. And back and forth.

That kind of thing is unavoidable on LockState, and to a lesser extent on TempList, but if all 4 variables are on the same cache line, they are all shared as if they were one variable, incurring the performance penalties of sharing on all of them whenever any one of them is accessed. It's like they are one variable to the cache. So the solution is to ensure that all four variables are on a separate cache line.

One way to do this is instead of using pointers and a primitive integer, use a class the size of a cache line. You can even make it your linked list, if you make swapping the list cheep, and put enough padding at the end of it to fill a cache line.

Thank you for such a detailed explanation of everything that's going on! =-)
Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

### #31Matias Goldberg  Crossbones+   -  Reputation: 3697

Like
1Likes
Like

Posted 03 November 2013 - 04:27 PM

Yes. King Mir's explanation on false cache sharing is spot on. If you modify a single byte inside a cache line, it "dirties" the whole line; and modifying the same region of memory from all cores at very high frequencies adds a tremendous overhead.

It may be worth saying that most x86 archs have a cache line size of 64 bytes; though this isn't guaranteed (i.e. it could be 128 bytes on some systems, 32 bytes on others, etc); while in ARM it is more typical to find cache sizes of either 32 or 64 bytes. What I mean, padding of just 4 or 8 bytes won't be enough.

### #32HScottH  Members   -  Reputation: 512

Like
1Likes
Like

Posted 03 November 2013 - 04:39 PM

Your design is logically good, but technically mistaken.  Lock-free structures generally depend on compare/exchange instructions to keep threads from reading the wrong value or overwriting someone else's.  "Lock-free" is a popular term these days, and while it is an awesome choice in the right circumstances, it is not always the best.  For e.g.:

* Lock-free design works best when the period of contention between threads is very small, such as when swapping a pointer value or updating a counter.  This is because there is always a brief spin whenever the protected value is found to in contention.

* Lock-based design works best when the period of contention will be longer, such as to execute some IO or do some processing, as the waiting thread is taken out of the scheduler and no core "spins" during this time.

Nonetheless, your use of Volatile will not do what you want. It is intended to force the compiler to do NO optimizations on that variable by declaring that the value might be updated at any time--such as by a separate piece of hardware--but the compiler does not place a memory fence around accesses to the variable. So, yes, your caches may be inconsistent.

In general, designing highly concurrent logic is hard. It's fun to do, and if that is your goal, please proceed!  I hold a patent in this area (owned by my previous employer) because I love it.  But most often, if your goal is just efficiency, classic producer/consumer logic (with or without lockinig) will meet any possible need you have related to networking--which lets face it, is incredibly slow compared to our CPUs these days :-)

### #33Hodgman  Moderators   -  Reputation: 31818

Like
2Likes
Like

Posted 03 November 2013 - 05:51 PM

Everyone's already covered what's right and wrong (don't use volatile! It may work on MSVC due to extensions, but it's actually wrong) with the code, but I just have to point out that this code is not 'lock free'.

Lock free doesn't mean "doesn't use the existing lock structures"; it's a very specific term that basically means that if any of the threads happens to stall at any point in time, this will not impact the progress of any other thread.
The code in this thread is the opposite of that -- it hands control of the state variable back and forth between two threads exclusively. If you freeze the network thread while the state is not "unlocked", then the game thread also freezes. [edit: no it doesn't!]

The algorithm relies on the fact that only one thread "owns" the state variable at a time -- as mentioned by others, if his wasnt the case, you'd need some CAS/interlocked logic in there somewhere.

P.s. not being"lock free" isn't a bad thing. I'm just being a definition-nazi ;)

Edited by Hodgman, 04 November 2013 - 02:53 AM.

### #34King Mir  Members   -  Reputation: 2050

Like
1Likes
Like

Posted 03 November 2013 - 06:40 PM

Looks lock free to me. Freezing the network thread like you say would stop updating TempList, but the game thread would still continue to call UpdateGames().

I do think that blocking on LOCK_REQUESTED would be prudent though, especially if RetrievePendingPackets() is not lock free.

### #35L. Spiro  Crossbones+   -  Reputation: 14258

Like
1Likes
Like

Posted 03 November 2013 - 07:29 PM

King Mir is correct on that point.
If the network thread freezes the game thread will still call UpdateGames() and loop again; this part is not related to the state of LockState (we are of course assuming GameThread() is in some kind of loop or else none of this would be a discussion since it would be called once and that would be the end of it).

I’ve stopped replying in the interest of not debating with someone who simply refuses to understand what I am trying to explain, and I hoped that if others tried to explain the same thing he may come around.  But since I am here-

So I should point out that the volatile extension link show and example of a volatile variable being used to implement mutual exclusion, without any threading primitives.

That example is unrelated to his—it shows the variable being controlled by 1 thread while being accessed by another.  The 2nd thread never changes the value of “Sentinel”, it simply reads from it.  This example would work on any platform and any compiler, as it conforms entirely to the standard.  It has nothing to do with extensions or Microsoft®.  Also, please do not use misleading terms such as “mutual exclusion”—that is a far cry from what is happening in that sample code.  All it’s saying is that the 2nd thread will not load and keep in a register (as an optimization) the value of Sentinel in the while loop and thus will properly terminate whenever the 1st thread changes the value.

Let me put it another way.
An “if” statement requires a load, a compare, and a jump, in assembly.  Without a fence, a value could be loaded to a register, changed on another thread, but then have the CMP done on the previous value (loaded to a register) and perform its JMP based on that.  In fact that could happen even with a fence, especially if poorly placed.

If you assume Microsoft® extensions are aiming to solve this problem (which they are not), where exactly do you propose the compiler knows to place the fence.  On both threads.

It doesn’t.  Using fences correctly requires a human author, and even then it fails frequently unless the author is really really good at what he or she does.  There is no way the compiler could ever know where to place the fences on both threads, and there is no way the compiler would even try—it takes 100,000 compiler accuracies to make up for a single compiler bug.

If you need proof that Matias Goldberg’s first post is accurate, consider this line from that link:

A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

As he correctly stated, the compiled binary is the only thing related to the Microsoft® specifics.
My job is to optimize code on all platforms, including consoles, and I spend most of my day staring at compiled assembly. Not only that but I wrote MHS, including its assembler and disassembler, and have spent a lot of time looking at code from other games and applications including my own.

The compiler is absolutely 100% free to change the orders of reads and writes as long as the final result is the same, and I can promise you this happens on every single compiler I have encountered, for every platform.  In fact Nintendo 3DS’s compiler seems to go out of its way to do this, I assume for cache coherency.

This is the relationship between Microsoft®’s extensions and volatile.  It guarantees that in the compiled binary the code will be executed in the order specified by your code.

It does absolutely nothing to prevent the above situation between MOV, CMP, and JMP.  It adds no fences at all.  If you do not believe me, make your own program and check its disassembly.

Get back to us when you have done that.

L. Spiro

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

### #36Hodgman  Moderators   -  Reputation: 31818

Like
0Likes
Like

Posted 03 November 2013 - 07:35 PM

I thought that UpdateGames() was inside if(LockState==UNLOCKED) -- this would mean that the main loop would stop updating the game if the network thread paused while the state was anything except UNLOCKED.

But I misread, so RetrievePendingPackets() and UpdateGames() with both continually run, regardless of the activity of the other thread. Freezing one thread won't impede the other.

So, yeah, as long as you say that the data being communicated is "a list of packets", then the communication between these two threads is lock-free.

If you wanted to be ridiculously picky, you could argue that the threads are actually communicating "individual packets", in which case there is locking, as you might have packets buffered up in InPackets/OutPackets that are stuck there due to the LockState variable being held by a thread that's gone to sleep (e.g. if the main thread is pre-empted right before LockState=UNLOCK_REQUESTED, then even though some data has been "shared" at this point, the network thread can't read it until the game thread wakes up  That's something for academia to argue about though

### #37King Mir  Members   -  Reputation: 2050

Like
-2Likes
Like

Posted 03 November 2013 - 07:58 PM

@L. Spiro
That applies to many architectures, but it doesn't apply to x86. In x86 any MOV from memory by itself has acquire semantics. Any MOV into memory has release semantics. You don't need a fence. You only need a fence to implement sequential consistency, usually placed after each store, but with two threads, that's not needed, because the difference between acquire-release semantics and sequential consistency is only apparent with more threads.

So you're correct from the standpoint of writing portable C++, but on x86 the code will work as is. Because of the above, x86 will allow code that is technically undefined C++ to work as one would expect. (Let me reiterate that: the code as written is undefined behavior)

EDIT:Strike-through due to point made by Matias Goldberg bellow.

Edited by King Mir, 04 November 2013 - 12:17 PM.

### #38Hodgman  Moderators   -  Reputation: 31818

Like
1Likes
Like

Posted 03 November 2013 - 08:01 PM

The compiler is absolutely 100% free to change the orders of reads and writes as long as the final result is the same, and I can promise you this happens on every single compiler I have encountered, for every platform.

This is the relationship between Microsoft®’s extensions and volatile.  It guarantees that in the compiled binary the code will be executed in the order specified by your code.  [<-- and that's enough]

It does absolutely nothing to prevent the above situation between MOV, CMP, and JMP.  It adds no fences at all.  If you do not believe me, make your own program and check its disassembly.

x86 CPU's don't reorder memory ops though, and they don't have fence instructions.

... Actually "Loads may be reordered with older stores to different locations" and there are rarely used LFENCE/SFENCE/MFENCE instructions ...[/edit]

Any instruction with the lock prefix acts like a fence, but that prefix is only required when you need to perform an atomic operation on memory that could potentially be modified by two threads simultaneously.

Slicer's code works on mutual-exclusion. Assuming that there's no reordering, then there's no variables that are going to be written to by two threads at once -- every variable always has only 1 writer. So, it will work on x86 with just compile-time fences.

It won't be portable to other CPU's unless you add fences, so he really should be using a std::atomic still

Yeah, if his code was more complex and the state variable could have two potential writers, he would need to use CAS instead of an if and assignment.

The standard C/C++ volatile is not a proper compile-time fence, it only acts as a fence with regards to other volatiles -- reads/writes to volatiles aren't reordered past each other, but reads/writes to non-volatiles can be reordered past volatile reads/writes. The MSVC extension goes further than this turns turns volatile reads/writes into full compile-time fences.

A standard C++ compiler would be able to remove the =1 and =2 lines from this code, but MSVC keeps them, because there's a volatile write between them all.

	volatile int g_state_v = 0;
extern int g_state;
...
g_state = 1;
g_state_v++;
g_state = 2;
g_state_v++;
g_state = 3;
printf( "%d %d", g_state, g_state_v );

Edited by Hodgman, 04 November 2013 - 02:51 AM.

### #39Matias Goldberg  Crossbones+   -  Reputation: 3697

Like
3Likes
Like

Posted 03 November 2013 - 11:49 PM

@L. Spiro
That applies to many architectures, but it doesn't apply to x86. In x86 any MOV from memory by itself has acquire semantics. Any MOV into memory has release semantics. You don't need a fence. You only need a fence to implement sequential consistency, usually placed after each store, but with two threads, that's not needed, because the difference between acquire-release semantics and sequential consistency is only apparent with more threads.

So you're correct from the standpoint of writing portable C++, but on x86 the code will work as is. Because of the above, x86 will allow code that is technically undefined C++ to work as one would expect. (Let me reiterate that: the code as written is undefined behavior)

That's an overly simplification.
Section 8.2.2 from the Intel Intel 64 and IA-32 Architectures Software Developer’s Manual is very clear that:
"Reads may be reordered with older writes to different locations but not with older writes to the same location."

Here's even an example of how such reordering can affect the logic of an application even when just running 2 threads. Whether this issue affects the OP's code, can only be said by inspecting the assembly.

### #40King Mir  Members   -  Reputation: 2050

Like
2Likes
Like

Posted 04 November 2013 - 12:14 PM

@L. Spiro
That applies to many architectures, but it doesn't apply to x86. In x86 any MOV from memory by itself has acquire semantics. Any MOV into memory has release semantics. You don't need a fence. You only need a fence to implement sequential consistency, usually placed after each store, but with two threads, that's not needed, because the difference between acquire-release semantics and sequential consistency is only apparent with more threads.

So you're correct from the standpoint of writing portable C++, but on x86 the code will work as is. Because of the above, x86 will allow code that is technically undefined C++ to work as one would expect. (Let me reiterate that: the code as written is undefined behavior)

That's an overly simplification.
Section 8.2.2 from the Intel Intel 64 and IA-32 Architectures Software Developer’s Manual is very clear that:
"Reads may be reordered with older writes to different locations but not with older writes to the same location."
Here's even an example of how such reordering can affect the logic of an application even when just running 2 threads. Whether this issue affects the OP's code, can only be said by inspecting the assembly.

No, the code you linked to can be analyzed in a way that doesn't require looking at the disassebly to spot the problem, and so can the OP. In the op it's trivial to see that each access to the lock is an acquire-read on one variable followed by some critical section code, followed by a release-write on the same variable. That pattern is tried and true.

zeroWants = true; //release on zeroWants
victim = 0; //release on victim
while (oneWants && victim == 0) // aquire on  oneWants and victim
continue;
// critical code

It's a release followed by acquire, which is nothing. The acquire by itself would be usable to create a lock, but clearly here this lock does not work if you remove the first two lines.

But conceded, that does show that acquire-release semantics can differ from sequential consistency even with two threads. Thank you for the example.

Edited by King Mir, 04 November 2013 - 12:14 PM.

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

PARTNERS