
17 replies to this topic

### #1 Shaarigan  Members

Posted 20 March 2017 - 10:48 AM

Hey

So now I have the atomic queue, which works well for now, but I want to add some kind of job stealing to it, and I'm not quite sure how the queue should be accessed from another thread. Using dequeue works as expected; inspecting my 12 workers (I have a hexa-core CPU), at least one has the test task as an element of its queue, though N of them may have had it already, so this seems OK to me.

My general question is how job stealing is implemented on the queue side: should the queue expose another function that lets a foreign thread steal a task, or is a plain dequeue enough? Or does the real work happen on the managing side, which handles how the workers share with each other and, of course, the shared state?

The background is that I saw an implementation where the queue was more like a stack for the owning thread, so push and pop were handled on the owner's side and dequeue on the thieves' side. I saw a few problems in that implementation: continuous tasks like the messaging system would always sit on top of the list and always run on a certain worker, and with enough other continuous tasks the whole job system could get stuck, with a continuous task running on every worker and other tasks never getting a chance to run even though they could be stolen. So I decided to go for a traditional queue that would guarantee that each task, continuous or not, gets its runtime.

And yes, I'm not using std::atomic or anything else from the C++11 toolbox, but rather a platform wrapper around the Interlocked*/__sync_* functions, so please don't suggest any C++11-specific gimmicks here.

Edited by Shaarigan, 21 March 2017 - 08:35 AM.

### #2 samoth  Members

Posted 20 March 2017 - 11:35 AM

Ugh. Did someone say "Don't write your own lockfree stuff"? You know, it often doesn't appear to be helpful when people toss out statements like that, but... seriously. Don't write your own lockfree stuff, unless you know what you're doing.

A few points to think about:

1. What happens if the queue is full? This is something that can very conceivably happen with a bounded queue such as yours. You're not checking for that condition, but even if you wanted to do it, how would an unconditional exchange_add even allow for it? It will have incremented the index whether you are happy about it or not!
2. What is the rationale behind compare-exchanging the read and write pointers (which is a super-expensive operation)?
3. How do you know that if(bottom - top > 0) is correct or meaningful at all? Two presumably atomic, but distinct operations, one for reading the write pointer, and one for reading the read pointer are not atomic. Which means that whatever values you have, in a multithreaded environment with threads adjusting one or the other, they are almost guaranteed to be wrong at times.
4. Memory ordering, happens-before guarantees, anyone?

### #3 Hodgman  Moderators

Posted 20 March 2017 - 05:23 PM


How do you ensure thread-safety here? You increment the write cursor and then write the data to that location afterwards... But as soon as the write cursor has been incremented, other threads will assume that it's safe to read this item, potentially before it has been written.

uint32 top = __compare_exchange(&read_ptr, 0, 0), bottom = __compare_exchange(&write_ptr, 0, 0);

What's the logic of this step?
bottom = write_ptr; if(write_ptr == 0) write_ptr = 0;

I'm not quite sure how to access the queue from another thread

There are many kinds of "lock-free queues". You need an SPMC or MPMC variant if you want every thread to be able to safely pull work out of any queue.
I would not recommend writing one, as it's really easy to get these kinds of low-level thread-safe containers slightly wrong, in a way where they work fine for six months before experiencing a race condition.
However, I don't follow my own advice, and mine is here if you want to inspect my logic: fifo_mpmc.h fifo_mpmc.hpp. It's probably sub-optimal, but should hopefully be safe, as it's shipped on a few game DVDs.
Searching for "multi-producer multi-consumer lock-free queue" will bring up some other implementations.

One simple way to use a job system is to have a single, central, shared MPMC queue for all jobs, which all threads push to and all threads pop from. I would use that until you notice that thread contention is an issue.
Another way is to use an SPMC queue per thread; every thread pushes to its own queue and typically pops from its own queue, but can also pop from any other queue if its own is empty.
Or use an MPMC queue per thread, and in addition to the above, threads can push jobs into other threads' queues too, if required.

### #4 Shaarigan  Members

Posted 21 March 2017 - 01:19 AM

Ugh. Did someone say "Don't write your own lockfree stuff"?

Sorry, but in every post I read here, sooner or later someone says "why not use C++11's this/that/whatever instead of making your own", so ignore it.

As written in the opening post, I did some research and read some blog posts, for example https://blog.molecular-matters.com/2015/08/24/job-system-2-0-lock-free-work-stealing-part-1-basics/ where the author implemented a combined LIFO/FIFO structure which, as I wrote above, may conflict with continuous tasks like the message system. I liked the idea of pinning one thread to each CPU core (if the OS actually does so, but that's not the point of this topic) but thought that a dequeue operation should happen instead of a pop.

Edited by Shaarigan, 21 March 2017 - 01:22 AM.

### #5 richardurich  Members

Posted 21 March 2017 - 02:46 AM

The steal would do the same as dequeue, just from the other side (if you dequeue from the top, you steal from the bottom). And it would require that two tasks be present to permit one to be stolen, unless you want to solve a three-way race in a lockless system (I'll explain below).

I am all for reinventing the wheel, so understand that I strongly disagree with people saying you shouldn't try to implement a lockless queue. It's good experience, much like rebuilding a motor as a mechanic is entirely unnecessary but really good experience to have. But I wonder if you have actually thought this through. It feels like you're trying to use a CPU scheduler concept in your design of a task scheduler, and I doubt this specific approach works how you want. If you cannot guarantee that the time from the queue size check to the adjustment of the queue pointer is less than the time the thread you're stealing from takes to obtain and complete a task, this is not a safe implementation. And you cannot guarantee that on a system where you don't control the CPU scheduler. The OS can bench your stealing thread for as long as it wants, right in the middle of that work.

If this approach did work, you would also have to ensure things work properly if you try to steal at the same time the dispatcher dispatches to the same thread. (It is easily solved, but a solution must be chosen) That didn't seem to be mentioned in the article you linked to (I only skimmed it, so apologies if it was addressed), and you might never even notice the need exists during your testing, because you probably wouldn't run a bunch of other programs in the background while running a game that used your game engine. But users will! (most programmers massively undertest their multithreading code for race conditions, and often don't even know how to test to encourage race conditions get encountered more often)

If I am correct that you hadn't considered this, and don't have a solution for it, my personal recommendation is that you should focus on gaining more experience with traditional multithreading first. Try using locks, but minimizing how many you need, and how long you hold them. Learn to use signals to supplement locks, and massively improve what is possible. Learn how to test intentionally faulty code to make it more likely to hit the race condition. Basically, learn the ins and outs of any topic before you try a revolutionary idea in that topic area. And if you get really comfortable with locks, you will always be highly sought after by pretty much any company you would even consider for employment. Especially if you start talking about how to design tests for locks. But don't give up pushing the boundaries, and reinventing wheels. Just make sure you understand how a particular wheel works before you try to reinvent it.

### #6 Shaarigan  Members

Posted 21 March 2017 - 04:52 AM

Just something I found that's worth mentioning: https://www.gamedev.net/topic/580310-c-lockless-queue-for-a-threadpool/

### #7 samoth  Members

Posted 21 March 2017 - 07:06 AM


Sorry but in each post I read here there is sooner or later...

Yes, and I dislike doing it myself. Yet, there is a good reason for that.

Did you read the "couple of points to think about" part in my post?

did some research

Fair enough, but you have several obvious and very serious errors in the few lines of code you posted. I'm not talking about not-so-obvious ones, just the ones that are immediately visible.

You wrote that your code "works fine", but that is yet another serious mistake. Lockfree code must be either provably correct, or it must at least be validated very, very carefully with a specialized tool such as Relacy. It is a most devastating mistake to run some tests on your development machine and, if it doesn't crash, step forward and say "yay, it works". You know, truth is, it does not work. But worse, maybe it does work sometimes, so you do not even realize that it's broken, nor can you foresee when it will fail. Absence of failure is not proof of correctness.

If nothing else, consider my above point with the two atomic reads (which are really compare-exchanges, unnecessarily). Your code translates to:

1. do atomic operation 1
2. do atomic operation 2
3. assume(atomic operation 1 + atomic operation 2 == consistent state)

This cannot work. Never, not ever. Except... by pure chance (because there happens to be no concurrent modification). It is formally wrong.

Two distinct atomic operations are individually atomic, of course, but they are not atomic in their entirety. It is perfectly possible (and "possible" means it not only could happen in principle, but it will eventually happen, given enough time) that right after you read the first value, another thread modifies the second value before you get a chance to read it. Or something different: it might update both values to a different, consistent state, but you have already read half of it, so your state is inconsistent anyway. Whatever it is, it can, and eventually will, fail, and when it happens you will not know what went wrong.

(To avoid this problem, do a single atomic fetch with each index occupying only half of the atomically-fetched word, then sort them out by shifting and masking. This is correct, i.e. it actually works.)

Really, do not, do not write your own lockfree stuff unless you are 100% confident that you understand exactly every single detail and every implication. Even then, even if you are 100% certain that you know your way around, you may still do it wrong (Herb Sutter did, after all).

Edited by samoth, 21 March 2017 - 07:08 AM.

### #8 Shaarigan  Members

Posted 21 March 2017 - 07:46 AM

To avoid this problem, do a single atomic fetch with each index occupying only half of the atomically-fetched word, then sort them out by shifting and masking

This is an interesting and useful approach that I may test later.

EDIT: It seems I need some coffee or holidays.

Edited by Shaarigan, 21 March 2017 - 07:46 AM.

### #9 richardurich  Members

Posted 21 March 2017 - 08:58 AM


The algorithm Shaarigan is discussing does not assume atomic operations 1 and 2 were done in a consistent state, at least not in the article linked. (I have no idea if Shaarigan's code does, as the scheduler side was not included.) This algorithm is not some amazing new idea, but a rather common lockless solution. (The stealing part is less common; I honestly haven't seen it before and seriously doubt its usefulness in any realistic situation, but I suppose it might work.)

On these two atomic reads. Let's assume the queue length is wrong as a result of an extra element being added in between the two reads. Who cares? So you think the queue is shorter than it is. Either you bail thinking it's empty, or you take an element from the side you own and the changes to the other side have zero impact. This is the whole flipping reason we use this approach in lockless designs!

And it's perfectly safe because you're the only one who can modify your side. The scheduler is allowed to read it to determine whether adding another element is allowed (which is why the queue can't overflow). And it doesn't matter if the scheduler gets stale data, because the modification only goes in one direction, so the scheduler can never think it's safe to add an element when it isn't; it can only think it's not safe when it actually is, and we don't care if that happens.

I am far more sceptical of the "stealing" modification to a well-known, common lockless approach, but Shaarigan seems to recognize something fishy is going on with that stealing concept too, and questions are being asked. That is a good thing! If anyone proposed how to code the steal, perhaps Shaarigan would have said "hey, that's not safe! Here's why..." although it is impossible to provide a safe implementation without the scheduler side of the code.

Is Shaarigan in over their head? Probably. Most people are when asking questions on a public forum. But challenging ourselves is how we learn. I hope we all push ourselves beyond our comfort zones. While I have pretty good experience with locks and lockless alternatives, I have asked questions relating to Unity and C# in the past six months that most developers would find proved I had no idea what I was doing. It's because they were right. I did have no idea what I was doing.

### #10 samoth  Members

Posted 21 March 2017 - 10:53 AM

Who cares? So you think the queue is shorter than it is. Either you bail thinking it's empty, or you take an element from the side you own and the changes to the other side have zero impact. This is the whole flipping reason we use this approach in lockless designs!

Maybe, or maybe not. Thinking that the queue is empty when it is not is sub-optimal, but it is no disaster. Now what about thinking that there is still room for pushing one more element when there isn't? Mind you, this is a bounded queue, so you'll be overwriting the oldest element. Sure... the queue filling up ideally shouldn't happen, but it can happen, and it is something an implementation must be able to deal with in a robust manner.

And it's perfectly safe because you're the only one who can modify your side.

Yes, maybe. Or maybe it's perfectly safe unless there is exactly one element in the queue (suddenly "your" side is "theirs" as well, both indices point at the same object) and at that precise moment a worker which is not the main thread (or two of them) has nothing to do. Surely it's perfectly safe to take that object from my side, because nobody else can take it. Surely it's no problem that the two hungry workers neither agree with each other nor with me whether there's a job left in the queue.

Thing is, any kind of lockfree queue, but work stealing as described on the Molecular Musings blog in particular, is highly complex (that's an understatement!), and any such thing as "correct" or "perfectly safe" shouldn't ever be said lightly (we know way too few details, having seen only a small fragment of code). Either way, what I'm trying to say is that the advantage of such a complex system may very well be debated, and the practical gains may quite possibly be marginal, if existent. The risk of getting a total fuckup with unexplainable (and unreproducible!) failures three weeks after shipping is, however, very real.

I am not doubting that Stefan Reinalter got his implementation to work in ME (his engine has been shown to work, so apparently the job system somehow works, too!), but I'm ready to doubt that the gains warrant the complexity. Possibly (I have no data to back this up, but still, possibly) there are not even any gains compared to a not-stealing lockfree solution, or even a simple queue with a fast userspace mutex. Those don't perform as abysmally as urban myth tells us, you know. They're actually coming quite close while being totally trivial to do.

There are a lot of assumptions that may or may not be true, such as, for example, that using one queue gives a lot of congestion (if that is really a problem, then your jobs are too trivial), or that padding a 20-byte job object to fill an entire cache line is an ingenious idea (it may not be: the assumption of false sharing is wrong if the data is read-only or write-once, and burning three times as much L1 as necessary may very well put you at a disadvantage). To most of these assumptions, there is no clear-cut "right" or "wrong"; it depends a lot on what you're doing with it, too.

There is the assumption that it's an ingenious idea to only post jobs to one's own queue and steal from another if no jobs are on yours. The article indeed says verbatim that the producer only ever posts to its own queue. This means, by consequence, that there are never any normal dequeues from workers. Anything a worker does must always happen by stealing (because nothing is ever pushed onto any worker's queue, except if it pushes a task itself, but then it could just as well process the dependent job right away and avoid the overhead!).

With tongue-in-cheek, you could say that this design has the "best" of two worlds: Congestion at the single queue that actually contains jobs, and overhead for looking at the worker's own queue plus some other workers' queues first. You know, everybody pulling jobs from a single queue was what we wanted to avoid, to start with...

Edited by samoth, 21 March 2017 - 10:57 AM.

### #11 richardurich  Members

Posted 21 March 2017 - 05:18 PM

Who cares? So you think the queue is shorter than it is. Either you bail thinking it's empty, or you take an element from the side you own and the changes to the other side have zero impact. This is the whole flipping reason we use this approach in lockless designs!

Maybe, or maybe not. Thinking that the queue is empty when it is not is sub-optimal, but it is no disaster. Now what about thinking that there is still room for pushing one more element when there isn't? Mind you, this is a bounded queue, so you'll be overwriting the oldest element. Sure... the queue filling up ideally shouldn't happen, but it can happen, and it is something an implementation must be able to deal with in a robust manner.

The scheduler is responsible for managing resources; that's kind of its job. If the scheduler is overflowing the queue and you don't want that to happen, that's your fault for implementing it that way. Although it is also perfectly valid to accept undetermined results when that happens (I worked on software that accepted that result, because it forced a reboot, and there was already a memory overflow to get there). The OP didn't give the scheduler code, so why are you just assuming it is wrong and causing an error? And if code were given, and it were wrong, why not just help?

Yes, maybe. Or maybe it's perfectly safe unless there is exactly one element in the queue (suddenly "your" side is "theirs" as well, both indices point at the same object) and at that precise moment a worker which is not the main thread (or two of them) has nothing to do.

What are you talking about? The two sides aren't pointing at the same element if the size is one. If the scheduler is adding a second entry, it's not doing it in the same exact position it put the first element into.

Honestly, this is a legit lockless queue for single producer, single consumer. Google it. Most results are going to look nearly identical. I personally participated on a team that put this implementation into millions of homes. You're trying to find errors that just aren't there unless someone implemented it wrong.

Thing is, any kind of lockfree queue, but work stealing as described on the Molecular Musings blog in particular is highly complex (that's an understatement!), and any such thing as "correct" or "perfectly safe" shouldn't ever be said lightly (we know way too few details, having seen only a small fragment of code).

I was pretty explicit about not talking about the work stealing. I was pretty explicit that I doubt it is even useful. (It also seems worth noting that the linked article said "The implementation of the work-stealing queue uses locks.") OP hadn't implemented work stealing yet, just the normal lockless queue. And you asserted the existing code was making assumptions it was not making.

I want people to understand how these things work, so I felt it necessary to correct the misperception that a lockless queue doing separate reads of top and bottom is somehow not allowed. It is absolutely allowed, if done properly. There was nothing in the code provided to suggest it was not done properly, although we did not get all code.

Either way, what I'm trying to say is that the advantage of such a complex system may very well be debated, and the practical gains may quite possibly be marginal, if existent. The risk of getting a total fuckup with unexplainable (and unreproducable!) failures three weeks after shipping is, however, very real.

Then say that. Say what you mean. There is risk in every piece of code, and developers must choose what risks are worth taking. And it is completely okay if your project does not feel a specific risk is worth taking, just as it is completely okay if a different project decides the same risk is worth taking for their goals.

### #12 Ryan_001  Prime Members

Posted 21 March 2017 - 07:24 PM

Someone explain to me what the point is with 'work stealing'? Doesn't each thread in a threadpool just pick the next task/job in order from a shared queue? Each task/job is not parallelizable, so why move it from one thread to another?

### #13 richardurich  Members

Posted 21 March 2017 - 08:18 PM

Someone explain to me what the point is with 'work stealing'? Doesn't each thread in a threadpool just pick the next task/job in order from a shared queue? Each task/job is not parallelizable, so why move it from one thread to another?

In this case, each thread has its own job queue. Thread stealing seems to be a proposal to gain some of the benefits of a shared queue. I have no idea what proposed scenarios benefit from this approach over more traditional ones.

### #14 Ryan_001  Prime Members

Posted 21 March 2017 - 09:09 PM

Someone explain to me what the point is with 'work stealing'? Doesn't each thread in a threadpool just pick the next task/job in order from a shared queue? Each task/job is not parallelizable, so why move it from one thread to another?

In this case, each thread has its own job queue. Thread stealing seems to be a proposal to gain some of the benefits of a shared queue. I have no idea what proposed scenarios benefit from this approach over more traditional ones.

It seems you'd have to have a very intimate knowledge of the OS scheduler to gain any performance.  I'm all for esoteric solutions, and trying things simply for the sake of learning, but this seems a little extreme even for my tastes.

### #15 Hodgman  Moderators

Posted 21 March 2017 - 10:26 PM

Someone explain to me what the point is with 'work stealing'? Doesn't each thread in a threadpool just pick the next task/job in order from a shared queue? Each task/job is not parallelizable, so why move it from one thread to another?

In this case, each thread has its own job queue. Thread stealing seems to be a proposal to gain some of the benefits of a shared queue. I have no idea what proposed scenarios benefit from this approach over more traditional ones.
It seems you'd have to have a very intimate knowledge of the OS scheduler to gain any performance.  I'm all for esoteric solutions, and trying things simply for the sake of learning, but this seems a little extreme even for my tastes.

Lock-free algorithms aren't necessarily fast; they're quite often slower than lock-based algorithms. Going lock-free doesn't magically solve your data sharing performance issues.

A single, shared queue containing all jobs will be a highly contended resource, as all threads constantly push and pop work through it. The real performance cost in any multi-threaded code is contention. Two threads that both try to use the queue at close intervals in time will cause all sorts of havoc at the CPU cache level...

Having one queue per thread is a very simple method of reducing the amount of resource contention. Check your own queue first, and if it's empty, then check someone else's queue. Assuming that every thread is generating a lot of jobs, this means that most of the time, each thread will only work with its own queue, and there will be very little contention occurring.

Honestly, this is a legit lockless queue for single producer, single consumer. Google it.

He's hidden the code now, but push was not safe (it published work to the consumer before it was complete), and pop was very suspicious. We called it out with good reason.

Edited by Hodgman, 21 March 2017 - 10:31 PM.

### #16 samoth  Members

Posted 22 March 2017 - 05:30 AM

The thing is, and I apologize for repeating myself over and over again, the problem that work stealing is trying to solve is non-existent. Yes, it may be a very real problem in other parallel computing applications where tasks go over several physical CPUs on different physical machines (maybe with network in between), but almost certainly not here, in this case.

I mean, if this is for academic interest, fair enough. But if it is for something practical...

Also, the mentioned approach of work-stealing while only ever posting to one's own queue is somewhat nonsensical because it has the exact problem (non-existing problem) that work-stealing tries to solve. There will be (non-existing) contention on the "steal end" of the single queue that receives all the tasks. Because, you know, nothing is ever posted to the per-thread queues. Makes you wonder why they're there in the first place.

Compare that to other approaches like MoodyCamel's per-worker queue-of-queues, which indeed avoids contention, for what it's worth. But this requires a considerably more evolved scheduler, and although it's an awesome piece of code, I've measured his approach against other approaches (including a simple mutex-queue) with exactly zero difference in a real application (it sure beats the devil out of other implementations in a synthetic benchmark, but we're not programming synthetic benchmarks, are we).

The reason for doing parallel computations is to get more bang for the buck, insofar as less wall time passes until the computation is finished (in CPU time, actually more time passes).

This works only if (at least) the following assumptions are valid:

1. Fork-join plus parallel execution takes less wall time than executing everything sequentially (I'll assume that "tasks can execute in parallel at all" is too obvious a precondition to be worth mentioning).
2. Tasks do not, on the average, appear faster than the pool of workers can process them. It may occasionally happen (that's why the queue has a size greater than 1), but not on the average, or persistently. If tasks persistently appeared faster than they can be processed, every task queue of any possible size would inevitably overflow, with nothing you can do to prevent that from happening.
3. Yet, tasks usually take longer (much longer) to process than to be pushed to and pulled from the queue. If this were not the case, why would you go through the whole ordeal in the first place? (Unless you have an infinite number of CPU cores and infinite memory bandwidth, this doesn't gain you anything.)

Combining the above kind of implies that the producer does not queue new tasks all the time without cease, but either sleeps/blocks in between at times, or does a different computation, or something. Blocking when the queue is full may be a valid option, or having a queue big enough for all tasks until the producer incidentally blocks on vertical sync anyway, or whatever.

It also implies that sampling randomly at any desired point in time, workers are much more likely to be found in "work" mode than in "pull" mode (because "work" takes so much longer than "pop"):

While doing a couple of atomic operations (or acquiring a lock) is not precisely a free operation, nor is pulling an invalid cacheline from RAM, it is still relatively fast, compared to executing a 5-6 digit amount of instructions in a task (things may be very different when e.g. making an IPC or network request, think e.g. ZMQ task ventilator where we talk about milliseconds, not dozens of nanoseconds).

That's why even locking is not such a big problem as people often expect it to be. Unless there is nothing to do, the workers spend >99.9% of their time processing tasks (well, of course! that is what you are interested in...), which in turn means that the likelihood for any thread to be trying to acquire the lock at any given time is very low. Which in turn implies "contention is not really an issue". Certainly, every now and then two threads will try to acquire the lock at the same time, or do a concurrent atomic operation. If that couldn't happen, then you wouldn't need to worry and could just scrap that whole thing. But once in a while, that's OK -- as long as it doesn't happen all the time, every time, you're good to go. No observable difference. If tasks are reasonably-sized, the odds are on your side.

If contention is a problem, you have, in my opinion, almost certainly done something wrong.

Edited by samoth, 22 March 2017 - 05:34 AM.

### #17 Zipster  Members

Posted 22 March 2017 - 02:15 PM

The thing is, and I apologize for repeating myself over and over again, the problem that work stealing is trying to solve is non-existent.

No need to apologize, I think you hit the nail on the head, and it deserves repeating.

If queue contention is an issue, then instead of trying to fix the contention directly, first try and fix what causes the contention. Give your workers beefier tasks, so they aren't spending a disproportionate amount of time getting work, when they should be spending that time actually working. Then you can afford the overhead of a simple locking queue, for the 0.1% of the time threads spend getting work, for the even lesser amount of time that they might spend actually contending with each other.

If for whatever reason (in this case, the sake of argument) you're stuck with lots of piddly tasks and can't do much about it, then I also agree that the approach as presented in the article makes little sense. Per-worker queues help with consumer contention for sure, but mitigating producer contention is the job of the task scheduler. The purpose of work stealing is then to deal with the sub-optimal load balancing that can result from the scheduler focusing on contention.

However, the article seems to miss the mark here. It ignores the fact that each worker can actually create contention on its own queue by pushing tasks that other threads constantly have to steal (and there is no mention of handling overflow in these cases). Even a naive round-robin strategy, where each worker queue gets every i-th task, could spread out that pressure some. It then focuses on work stealing as a solution to said contention (which it isn't). I have no doubt that the author's system works, because there isn't anything technically wrong with it. But it feels like a misunderstanding of the problem combined with a misapplication of solutions, leading to an approach that isn't much better (if at all) than the original it was trying to improve upon.

Edited by Zipster, 22 March 2017 - 02:16 PM.

### #18 Promit  Senior Moderators

Posted 22 March 2017 - 07:49 PM

I just want to highlight also that lock-free does not have anything to do with contention-free and indeed several lockfree algorithms will not perform well if heavy contention is a problem. What you want for that are wait-free algorithms/data structures, and those are generally only helpful if you have other work you can do before checking back to see if the contention has been resolved.
