Hey
After a while I need some input again for my game engine. I want to implement a job system using task parallelism and work stealing to spread work arbitrarily across all system cores. My system uses a lock-free queue to store tasks per worker (a worker may or may not be pinned to a specific CPU core). When a task is pushed to the system, it is added to the calling thread's queue, and the job system also tries to wake up the owner or an additional thread if possible. The queue's owner dequeues a task and executes it if the dequeue succeeds; otherwise the worker picks another worker at random and tries to get work from its queue. If all of the above fails, the worker goes to sleep, so the number of active workers follows the engine's current workload.
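To make the loop above concrete, here is a rough sketch of a single worker iteration. All names (`TaskQueue`, `Worker`, `workerStep`, `nextRandom`, `incrementCounter`) are made up for illustration, and the `std::deque` is only a stand-in for my lock-free queue, so this version is not thread-safe. It only shows the order of the steps: own queue first, then one random victim, then sleep.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

typedef void (*TaskFn)(void*);

struct Task {
    TaskFn fn;
    void*  arg;
};

// Placeholder for the lock-free queue: NOT thread-safe, it only models
// the enqueue / try-dequeue interface.
struct TaskQueue {
    std::deque<Task> items;

    void enqueue(const Task& t) { items.push_back(t); }

    // Returns false when the queue is empty.
    bool tryDequeue(Task& out) {
        if (items.empty()) return false;
        out = items.front();
        items.pop_front();
        return true;
    }
};

struct Worker {
    TaskQueue queue;
    unsigned  rngState; // per-worker state for victim selection
    Worker() : rngState(1u) {}
};

// Small linear congruential generator so each worker can pick victims
// without touching shared state.
inline unsigned nextRandom(unsigned& state) {
    state = state * 1664525u + 1013904223u;
    return state >> 16;
}

enum StepResult { RanOwnTask, RanStolenTask, ShouldSleep };

// One iteration of the worker loop.
StepResult workerStep(std::vector<Worker>& workers, std::size_t self) {
    assert(self < workers.size());
    Task task;

    // 1. Try the worker's own queue first.
    if (workers[self].queue.tryDequeue(task)) {
        task.fn(task.arg);
        return RanOwnTask;
    }

    // 2. Otherwise pick one random victim (never ourselves) and try its queue.
    if (workers.size() > 1) {
        std::size_t victim =
            nextRandom(workers[self].rngState) % (workers.size() - 1);
        if (victim >= self) ++victim; // skip our own index
        if (workers[victim].queue.tryDequeue(task)) {
            task.fn(task.arg);
            return RanStolenTask;
        }
    }

    // 3. Nothing found: the caller should put this worker to sleep.
    return ShouldSleep;
}

// Example task: increments the int that arg points to.
inline void incrementCounter(void* arg) { ++*static_cast<int*>(arg); }
```

In this sketch stealing is nothing more than calling `tryDequeue` on someone else's queue, which is exactly the part I'm unsure about.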
I now have an atomic queue that works well so far, but I want to add some kind of job stealing to it, and I'm not quite sure how to access the queue from another thread. Using dequeue works as expected: when I inspect my 12 workers (I have a hexa-core CPU), at least one has the test task as an element of its queue, while N of the others may already have had it, so this seems OK to me.
My general question is how job stealing is implemented on the queue side: should the queue have a separate function that lets another thread steal a task, or is a plain dequeue good enough? Or is the real work done on the managing side that handles how workers share tasks with each other? And of course, what is the state of the art?
The background is that I saw some implementations where the queue was more of a stack for the owning thread, so push and pop were handled on the owner's side and dequeue on the thieves' side. I saw a few problems with that design: a continuous task like the messaging system would always be on top of the list and always run on the same worker, and with several continuous tasks the whole job system could get stuck, with every worker running a continuous task and other tasks never getting to run even though they could be stolen. So I decided to go for a traditional queue, which guarantees that every task, continuous or not, gets its runtime.
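Here is a small single-threaded model of the problem I mean. It is not a lock-free deque, just a `std::deque` with made-up names (`Job`, `OwnerPolicy`, `runOwnerOnly`, `stealOldest`, `makeExampleJobs`), and it assumes a continuous task simply re-submits itself after every run. It compares an owner that always takes the newest entry (stack-like) with one that takes the oldest (plain queue).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Job {
    int  id;
    bool continuous; // assumption: re-submits itself after every run
};

enum OwnerPolicy { OwnerTakesNewest, OwnerTakesOldest };

// Runs up to `steps` jobs on one worker with no thieves around and
// returns the ids in execution order.
std::vector<int> runOwnerOnly(std::deque<Job> jobs, OwnerPolicy policy,
                              std::size_t steps) {
    std::vector<int> order;
    for (std::size_t i = 0; i < steps && !jobs.empty(); ++i) {
        Job job;
        if (policy == OwnerTakesNewest) {
            job = jobs.back();   // stack-like: owner pops what it pushed last
            jobs.pop_back();
        } else {
            job = jobs.front();  // queue-like: owner takes the oldest entry
            jobs.pop_front();
        }
        order.push_back(job.id);
        if (job.continuous) jobs.push_back(job); // re-submit at the push end
    }
    return order;
}

// In the stack-like layout a thief takes from the opposite end: the oldest entry.
bool stealOldest(std::deque<Job>& jobs, Job& out) {
    if (jobs.empty()) return false;
    out = jobs.front();
    jobs.pop_front();
    return true;
}

// Job 1 is a one-shot task pushed first, job 2 is a continuous task pushed after it.
std::deque<Job> makeExampleJobs() {
    std::deque<Job> jobs;
    Job oneShot = { 1, false };
    Job continuous = { 2, true };
    jobs.push_back(oneShot);
    jobs.push_back(continuous);
    return jobs;
}
```

With `OwnerTakesNewest` the owner only ever runs job 2, and job 1 runs only if some thief gets to call `stealOldest`. If every other worker is busy with its own continuous task, nobody does. With `OwnerTakesOldest` job 1 runs first and job 2 keeps cycling behind it.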
And yes, I'm not using std::atomic or anything else from the C++11 toolbox, but rather a platform wrapper around the Interlocked/__sync_XX functions, so please don't suggest any C++11-related gimmicks here.
Thanks in advance