I personally use my "main thread" as a worker thread too. Whenever any of my threads has nothing to do (e.g. it has to wait for the results of another thread before it can continue), it makes itself useful by popping jobs from the job queue and doing some work. I basically have a "WaitFor..." busy loop that continually checks whether its exit condition has been met; if not, it tries to run a job, and after enough tries with no jobs in the queue it yields or sleeps.
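For anyone following along, a loop of that shape might look roughly like the sketch below. This is only an illustration, not the poster's actual code: `JobQueue`, `wait_for`, the spin thresholds, and the 50-microsecond sleep are all made-up names and values, and a real engine would probably use a lock-free queue rather than a mutex.

```cpp
#include <atomic>
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical minimal job queue guarded by a mutex (assumption: simplicity
// over throughput; swap in whatever queue the engine already has).
class JobQueue {
public:
    void push(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Returns a job if one is queued, otherwise an empty optional.
    std::optional<std::function<void()>> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return std::nullopt;
        auto job = std::move(jobs_.front());
        jobs_.pop_front();
        return job;
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> jobs_;
};

// Blocks the calling thread until done() returns true, running queued jobs
// in the meantime instead of idling.
template <typename Predicate>
void wait_for(JobQueue& queue, Predicate done) {
    constexpr int kSpinsBeforeYield = 64;    // assumed threshold
    constexpr int kSpinsBeforeSleep = 1024;  // assumed threshold
    int idle_spins = 0;
    while (!done()) {
        if (auto job = queue.try_pop()) {
            (*job)();        // found work: run it and reset the back-off
            idle_spins = 0;
        } else if (++idle_spins < kSpinsBeforeYield) {
            // Queue was empty: spin and re-check the condition right away.
        } else if (idle_spins < kSpinsBeforeSleep) {
            std::this_thread::yield();  // still nothing: give up the time slice
        } else {
            std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
    }
}
```

The main thread would call something like `wait_for(queue, [&] { return frame_jobs_done.load(); })`, where `frame_jobs_done` is a hypothetical atomic flag set by the last job of the batch it depends on.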
Ah, that's a good point. I could allow the I/O thread to do a work job as well if there are no I/O jobs pending.
Regarding embarrassingly parallel jobs -- in order to move these off to the GPU, you also need the consumers of those jobs to be ok with extremely long latencies. It's not possible to get short CPU->GPU->CPU latencies on PC without destroying overall performance.
There aren't going to be many (if any at all) circumstances where a CPU step depends on a single GPU step, no. I want to keep what I calculate on the GPU on the GPU, so to speak. The only instance I can think of is using the CPU to do narrowphase CCD after the GPU does broadphase pruning for collision detection (ideally, I'd do the narrowphase on the GPU as well, but I can't think of any good GPU-friendly CCD algorithm).
Also, don’t “spawn” threads, awaken them. They should already exist and just be idling in a waiting state, waiting for an event to set them in motion. And a “wait” state is not a “sleep” state.
Whoops, mixing up my terminology here. Yes, I plan to have these worker threads always running and waiting for jobs to be queued, not spawned with the jobs themselves.