Task Parallelism - GPU affinity


2 replies to this topic

#1 phr34k9   Members   -  Reputation: 152


Posted 27 January 2012 - 02:28 AM

For the past few days I've been attempting to (re-)implement a task-based scheduler, because the current one cannot give certain tasks affinity to a subset of processors, or rather threads (e.g. for dispatching GPU tasks). The previous scheduling algorithm supports LIFO task scheduling and implements the so-called task-stealing paradigm.

Most of the rendering is constrained to OpenGL, and this imposes a limit: GPU-centric tasks can only be dispatched to worker threads that have the context set. As far as I know, the same context cannot be active on different threads simultaneously, so all GPU-centric tasks have to be routed to the same thread (context sharing would allow resource allocation from multiple CPUs, but render state is still affected).

Task distribution with affinity certainly isn't impossible, but judging from a small-scale prototype (using lock-free algorithms where available) I'm not really satisfied with all the contention it raises. Right now I have the impression it is more of a 'trial-and-rejection' solution, i.e. whichever thread happens to steal the task, rather than a system that uses heuristics to maximize processor utilisation.

Are there any particular approaches/patterns that solve this problem and are somewhat popular?


#2 Antheus   Members   -  Reputation: 2397


Posted 27 January 2012 - 07:07 AM

OGL stuff needs to be done on the same thread, simple as that.

Are there any particular approaches/patterns that solve this problem and are somewhat popular?


If more than 1 CPU is available, assign one thread to do nothing but OGL, the rest may do anything.
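A rough sketch of how tasks could be routed under that split, assuming two queues: one that only the OGL thread drains and one shared by everyone else. `TaskRouter`, `Affinity` and the method names are made up for illustration, and a single mutex stands in for whatever lock-free structure the real scheduler uses.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical names throughout; not taken from any library.
enum class Affinity { Any, Gpu };

class TaskRouter {
public:
    using Task = std::function<void()>;

    // GPU tasks go to a queue that only the context-owning thread drains,
    // so other workers never pick one up and have to reject it.
    void submit(Affinity affinity, Task task) {
        assert(static_cast<bool>(task));  // empty tasks are a caller bug
        std::lock_guard<std::mutex> lock(mutex_);
        (affinity == Affinity::Gpu ? gpu_ : general_).push_back(std::move(task));
    }

    // Called only by the thread that has the OpenGL context current.
    bool take_gpu(Task& out) { return pop(gpu_, out); }

    // Called by every other worker; never hands out a GPU task.
    bool take_general(Task& out) { return pop(general_, out); }

private:
    // Pops the oldest task of one queue; returns false if it was empty.
    bool pop(std::deque<Task>& queue, Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue.empty()) return false;
        out = std::move(queue.front());
        queue.pop_front();
        return true;
    }

    std::mutex mutex_;
    std::deque<Task> gpu_;
    std::deque<Task> general_;
};
```

The OGL thread would loop on `take_gpu` and the remaining workers on `take_general`. Stealing between the general workers can stay as it is; the GPU queue is simply excluded from it.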

If trying to squeeze a bit more out of it, ignoring potential temporal aliasing issues, then make OGL thread work something like this:
while (thread_running) {
  if (has_stuff_to_render) {
    render();               // rendering always takes priority
  } else {
    process_single_task();  // otherwise help out with one general task
  }
}
The downside of this approach is that if the OGL thread dequeues a long task, it will impact rendering time. In some situations it may cause aliasing effects (render, render, task, render, render, task, render, ...), such as frames having a latency of 1, 2, 1, 2, 1, ... frames, causing visual stutter.
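To see the ordering this loop produces, here is a small single-threaded sketch that runs one loop iteration at a time and records what happened. `OglThread`, `step` and `trace` are hypothetical names, and the two deques stand in for whatever feeds `has_stuff_to_render` and `process_single_task` in a real engine.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-step version of the loop, so the order is easy to inspect.
struct OglThread {
    std::deque<std::function<void()>> frames;  // pending render work
    std::deque<std::function<void()>> tasks;   // general-purpose tasks
    std::string trace;                         // 'R' = rendered, 'T' = ran a task

    // One iteration of the loop body. Returns false if there was nothing to do.
    bool step() {
        if (!frames.empty()) {
            std::function<void()> render = std::move(frames.front());
            frames.pop_front();
            assert(static_cast<bool>(render));
            render();
            trace.push_back('R');
            return true;
        }
        if (!tasks.empty()) {
            std::function<void()> task = std::move(tasks.front());
            tasks.pop_front();
            assert(static_cast<bool>(task));
            task();  // a long task here delays the next frame
            trace.push_back('T');
            return true;
        }
        return false;
    }
};
```

If a frame becomes ready while `task()` is still running, it waits until the task returns. That wait is where the uneven frame latency comes from.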

If feeling bold, then it's possible to reorganize the above in such a way that any thread may become the OGL thread. That is, make each worker thread look something like this:
while (thread_running) {
  // the check and the claim must happen as one atomic step
  if (nobody_is_rendering && has_stuff_to_render) {
    become_OGL_thread();
    render();
  } else {
    process_task();
  }
}
This may cause certain complications if rendering requires some other API or OS functionality which may also be bound to a specific thread, usually the creator's. Hence most prefer to just use the main thread for that. This approach doesn't necessarily solve the aliasing issues either; that cannot be done without the ability to correctly interrupt a task that is being processed.
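A minimal sketch of the "any thread may become the OGL thread" idea, assuming the `nobody_is_rendering` check is implemented as an atomic flag. `RenderRole` and `worker_step` are hypothetical names, and the actual context hand-over (making the context current on the winning thread and releasing it afterwards) is only marked with a comment, since it is platform specific.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: at most one worker holds the rendering role at a time.
class RenderRole {
public:
    // Returns true if the calling thread won the role.
    bool try_acquire() {
        bool expected = false;
        return busy_.compare_exchange_strong(expected, true);
    }

    // Must only be called by the thread that currently holds the role.
    void release() {
        assert(busy_.load());
        busy_.store(false);
    }

private:
    std::atomic<bool> busy_{false};
};

// One iteration of a worker's loop. `render` and `process_task` are
// placeholders supplied by the caller. Returns true if this call rendered.
template <class Render, class ProcessTask>
bool worker_step(RenderRole& role, bool has_stuff_to_render,
                 Render render, ProcessTask process_task) {
    if (has_stuff_to_render && role.try_acquire()) {
        // Real code would make the GL context current on this thread here
        // and release it again before giving up the role.
        render();
        role.release();
        return true;
    }
    process_task();
    return false;
}
```

The compare-and-swap makes the check and the claim a single step, so two workers cannot both decide that nobody is rendering. A worker that loses the race just runs an ordinary task instead of waiting.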

affinity


Affinity doesn't really do much on a non-real-time OS. It merely adjusts some priorities, but doesn't solve the fairness problem or maximize utilization.

#3 medv4380   Members   -  Reputation: 98


Posted 27 January 2012 - 10:19 AM

A Phaser or Cyclic Barrier may help in what you are trying to do.

What I've done myself is create 1 Display thread for the OpenGL context and N-1 Logic threads. On my quad core that leaves me with 4 active threads running in parallel. The Logic threads add their tasks for the next display frame and then swap their queue with another queue at the end of the cycle/phase. Basically, the Display thread is always showing the frame the Logic threads previously worked on. However, to do it right you have to double up on any shared variables, since you don't want the Logic threads changing the previous frame's display variables.
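The queue swap could look roughly like this. It is only a sketch: `FrameQueues` and its methods are made-up names, display commands are plain strings to keep it short, and the barrier that decides when a phase ends is assumed to live outside the class.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical double-buffered command list.
class FrameQueues {
public:
    // Logic threads record work for the next display frame.
    void push(std::string command) {
        assert(!command.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        back_.push_back(std::move(command));
    }

    // Called once per cycle, after every logic thread has finished the phase
    // and while the display thread is not reading (e.g. behind a barrier).
    // The filled queue becomes the displayed one; the other is cleared.
    void swap_at_phase_end() {
        std::lock_guard<std::mutex> lock(mutex_);
        front_.swap(back_);
        back_.clear();
    }

    // Display thread only: the commands of the previously completed frame.
    const std::vector<std::string>& front() const { return front_; }

private:
    std::mutex mutex_;
    std::vector<std::string> back_;   // written by logic threads
    std::vector<std::string> front_;  // read by the display thread
};
```

Between two swaps the Display thread only ever reads `front_` and the Logic threads only ever write `back_`, which is the "double up" part: two copies of the data in exchange for not having the threads wait on each other mid-frame.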

You're going to come to a point where you'll have to decide if you want efficiency of processor use, or to waste memory to increase processor utilization.



