BenS1

GPU multitasking

Is it possible to get a GPU to do multiple different things at once?

For example, I've written a terrain engine that works fine on the GPU, and I've also written a procedural landscape generator which can quickly generate terrain heightmaps on the GPU. However, these are currently two separate applications, and I'd like to merge them into one.

What I mean is, I want my game to render the terrain in view while, in the background, the procedural landscape generator generates new terrain segments as required when the player approaches new areas. The problem is that the landscape generation may take 1/10th of a second, which would obviously cause rendering to pause for 6 frames or so (at 60 fps), resulting in a very noticeable hesitation/flicker.

What I was wondering was whether there is a way to get the GPU to multitask/multithread, i.e. I could use, say, 80% of the GPU for normal game rendering and 20% for landscape generation in the background.

Thanks
Ben

Ok thanks both

It's a bit of a shame really, as there are quite a few things I could do on the GPU, such as some of the AI tasks. However, it'll be difficult to guarantee that these don't impact the actual rendering.

It would be nice if you could effectively run several different tasks on the GPU at the same time and assign a quota to each: say 80% to the normal rendering stuff, 10% to AI and 10% to terrain generation. An Nvidia GTX460, for example, has 336 "CUDA Cores", so you could assign roughly 268 of them to rendering and 34 each to AI and terrain generation.

Oh well, I can dream, but in short the answer is "no, they can't multitask concurrently today".

Thanks
Ben

If you use Compute Shaders, you can assign 80% of your GPU threads to one algorithm and the other 20% to another. Dispatch as many threads as you need in total, then have the shader derive a flattened thread ID from SV_DispatchThreadID and check whether FlattenedThreadID/TotalNumberofThreads is below, equal to, or above 0.8, and branch accordingly.
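As a rough illustration of that branch, here is a minimal sketch in plain C++ that loops over thread IDs on the CPU instead of running as a real compute shader. The names `Task`, `select_task` and `simulate_dispatch` are made up for this example, and the two task labels are only placeholders; in an actual shader the ID would come from SV_DispatchThreadID.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical labels for the two algorithms sharing one dispatch.
enum class Task { Rendering, Terrain };

// Mirrors the shader-side check: compare flattened_id / total_threads
// against the split. Integer math (id * 100 < total * percent) avoids
// floating-point rounding exactly at the boundary.
Task select_task(std::size_t flattened_id, std::size_t total_threads,
                 unsigned first_percent) {
    assert(total_threads > 0 && first_percent <= 100);
    return (flattened_id * 100 < total_threads * first_percent)
               ? Task::Rendering
               : Task::Terrain;
}

// CPU stand-in for a dispatch: every "thread" runs the same code and
// branches on its own ID.
std::vector<Task> simulate_dispatch(std::size_t total_threads,
                                    unsigned first_percent) {
    std::vector<Task> tasks(total_threads);
    for (std::size_t id = 0; id < total_threads; ++id) {
        tasks[id] = select_task(id, total_threads, first_percent);
    }
    return tasks;
}
```

Calling `simulate_dispatch(1000, 80)` puts threads 0 to 799 on the first task and threads 800 to 999 on the second. This only shows the ID arithmetic; it says nothing about how the hardware schedules the two branches.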

Unfortunately that won't work quite like that on all hardware.

NV's Fermi (and later) can run multiple compute shaders at once, with the later ones taking up any slack left over by the ones before them. (Whether this requires the use of CUDA or "just works" regardless I don't know; I would assume the latter.) AMD's HD5 series GPUs, however, can't do that: they execute one set of compute shaders at a time, and any slack in the processing is lost power.

AMD's HD6 series can do 'async dispatch', however I'm not sure how well supported this is in practice. It might allow the same setup as NV, although their solution goes further (you'll be able to dispatch from any thread, likely via an OpenCL extension first before DX 'catches up' to the hardware).

Another thing to keep in mind is the workload you are doing. It's all very well saying "use X SPUs" for things, but the workload has to be big enough to hide data latency across those units, and it has to fit a minimum wave/group size to make effective use of the hardware parallelism. (AMD have some good OpenCL webinars/pdfs/ppts on the subject which are well worth a read when it comes to getting the most out of the GPU.)
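To put a number on the wave/group size point, here is a small sketch that counts how many lanes sit idle when a workload doesn't fill whole waves. The helper names are invented for this example, and the wave width is a parameter: 32 and 64 are the widths usually quoted for NVIDIA and AMD parts of this generation, but treat them as assumptions and check your target hardware.

```cpp
#include <cassert>
#include <cstddef>

// Number of waves needed to cover `work_items` when each wave has
// `wave_size` lanes (rounds up to a whole wave).
std::size_t waves_needed(std::size_t work_items, std::size_t wave_size) {
    assert(wave_size > 0);
    return (work_items + wave_size - 1) / wave_size;
}

// Lanes that get launched but have no work item to process.
std::size_t idle_lanes(std::size_t work_items, std::size_t wave_size) {
    return waves_needed(work_items, wave_size) * wave_size - work_items;
}
```

For instance, 33 work items on 32-wide waves need two waves and leave 31 lanes idle. This only covers the group-size half of the point; it does not model whether there is enough work in flight to hide memory latency.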

[quote name='phantom' timestamp='1298036341' post='4775849']...[/quote]

I meant running one Compute Shader at a time. You can branch your code in a single shader in the way I described. With that branching you can for example devote 20% of the threads launched to one task, and 80% to another task.

Ugh, really really bad idea. Shaders run in lock-step wavefronts/warps; unless you precisely align your compute shader block sizes to the wavefront/warp size of the hardware you are running on, you are going to experience problems with performance.
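A quick sketch of the alignment issue, assuming threads with an ID below the split take the first branch and the rest take the second. The function names are hypothetical, and the wave size is passed in because it differs between vendors.

```cpp
#include <cassert>
#include <cstddef>

// ID of the first thread that takes the second branch, for a
// percentage split (rounded down).
std::size_t split_index(std::size_t total_threads, unsigned first_percent) {
    assert(first_percent <= 100);
    return total_threads * first_percent / 100;
}

// True when the split lands inside a wave, so that one wave holds
// threads from both branches.
bool split_diverges(std::size_t total_threads, unsigned first_percent,
                    std::size_t wave_size) {
    assert(wave_size > 0);
    const std::size_t split = split_index(total_threads, first_percent);
    return split < total_threads && split % wave_size != 0;
}

// Moves the split down to the nearest wave boundary so that no wave
// contains both branches.
std::size_t aligned_split(std::size_t total_threads, unsigned first_percent,
                          std::size_t wave_size) {
    assert(wave_size > 0);
    const std::size_t split = split_index(total_threads, first_percent);
    return split - split % wave_size;
}
```

With 1000 threads and an 80% split, the boundary at thread 800 happens to sit on a 32-wide boundary but falls in the middle of a 64-wide wave, so the same shader behaves differently depending on the hardware. Aligning the split to 768 fixes the 64-wide case at the cost of no longer being exactly 80/20.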

Even then, it just doesn't strike me as a good idea to try to mix and match tasks. Different data types, memory access patterns and shader lengths are going to cause problems with memory bandwidth, cache and just shader runtime unless ALL segments run for precisely the same amount of time.

So, yeah, really bad idea.

[quote name='forsandifs' timestamp='1298041441' post='4775890']
[quote name='phantom' timestamp='1298036341' post='4775849']...[/quote]

I meant running one Compute Shader at a time. You can branch your code in a single shader in the way I described. With that branching you can for example devote 20% of the threads launched to one task, and 80% to another task.
[/quote]

In the most likely case, this will result in all cores running 100% of the code written for this shader. It would take precise tuning of thread counts to match the hardware to get what you're hoping for, and even then some hardware may not do it. As previously mentioned, this would probably also suffer from memory bandwidth and caching problems.

The OP asked if it is possible to get a GPU to multitask concurrently. I replied that it is possible with compute shaders, and that is true. EDIT: I explained one way to do it. (Phantom actually went on to explain another way, noting that certain GPUs can dispatch multiple Compute Shaders at once.)

I was also going to say that I didn't think the approach I described was a good idea for his intended purposes, because I think it's rarely good to have any kind of function doing two or more unrelated things, but I decided to limit my post to the point above. Given the subsequent replies, I think it would have been a good thing to add.

Having said that, there is one case where I would perhaps consider doing such a thing: when my project had used up all other available processing power, I had plenty of GPU power left to use, I couldn't cull any already implemented tasks, and I absolutely had to implement that extra task.

Ok thanks everyone.

So in summary I think I'll take it as being potentially technically possible but horribly inefficient and generally a bad idea.

Shame. Maybe they'll add proper support for it in future hardware and a new version of DirectX, but I won't hold my breath.

Thanks
Ben
