Separate DirectCompute context?


Is it possible/kosher to create multiple device contexts in DirectX 11 with the purpose of using one for rendering and one for GPGPU?

Basically I'm working on something sort of like a sim game. I want a render thread running as close to 60 FPS as possible, but I also want to offload a lot of the sim calculations to DirectCompute and run them as fast as possible. That means the sim thread can run anywhere from 1 to 1000 FPS, depending on what's going on. The sim also needs to push data to the render thread eventually, but I don't necessarily mind that going out through the North Bridge to the CPU and back to the GPU (I don't mind a bit of latency from the sim to the renderer, as long as things stay responsive to the user).

Darwinbots - Artificial life simulation

The GPU uses the same hardware for both kinds of work, so how you submit the commands won't change how the GPU performs - it only affects how much CPU time you spend asking the GPU to do the rendering or the compute.

Deferred contexts only help you efficiently record commands into a command list, which in theory lets you submit those commands to the driver very quickly. However, if your rendering and compute work combined is more than the GPU can finish within a 60 FPS frame, then deferred contexts won't help.

If you are CPU bound, then deferred contexts could help, but that depends on how you are issuing your compute calls. Typically there isn't much CPU work needed to set up a GPGPU operation, so your mileage may vary...
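For reference, the deferred context pattern looks roughly like this - a minimal sketch with error handling omitted and the function name my own:

#include <d3d11.h>

// Worker thread: record commands into a deferred context, then hand the
// resulting command list to the main thread for submission.
void RecordAndSubmit(ID3D11Device* device, ID3D11DeviceContext* immediateCtx)
{
    ID3D11DeviceContext* deferredCtx = nullptr;
    device->CreateDeferredContext(0, &deferredCtx);

    // ... issue Draw/Dispatch calls on deferredCtx here ...

    ID3D11CommandList* commandList = nullptr;
    deferredCtx->FinishCommandList(FALSE, &commandList);

    // Main thread: this is the only call that touches the immediate context.
    immediateCtx->ExecuteCommandList(commandList, FALSE);

    commandList->Release();
    deferredCtx->Release();
}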

I wasn't necessarily talking about deferred contexts. Deferred contexts seem like just a way to gather up commands from multiple threads. I was talking more about the possibility of creating two different immediate contexts. I've noticed that when two different games run at the same time, for instance, they both get time on the GPU without needing to communicate with each other. I've also noticed that long DirectCompute tasks (several seconds long) won't freeze the whole system the way I've seen long OpenCL tasks do, though they do seem to freeze the program that issued them.

Darwinbots - Artificial life simulation

To be clear, there's an obvious way to get this working: build two separate processes that communicate over TCP/IP. Each would have its own DirectX context; the sim process could run at whatever framerate it wanted, and the rendering process could do its best to hit 60 FPS and grab updates from the sim process periodically. But that's a super heavy-handed way to approach the problem, and I'm wondering if there's a better way.

Darwinbots - Artificial life simulation

You can't have multiple immediate contexts per device, you would have to create multiple devices (you can do this in a single process). You can definitely share resource data between two devices, you just have to handle the synchronization yourself using the DXGI sync primitives. Basically you create a resource on one device and specify the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flag, and then you pass the shared handle to OpenSharedResource to get an interface to the same resource on the other device. Then you use IDXGIKeyedMutex to synchronize any access to the shared resource.

I'm not 100% sure if this will give you the behavior you want (totally independent command streams with no implicit synchronization or dependencies), but I *think* it should work.
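In case it helps, here's a rough sketch of that flow (my own function names, texture format, and key values; error handling omitted - treat it as a starting point rather than tested code):

#include <d3d11.h>
#include <dxgi.h>

// Create the shared, non-mipmapped 2D texture on the render device.
ID3D11Texture2D* CreateSharedTexture(ID3D11Device* renderDevice, UINT width, UINT height)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;                               // shared textures must be non-mipmapped
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;     // whatever format your sim data needs
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX;

    ID3D11Texture2D* tex = nullptr;
    renderDevice->CreateTexture2D(&desc, nullptr, &tex);
    return tex;
}

// Open the same texture on the sim device via its shared handle.
ID3D11Texture2D* OpenOnSimDevice(ID3D11Texture2D* renderSideTex, ID3D11Device* simDevice)
{
    IDXGIResource* dxgiRes = nullptr;
    renderSideTex->QueryInterface(__uuidof(IDXGIResource), (void**)&dxgiRes);

    HANDLE sharedHandle = nullptr;
    dxgiRes->GetSharedHandle(&sharedHandle);
    dxgiRes->Release();

    ID3D11Texture2D* simSideTex = nullptr;
    simDevice->OpenSharedResource(sharedHandle, __uuidof(ID3D11Texture2D), (void**)&simSideTex);
    return simSideTex;
}

// Sim side: publish new results into the shared texture under the keyed mutex.
void PublishSimResults(ID3D11DeviceContext* simCtx, ID3D11Texture2D* simResults, ID3D11Texture2D* simSideTex)
{
    IDXGIKeyedMutex* mutex = nullptr;
    simSideTex->QueryInterface(__uuidof(IDXGIKeyedMutex), (void**)&mutex);

    if (mutex->AcquireSync(0, INFINITE) == S_OK)      // key 0 = "sim may write"
    {
        simCtx->CopyResource(simSideTex, simResults); // simResults lives entirely on the sim device
        mutex->ReleaseSync(1);                        // key 1 = "render may read"
    }
    mutex->Release();
}

The render side would do the mirror image: AcquireSync(1) before binding the texture for reading, then ReleaseSync(0) when it's done so the sim side can write again.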

Quote: "I was talking more about the possibility of creating two different immediate contexts. I've noticed that when two different games run at the same time, for instance, they both get time on the GPU without needing to communicate with each other."

Ahhh - ok, now I understand what you are trying to do... I've never heard of anyone doing that, but I suppose it would work. Using the method that MJP mentions above, you should be able to try it out fairly easily. And if I understand your intent properly, then you don't even need to synchronize over a shared resource since you would be copying the data via the CPU (where of course you would have to ensure that your threads are synchronized).
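If you go the CPU-copy route, the usual pattern is a staging resource on the sim device that you map and read back, then upload on the render device. A rough sketch (my own names, assuming a Texture2D result; error handling omitted):

#include <d3d11.h>
#include <string.h>

// Sim device: copy GPU results into a staging texture, then map it on the CPU.
// stagingTex must be created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ.
void ReadBackSimResults(ID3D11DeviceContext* simCtx, ID3D11Texture2D* gpuResults,
                        ID3D11Texture2D* stagingTex, void* cpuBuffer, size_t rowBytes, UINT height)
{
    simCtx->CopyResource(stagingTex, gpuResults);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(simCtx->Map(stagingTex, 0, D3D11_MAP_READ, 0, &mapped)))
    {
        // Copy row by row, since RowPitch may be padded wider than the actual data.
        for (UINT y = 0; y < height; ++y)
        {
            memcpy((char*)cpuBuffer + y * rowBytes,
                   (const char*)mapped.pData + y * mapped.RowPitch,
                   rowBytes);
        }
        simCtx->Unmap(stagingTex, 0);
    }
    // Hand cpuBuffer to the render thread (with your own thread synchronization),
    // then upload it on the render device with UpdateSubresource or Map on a dynamic resource.
}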

I take it from your question that your simulation task is very long running and can't be broken up into smaller pieces? Or is it just a big burden to manage the timing of the calculations? It sounds interesting, and I would be happy to hear if your experiment works out.

Thanks MJP, that sounds doable and sounds like it'd give me what I want.

@Jason - The simulation scales with O(n^3) at the moment, though I'm hoping to get that down to O(n^2), for some definition of n :) I can break the tasks up well enough to avoid Windows' watchdog restarting the driver, but trying to timeslice them with the rendering tasks would be a huge pain. If n is small, I can run dozens of complete simulation cycles per render frame (think of something like SimCity in fast forward). And if n is large, it can take dozens of render frames for each simulation frame. Decoupling the two seems fairly obvious, though you're right, I've never heard of anyone trying to do this.

Darwinbots - Artificial life simulation
