Most efficient ordering of thingies during frame.

Started by
12 comments, last by Staffan E 18 years ago
I've been trying to figure out the best way to order the various tasks during a single game frame. The alternatives I came up with:

1) Update and render objects in parallel. Intuitively this seems to be the best method, because the CPU and GPU would cooperate better.

for each object
    i. Update object
    ii. DrawPrimitive()
end foreach
Present()

2) First update all logic, then render.
    i. Update all objects
    ii. Render all objects
    iii. Present()

2) doesn't look as efficient as 1), but after a while I came up with 3), which IMO should be the most efficient method:

3)
    i. Update all objects
    ii. Present() // Presenting content rendered during the LAST frame.
    iii. Render all objects

To explain why I think 3) is better than 2), I drew a picture (not reproduced here).

In conclusion, I think the most efficient is 3), then 1), then 2). The actual problem is that reordering the tasks and profiling didn't seem to affect the results much (or at all), and I think my test scene is not complex or realistic enough to test this. So I'd like to ask: what's the best way to do this so that both the CPU and the GPU get as much time as they need, with the least chance of one stalling while it waits for the other to complete its tasks?
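As a rough illustration of the reasoning behind 2) versus 3), here is a minimal back-of-the-envelope model. It assumes (and these are assumptions, not measured behaviour) that the GPU starts working as soon as draw calls begin to be submitted, that Present() blocks until the GPU has finished, and that the driver queues no frames ahead. The struct and function names are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame costs in milliseconds (illustrative names only).
struct FrameCosts {
    double updateMs;  // CPU time spent updating game logic
    double submitMs;  // CPU time spent issuing draw calls
    double gpuMs;     // GPU time needed to execute those draw calls
};

// Ordering 2: Update -> Render -> Present.
// Present waits for the GPU right after submission, so the update
// never overlaps with GPU work.
double frameTimeOrdering2(const FrameCosts& c) {
    assert(c.updateMs >= 0 && c.submitMs >= 0 && c.gpuMs >= 0);
    return c.updateMs + std::max(c.submitMs, c.gpuMs);
}

// Ordering 3: Update -> Present (previous frame) -> Render.
// After Render is submitted, the CPU runs the next Update while the GPU
// is still drawing, so the two overlap.
double frameTimeOrdering3(const FrameCosts& c) {
    assert(c.updateMs >= 0 && c.submitMs >= 0 && c.gpuMs >= 0);
    return std::max(c.submitMs + c.updateMs, c.gpuMs);
}
```

With 10 ms of update, 2 ms of submission and 40 ms of GPU work, this model gives 50 ms per frame for ordering 2 and 40 ms for ordering 3. Under these assumptions 3) is never slower than 2); whether a real driver behaves like this is exactly what the thread goes on to question.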
I've been curious about this myself recently but I haven't really had time to delve into it. The DX9 docs are not really abundant in info on how to maximize parallelism between the CPU and the GPU but I did find the following:

Quote:From the DX9 docs on IDirect3DDevice9::EndScene()
There should be at most one BeginScene/EndScene pair between any successive calls to present (either IDirect3DDevice9::Present or IDirect3DSwapChain9::Present). BeginScene should be called once before any rendering is performed, and EndScene should be called once after all rendering for a frame has been submitted to the runtime. To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call EndScene as far ahead of calling present as possible.


From this I gather that your 3) would be better since it maximizes the time between the EndScene() call and the following Present() call. I don't have a working project right now to experiment with so I can't try it out though.

Oh, and that's for DirectX. I don't know how to treat it in more general terms.
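To make the call order concrete, here is a small sketch of one iteration of ordering 3). RecordingDevice is a hypothetical stand-in that only logs which method was called; it is not the Direct3D interface, and runFrameOrdering3 is an illustrative name rather than anything from the SDK.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a rendering device; it only records call order.
struct RecordingDevice {
    std::vector<std::string> calls;
    bool inScene = false;

    void BeginScene() { assert(!inScene); inScene = true; calls.push_back("BeginScene"); }
    void Draw()       { assert(inScene); calls.push_back("Draw"); }
    void EndScene()   { assert(inScene); inScene = false; calls.push_back("EndScene"); }
    void Present()    { assert(!inScene); calls.push_back("Present"); }
};

// One iteration of ordering 3: the update sits between the previous
// iteration's EndScene and this iteration's Present.
template <typename Device, typename UpdateFn>
void runFrameOrdering3(Device& dev, UpdateFn update) {
    update();        // CPU work while the GPU may still be drawing the last frame
    dev.Present();   // shows what was rendered during the previous iteration
    dev.BeginScene();
    dev.Draw();
    dev.EndScene();  // as far ahead of the next Present as this loop allows
}
```

Note that the very first Present in this loop has nothing rendered to show, so a real application would probably want to handle the first frame separately.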
Hack my projects! Oh Yeah! Use an SVN client to check them out.BlockStacker
Well, that's the exact line of the SDK that made me switch from 2) to 3). I did a profiling test to support this, but the results were a little contradictory. My test was roughly like the following:

int SleepMs = 0; // Runtime-alterable value

WndProc
{
    FrameMove();

    // 3) Present();
    Render();
    // 2) Present();
}

FrameMove
{
    update all particles
    do a null while loop for SleepMs milliseconds.
}

Render
{
    Render the particle system.
}

In test 3), I presented right before rendering, and in test 2) I presented after rendering. I don't have the exact values right now, but in general my FrameMove() was very fast, and almost all of the time was spent waiting in Present(). I tried altering the time spent in FrameMove() with the SleepMs variable to see if it would matter in either case. Surprisingly, it didn't matter at all whether I rendered or presented first. That kind of says my diagram is wrong, but how and why?
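For reference, the "null while loop for SleepMs milliseconds" could look something like the sketch below. It uses std::chrono::steady_clock from the standard library; busyWaitMs is an illustrative name, not the function used in the original test.

```cpp
#include <cassert>
#include <chrono>

// Spins on the CPU until at least `ms` milliseconds have elapsed.
// Unlike a real sleep, this keeps the thread busy, which is the point
// when faking an expensive update step.
void busyWaitMs(int ms) {
    assert(ms >= 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(ms);
    while (Clock::now() < deadline) {
        // intentionally empty: burn CPU time
    }
}
```

A busy loop rather than a sleep call seems the better choice for this kind of test, since a sleeping thread would hand the CPU back to the system and would not resemble real update work.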
If you had been getting worse performance with 3) then I would have been surprised. The result that the two methods don't differ doesn't say anything at all unless you get the same result for a number of different scenes. Obviously you would need an update function that takes a significant amount of time, as well as enough render work to keep the GPU busy for a significant part of each frame. Even if the Update() function takes some time to complete, this will not slow the system down if the GPU load is small enough for the flip to happen soon after the Present() call. You'd probably need to try this out on a scene with low FPS and a substantial update time each frame to see any effect at all.
I haven't used any graphics APIs in many months, so I'm wondering: is it possible to check whether D3D is ready to flip? If it is, I would divide my logic into smaller chunks, and every time a chunk completed I would check whether I could flip. If I could, I would flip and then return to updating the logic at the point where I left off. If the logic update finishes before the flip, I would just sit idle and wait for D3D to be ready (by calling Present).

EDIT: You could also just use triple-buffering if you have enough memory (this is possible with D3D right?).
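The chunked-update idea could be sketched roughly as follows. The tryFlip and blockingFlip callbacks are hypothetical placeholders for a non-blocking and a blocking present respectively. I believe D3D9 swap chains accept a D3DPRESENT_DONOTWAIT flag that makes Present return D3DERR_WASSTILLDRAWING instead of stalling, but treat that as something to verify against the SDK rather than a given.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Runs the logic chunks in order and polls tryFlip after each one until
// the flip succeeds. If the flip never succeeded, falls back to
// blockingFlip once all chunks are done.
// Returns true if the flip happened before the last chunk finished.
bool updateInChunks(const std::vector<std::function<void()>>& chunks,
                    const std::function<bool()>& tryFlip,       // hypothetical non-blocking present
                    const std::function<void()>& blockingFlip)  // hypothetical blocking present
{
    assert(tryFlip && blockingFlip);
    bool flipped = false;
    bool flippedEarly = false;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        chunks[i]();
        if (!flipped && tryFlip()) {
            flipped = true;
            flippedEarly = (i + 1 < chunks.size());
        }
    }
    if (!flipped) {
        blockingFlip();  // logic finished first: wait for the device
    }
    return flippedEarly;
}
```

One thing this sketch glosses over is that the next frame's rendering can only start after the logic has finished, so an early flip mostly helps latency rather than throughput.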
Quote:Original post by staaf
I've been curious about this myself recently but I haven't really had time to delve into it. The DX9 docs are not really abundant in info on how to maximize parallelism between the CPU and the GPU but I did find the following:

Quote:From the DX9 docs on IDirect3DDevice9::EndScene()
There should be at most one BeginScene/EndScene pair between any successive calls to present (either IDirect3DDevice9::Present or IDirect3DSwapChain9::Present). BeginScene should be called once before any rendering is performed, and EndScene should be called once after all rendering for a frame has been submitted to the runtime. To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call EndScene as far ahead of calling present as possible.


From this I gather that your 3) would be better since it maximizes the time between the EndScene() call and the following Present() call. I don't have a working project right now to experiment with so I can't try it out though.

Oh, and that's for DirectX. I don't know how to treat it in more general terms.


While I won't contest the accuracy of the quoted material, don't you have to have Begin/Endscene pairs for rendering to an offscreen render target? If so, how could we have at most one b/escene pair between calls to present when doing effects like shadow mapping or reflections?

Edit: Okay, I just learned that you DON'T need separate b/escene pairs for RTs...

[Edited by - Cypher19 on April 15, 2006 9:53:41 AM]
Quote:Original post by Cypher19
Quote:Original post by staaf
...


While I won't contest the accuracy of the quoted material, don't you have to have Begin/Endscene pairs for rendering to an offscreen render target? If so, how could we have at most one b/escene pair between calls to present when doing effects like shadow mapping or reflections?


Not so, it seems. In my code, I have 1 begin/endscene and many SetRenderTarget calls in between (HDR stuff) and I have not needed to stop the scene and start a new one to change RT. I thought maybe MDX was doing something fancy for me, but pix backs me up :)

What device type are you using? Pure? Mine is just HVP.

Edit: Forget the main point of this post. Just what DOES Begin/EndScene do?
Ollie"It is better to ask some of the questions than to know all the answers." ~ James Thurber[ mdxinfo | An iridescent tentacle | Game design patterns ]
Quote:Original post by staaf
If you had been getting worse performance with 3) then I would have been surprised. The result that the two methods don't differ doesn't say anything at all unless you get the same result for a number of different scenes.


True, I only had this one particle scene to test with. And creating several scenes just to profile with would take too much time, hence posting here.

Quote:Original post by staaf
Obviously you would need an update function that takes a significant amount of time, as well as enough render work to keep the GPU busy for a significant part of each frame. Even if the Update() function takes some time to complete, this will not slow the system down if the GPU load is small enough for the flip to happen soon after the Present() call. You'd probably need to try this out on a scene with low FPS and a substantial update time each frame to see any effect at all.


Well, I varied the time to sleep in update(), and the scene ran at about 10-15 fps, with Present() taking about 40 ms. So the GPU load was very high, and almost the only CPU load was the artificially introduced sleep.

Quote:Original post by acid2
What device type are you using? Pure? Mine is just HVP.
Edit: Forget the main point of this post. Just what DOES Begin/EndScene do?

I had hardware vertex processing too. I don't know exactly what Begin/EndScene do; the SDK only says it prepares the device for rendering, which sounds vague.

Does anyone have any idea what the "professional" games do? What's the most common way? Would there be any point in using multithreading for this?

- clb
Quote:Original post by CTar
I haven't used any graphic APIs in many months, so I'm wondering is it possible to see if D3D is ready to flip?

Well, back when I was using DX7 and DirectDraw, I seem to remember the presenting/flipping method returning a WASSTILLDRAWING error if the drawing had not finished when I tried to flip. Nowadays the Present() method stalls until the runtime is ready to make the flip. If I remember this correctly, I really do hope they had good cause to strip that kind of functionality from the present routine.

Quote:Original post by CTar
EDIT: You could also just use triple-buffering if you have enough memory (this is possible with D3D right?).

Absolutely. But multiple backbuffering does not rule out the benefits of parallelism between the CPU and GPU, although it may dampen the effect a little.

Quote:Original post by acid2
Just what DOES Begin/EndScene do?

It is not stated explicitly, but I guess BeginScene tells the runtime to be prepared for incoming requests, and EndScene wraps up what has been issued during the scene block and dispatches it to the driver for GPU processing. Since that would be where the actual drawing takes place, it is reasonable to make the EndScene-to-Present duration as long as possible, so that the GPU can finish drawing before a flip is requested.
Quote:Original post by clb
I've been trying to figure out what would be the best way to order various task during a single game frame.

The driver buffers some frames before rendering. You probably know the driver setting 'Max Frames to Render Ahead'. This buffer avoids such synchronisation bubbles.
So if you don't force any synchronisation (e.g. readbacks), don't worry about it.
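The effect of such a render-ahead queue can be illustrated with a toy timeline simulation of ordering 2) (Update, Render, Present). This is only a sketch under assumed rules: the GPU handles frames in order and can start a frame as soon as submission begins, and Present for frame n blocks until frame n - queueDepth has finished on the GPU. The function name and parameters are invented for the example and do not reflect how any particular driver actually works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates `frames` iterations of Update -> Render -> Present and returns
// the time between the last two Present calls returning (steady state).
// queueDepth = 0 means Present waits for the frame that was just submitted.
double steadyFrameTimeMs(double updateMs, double submitMs, double gpuMs,
                         int queueDepth, int frames = 100) {
    assert(queueDepth >= 0 && frames > queueDepth + 1);
    std::vector<double> gpuDone(static_cast<std::size_t>(frames), 0.0);
    double cpu = 0.0;          // current CPU timeline position
    double prevPresent = 0.0;  // when the previous Present returned
    double lastInterval = 0.0;
    for (int n = 0; n < frames; ++n) {
        cpu += updateMs;                                    // Update
        const double submitStart = cpu;
        const double submitEnd = submitStart + submitMs;    // Render (submission)
        const double gpuFree = (n > 0) ? gpuDone[n - 1] : 0.0;
        // GPU starts once it is free and submission has begun; it cannot
        // finish before the CPU has finished submitting.
        gpuDone[n] = std::max(std::max(submitStart, gpuFree) + gpuMs, submitEnd);
        cpu = submitEnd;
        if (n >= queueDepth) {                              // Present
            cpu = std::max(cpu, gpuDone[n - queueDepth]);
        }
        lastInterval = cpu - prevPresent;
        prevPresent = cpu;
    }
    return lastInterval;
}
```

With 10 ms of update, 2 ms of submission and 40 ms of GPU work, the simulation gives 50 ms per frame with no queue and 40 ms with a queue depth of 1. In this toy model, one queued frame already hides the update time behind the GPU work, which would be consistent with the reordering test showing no measurable difference.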
