
Most efficient ordering of thingies during frame.

This topic is 4263 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.



I've been trying to figure out the best way to order the various tasks during a single game frame. The alternatives I came up with:

1) Update and render objects interleaved. Intuitively this seems to be the best method, because the CPU & GPU would cooperate better.

    for each object
        i. Update object
        ii. DrawPrimitive()
    end foreach
    Present()

2) First update all logic, then render.

    i. Update all objects
    ii. Render all objects
    iii. Present()

2) doesn't look as efficient as 1), but after a while I came up with 3), which IMO should be the most efficient method:

3)
    i. Update all objects
    ii. Present() // Presents the content rendered during the LAST frame.
    iii. Render all objects

To explain why I think 3) is better than 2), I drew a picture: [diagram not preserved]

In conclusion, I think the most efficient is 3, then 1, then 2. But the actual problem is that reordering the tasks and profiling didn't seem to affect the results much (if at all), and I think my test scene is not complex or realistic enough to test this. So I'd like to ask: what's the best way to do this so that both the CPU & GPU get the time they need, with the least chance of one stalling while it waits for the other to complete its tasks?
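A back-of-the-envelope version of the argument for 3) over 2), under simplifying assumptions that are mine rather than the poster's: each frame the CPU spends U ms updating and S ms issuing draw calls, the GPU needs G ms for those calls and starts on them once they are submitted, and Present() blocks until all previously submitted GPU work is finished (no frames queued ahead). The steady-state frame times then come out as:

```latex
\begin{aligned}
T_2 &= U + S + G
  && \text{Update, Render, Present: nothing overlaps} \\
T_3 &= \max(U, G) + S
  && \text{Update, Present, Render: Update overlaps the previous frame's GPU work} \\
T_2 - T_3 &= \min(U, G)
\end{aligned}
```

For example, U = 10, S = 2 and G = 30 ms gives T_2 = 42 ms and T_3 = 32 ms. Because the saving is min(U, G), it shrinks toward zero whenever either the update or the GPU work is light, and the model ignores any frames the driver queues ahead.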

I've been curious about this myself recently, but I haven't really had time to delve into it. The DX9 docs don't offer much on how to maximize parallelism between the CPU and the GPU, but I did find the following:

Quote:
From the DX9 docs on IDirect3DDevice9::EndScene()
There should be at most one BeginScene/EndScene pair between any successive calls to present (either IDirect3DDevice9::Present or IDirect3DSwapChain9::Present). BeginScene should be called once before any rendering is performed, and EndScene should be called once after all rendering for a frame has been submitted to the runtime. To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call EndScene as far ahead of calling present as possible.


From this I gather that your 3) would be better since it maximizes the time between the EndScene() call and the following Present() call. I don't have a working project right now to experiment with so I can't try it out though.

Oh, and that's for DirectX. I don't know how to treat it in more general terms.

Well, that's the exact line of the SDK that made me switch from 2 to 3. I did a profiling test to support this, but the results were a little contradictory. My test was roughly as follows:

int SleepMs = 0; // Runtime-alterable value

WndProc
{
    FrameMove();

    // 3) Present();
    Render();
    // 2) Present();
}

FrameMove
{
    update all particles
    do a null while loop for SleepMs milliseconds
}

Render
{
    render the particle system
}

In test 3 I presented right before rendering, and in test 2 I presented after rendering. I don't have the exact values right now, but in general my FrameMove() was very fast, and almost all the time was spent waiting in Present(). I tried altering the time spent in FrameMove() with the SleepMs variable to see whether it would matter in either case. Surprisingly, though, it didn't matter at all whether I rendered or presented first. That kinda says my diagram is wrong, but how, and why?
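A C++ sketch of the "null while loop" used to pad FrameMove(), assuming a C++11 compiler; `busyWaitMs` is a hypothetical helper, not part of the original test. Spinning on std::chrono::steady_clock keeps the thread busy the way real update work would; Sleep() would also delay the frame but would leave the core idle.

```cpp
#include <cassert>
#include <chrono>

// Spin for at least `ms` milliseconds to simulate expensive update work.
// Returns the time actually spent spinning, in milliseconds.
double busyWaitMs(int ms) {
    assert(ms >= 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    const auto deadline = start + std::chrono::milliseconds(ms);
    while (Clock::now() < deadline) {
        // intentionally empty: burn CPU time until the deadline passes
    }
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}
```

Calling busyWaitMs(SleepMs) at the end of FrameMove() reproduces the test setup, and the returned value can be logged to confirm how long the pad actually took.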

If you had been getting worse performance with 3), I would have been surprised. The result that the two methods don't differ doesn't say anything at all unless you get the same result for a number of different scenes. Obviously you would need an update function with significant time usage, as well as enough render work to keep the GPU busy for a significant part of each frame. Even if the Update() function takes some time to complete, it will not slow the system down if the GPU load is small enough for the flip to happen soon after the Present() call. You'd probably need to try this on a scene where you have low FPS and a substantial update time each frame to see any effect at all.

I haven't used any graphics APIs in many months, so I'm wondering: is it possible to check whether D3D is ready to flip? If it is, I would divide my logic into smaller chunks, and every time a chunk completed I would check whether I could flip. If I could, I would flip and then resume updating the logic where I left off. If updating the logic finishes before the flip is possible, I would just sit idle and wait for D3D to be ready (by calling Present).

EDIT: You could also just use triple buffering if you have enough memory (this is possible with D3D, right?).
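The chunked-update idea above might look roughly like the sketch below. It is a hypothetical illustration: `tryPresent` stands in for a non-blocking flip attempt and `blockingPresent` for an ordinary Present(); neither is a D3D function. In D3D9 the nearest candidate for the non-blocking check is probably the D3DPRESENT_DONOTWAIT flag on IDirect3DSwapChain9::Present, but check the SDK docs before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical hooks, not D3D calls:
//   tryPresent()      attempts a non-blocking flip, returns true on success
//   blockingPresent() waits until the flip can happen (like a plain Present())
struct PresentHooks {
    std::function<bool()> tryPresent;
    std::function<void()> blockingPresent;
};

struct ChunkedResult {
    std::size_t chunksBeforeFlip;  // logic chunks finished when the flip happened
    bool hadToBlock;               // true if all logic finished before a flip was possible
};

// Runs one frame's logic in small chunks, polling for a flip after each chunk,
// and falls back to a blocking present if the logic finishes first.
ChunkedResult updateInChunks(const std::vector<std::function<void()>>& chunks,
                             const PresentHooks& hooks) {
    assert(hooks.tryPresent && hooks.blockingPresent);
    ChunkedResult result{chunks.size(), false};
    bool presented = false;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        chunks[i]();  // one slice of game logic
        if (!presented && hooks.tryPresent()) {
            presented = true;
            result.chunksBeforeFlip = i + 1;
        }
    }
    if (!presented) {
        hooks.blockingPresent();  // logic done but flip not ready yet: wait
        result.hadToBlock = true;
    }
    return result;
}
```

Polling costs one call per chunk, so the chunks should be coarse enough that the check stays cheap compared to the work in between.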

Quote:
Original post by staaf
...


While I won't contest the accuracy of the quoted material, don't you need BeginScene/EndScene pairs for rendering to an offscreen render target? If so, how could we have at most one BeginScene/EndScene pair between calls to Present when doing effects like shadow mapping or reflections?

Edit: Okay, I just learned that you DON'T need separate BeginScene/EndScene pairs for render targets...

[Edited by - Cypher19 on April 15, 2006 9:53:41 AM]

Quote:
Original post by Cypher19
Quote:
Original post by staaf
...


While I won't contest the accuracy of the quoted material, don't you need BeginScene/EndScene pairs for rendering to an offscreen render target? If so, how could we have at most one BeginScene/EndScene pair between calls to Present when doing effects like shadow mapping or reflections?


Not so, it seems. In my code I have one BeginScene/EndScene pair and many SetRenderTarget calls in between (HDR stuff), and I have not needed to end the scene and start a new one to change the RT. I thought maybe MDX was doing something fancy for me, but PIX backs me up :)

What device type are you using? Pure? Mine is just HVP (hardware vertex processing).

Edit: Forgot the main point of this post. Just what DOES BeginScene/EndScene do?

Guest Anonymous Poster
Quote:
Original post by staaf
If you had been getting worse performance with 3) then I would have been surprised. The result that the two methods don't differ doesn't say anything at all unless you get the same result for a number of different scenes.


True, I only had this one particle scene to test with. And creating several scenes just to profile with would take too much time, hence posting here.

Quote:
Original post by staaf
Obviously you would need an update function with significant time usage as well as enough render work to occupy the GPU during significant time each frame. Even if the Update() function takes some time to complete, this will not slow the system down if the GPU load is small enough to make the flip soon after the Present() call. You'd probably need to try this out on some scene where you have low FPS and a substantial update time each frame to see any effects at all.


Well, I varied the time to sleep in Update(), and the scene ran at about 10-15 fps, with Present() taking about 40 ms, so the GPU load was very high and almost the only CPU load was the artificially introduced sleep.

Quote:
Original post by acid2
What device type are you using? Pure? Mine is just HVP.
Edit: Forgot the main point of this post. Just what DOES Begin/EndScene do?

I had hardware vertex processing too. I don't know exactly what BeginScene/EndScene do; the SDK only says that they prepare the device for rendering, which sounds vague.

Does anyone have any idea what the "professional" games do? What's the most common way? Would there be any point in using multithreading for this?

- clb

Quote:
Original post by CTar
I haven't used any graphic APIs in many months, so I'm wondering is it possible to see if D3D is ready to flip?

Well, back when I was using DX7 and DirectDraw, I seem to remember the presenting/flipping method returning a WASSTILLDRAWING error if drawing had not finished when I tried to flip. Nowadays the Present() method stalls until the runtime is ready to make the flip. If I remember this correctly, I really do hope they had good cause to strip that kind of functionality from the present routine.

Quote:
Original post by CTar
EDIT: You could also just use triple-buffering if you have enough memory (this is possible with D3D right?).

Absolutely. But multiple back buffering does not cancel out the benefits of parallelism between the CPU and GPU, although it may dampen the effect a little.

Quote:
Original post by acid2
Just what DOES Begin/EndScene do?

It is not stated explicitly, but I guess BeginScene tells the runtime to be prepared for incoming requests, and EndScene wraps up what has been issued during the scene block and dispatches it to the driver for GPU processing. Since that would be where the actual drawing takes place, it makes sense to keep the EndScene-to-Present interval as long as possible, so the GPU can finish drawing before the flip is requested.

Quote:
Original post by clb
I've been trying to figure out what would be the best way to order various task during a single game frame.

The driver buffers a few frames before rendering. You certainly know the driver setting 'Max Frames to Render Ahead'; this buffer avoids such synchronisation bubbles.
So as long as you don't force any synchronisation (e.g. readbacks), don't worry about it.
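To see why a render-ahead buffer can hide the ordering question, here is a toy timing model in C++. Everything in it is an assumption for illustration, not measured driver behaviour: fixed per-frame costs, a GPU that starts a frame as soon as it is submitted, and a Present() that blocks only while more than `maxAhead` submitted frames are still unfinished (roughly the role of the 'Max Frames to Render Ahead' setting). `syncEachFrame` makes the CPU wait for the GPU after every submission, as a readback or back-buffer lock would.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

enum class Order { UpdateRenderPresent, UpdatePresentRender };

// Hypothetical per-frame costs in milliseconds.
struct FrameCosts {
    double update;  // CPU: game logic
    double submit;  // CPU: issuing draw calls
    double gpu;     // GPU: executing one frame's draw calls
};

// Returns the average frame time (ms) over `frames` simulated frames.
double averageFrameMs(Order order, const FrameCosts& c, int maxAhead,
                      bool syncEachFrame, int frames) {
    assert(maxAhead >= 0 && frames > 0);
    std::deque<double> pending;  // GPU finish times of submitted, unretired frames
    double cpu = 0.0, gpuDone = 0.0;

    auto present = [&] {
        // Retire frames the GPU has already finished.
        while (!pending.empty() && pending.front() <= cpu) pending.pop_front();
        // Stall until at most maxAhead frames remain queued.
        while (static_cast<int>(pending.size()) > maxAhead) {
            cpu = std::max(cpu, pending.front());
            pending.pop_front();
        }
    };
    auto render = [&] {
        cpu += c.submit;
        gpuDone = std::max(cpu, gpuDone) + c.gpu;  // GPU runs frames in order
        pending.push_back(gpuDone);
        if (syncEachFrame) {                       // e.g. locking the back buffer
            cpu = std::max(cpu, gpuDone);
            pending.clear();
        }
    };

    for (int i = 0; i < frames; ++i) {
        cpu += c.update;
        if (order == Order::UpdateRenderPresent) { render(); present(); }
        else { present(); render(); }
    }
    return cpu / frames;
}
```

With FrameCosts{10, 2, 30} and 1000 frames, the two orderings come out at about 42 and 32 ms per frame when maxAhead is 0, both settle near 30 ms (the GPU time) once a single frame may be queued, and both go back to 42 ms when syncEachFrame forces a wait every frame.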

Quote:
Original post by CV
Quote:
Original post by clb
I've been trying to figure out what would be the best way to order various task during a single game frame.

The driver buffers a few frames before rendering. You certainly know the driver setting 'Max Frames to Render Ahead'; this buffer avoids such synchronisation bubbles.
So as long as you don't force any synchronisation (e.g. readbacks), don't worry about it.


Ah, of course: triple buffering. That explains why my test model didn't behave according to the diagram. I tried forcing a back buffer lock right before Present(), which of course stalls until all rendering is complete, and when I profiled again the results were just as expected. So, lessons learned:

1) 'Update, Render, Present' vs 'Update, Present, Render' doesn't matter much IF you have more than one back buffer in your swap chain (D3DPP.BackBufferCount > 1).

2) Present() returns immediately if there's a free back buffer in your swap chain available. If not, it stalls until one is flipped and freed.

3) Obviously, locking the back buffer is BAD. Not only does it stall until all rendering is complete, it also makes the extra back buffers useless: if you lock the back buffer every frame, you have to run in sync with the GPU, and the rest of the back buffer chain is totally wasted!

Quote:
Original post by clb
Ah of course, triple buffering, that explains why my test model didn't behave according to the diagram.

Triple buffering usually means having three framebuffers, and a framebuffer sits "between" the GPU and the RAMDAC. What I meant is a buffer inside the driver, between the application and the GPU.



The point is that calling EndScene() doesn't actually stop the GPU. It tells the driver/GPU to finish off all pending render requests (in many cases there can be a queue of requests pending).

While Present() in the triple-buffering case means there is never (?) a stall at this specific point in time, that doesn't negate the fact that the GPU still isn't finished and simply cannot perform a bona fide state change (altering render targets, changing streams, and so forth) until it's done; the stall is only deferred until the next render request or state change.

I suggest taking Microsoft's advice, because it plays well with BOTH 2-buffer and 3-buffer strategies: find some expensive CPU work and do it between EndScene() and Present(). Seems simple enough. Ideally the GPU finishes the current frame precisely when the CPU issues Present().
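A minimal sketch of that frame layout, assuming a hypothetical `Device` interface whose method names merely mirror the D3D9 calls (it is not IDirect3DDevice9); `RecordingDevice` only logs calls so the ordering can be checked, and `runFrame` is an illustrative name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the device; method names only mirror D3D9's.
struct Device {
    virtual ~Device() = default;
    virtual void beginScene() = 0;
    virtual void drawScene() = 0;
    virtual void endScene() = 0;
    virtual void present() = 0;
};

// Test double that records the order of calls.
struct RecordingDevice : Device {
    std::vector<std::string> log;
    void beginScene() override { log.push_back("begin"); }
    void drawScene() override { log.push_back("draw"); }
    void endScene() override { log.push_back("end"); }
    void present() override { log.push_back("present"); }
};

// One frame laid out as suggested: submit everything, EndScene(), then do the
// expensive CPU work (e.g. updating the NEXT frame) before Present().
template <typename CpuWork>
void runFrame(Device& dev, CpuWork&& cpuWork) {
    dev.beginScene();
    dev.drawScene();  // draw calls for the current frame
    dev.endScene();   // in a real renderer the GPU keeps working after this
    cpuWork();        // CPU work that can overlap GPU rendering
    dev.present();    // ideally the GPU has just finished by now
}
```

Since each frame's update runs after the previous frame's EndScene(), the very first frame needs one update before the loop starts.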


Quote:
Original post by Rockoon1
While Present() in the triple-buffering case means there is never (?) a stall at this specific point in time, that doesn't negate the fact that the GPU still isn't finished and simply cannot perform a bona fide state change (altering render targets, changing streams, and so forth) until it's done; the stall is only deferred until the next render request or state change.

That's what I was trying to get at earlier but you put it in better words. The point is that a) using multiple backbuffers and b) trying to fill out the time between EndScene() and Present() with CPU work are two separate techniques that can enhance efficiency. They are neither mutually exclusive nor the same thing.

