
Why doesn't DirectX implement triple buffering?



#1 zz2   Members   -  Reputation: 265


Posted 21 October 2013 - 04:26 AM

Is there any particular reason DirectX doesn't implement correct triple buffering?

It would solve the screen tearing issue without introducing any additional latency (compared to double buffering) and without dropping FPS the way V-Sync with double buffering does.

 

The implementation I am talking about (page flipping method) is described here: http://www.anandtech.com/show/2794/4

 

 

Microsoft doesn't implement triple buffering in DirectX, they implement render ahead (from 0 to 8 frames with 3 being the default)...(aka a flip queue)

 

It would be a cheaper solution compared to Nvidia G-Sync. (G-Sync reduces lag even further compared to double buffering, but it requires a monitor upgrade.)

 

So the question is: why didn't MS implement this in DirectX, as it seems it would be a really simple thing to do? Or am I missing something?




#2 mhagain   Crossbones+   -  Reputation: 7833


Posted 21 October 2013 - 05:01 AM

It does; the Anandtech article is basically bogus misinformation.

 

For D3D9 look at the BackBufferCount member of D3DPRESENT_PARAMETERS.

For D3D10+ look at the BufferCount member of DXGI_SWAP_CHAIN_DESC.
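For reference, a minimal sketch of each (untested; hWnd/hwnd here stand for an already-created window handle, and the surrounding device-creation code is omitted):

D3D9 - two back buffers plus the front buffer, i.e. triple buffering, with vsync:

D3DPRESENT_PARAMETERS pp = {};
pp.Windowed             = TRUE;
pp.SwapEffect           = D3DSWAPEFFECT_DISCARD;
pp.BackBufferFormat     = D3DFMT_UNKNOWN;          // use the current display format (windowed mode)
pp.BackBufferCount      = 2;                       // 2 back buffers + front buffer = triple buffering
pp.hDeviceWindow        = hWnd;
pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE; // vsync on
// ... pass &pp to IDirect3D9::CreateDevice ...

D3D10+ (DXGI):

DXGI_SWAP_CHAIN_DESC scd = {};
scd.BufferCount       = 2;                               // two back buffers
scd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
scd.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
scd.OutputWindow      = hwnd;
scd.SampleDesc.Count  = 1;
scd.Windowed          = TRUE;
scd.SwapEffect        = DXGI_SWAP_EFFECT_DISCARD;
// ... pass &scd to D3D11CreateDeviceAndSwapChain or IDXGIFactory::CreateSwapChain ...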

 

The render-ahead method that Anandtech are talking about is something completely different and is controlled by IDirect3DDevice9Ex::SetMaximumFrameLatency or IDXGIDevice1::SetMaximumFrameLatency - that's the API for the 0 to 8 frame render-ahead; the number of back buffers used is not the same thing.
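A sketch of that separate knob on the DXGI side (assuming device is an existing ID3D11Device; D3D9Ex is analogous via IDirect3DDevice9Ex::SetMaximumFrameLatency):

IDXGIDevice1* dxgiDevice = nullptr;
if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1), (void**)&dxgiDevice)))
{
    dxgiDevice->SetMaximumFrameLatency(1); // allow at most one queued frame ahead of the GPU
    dxgiDevice->Release();
}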

 

It's more appropriate to ask "why are Anandtech spreading misinformation?" here.




#3 tonemgub   Members   -  Reputation: 916


Posted 21 October 2013 - 05:47 AM

In addition to that:

 

From the article: "The major difference in the technique we've described here is the ability to drop frames when they are outdated."

 

You can do this in D3D11 by using the DXGI_PRESENT_RESTART flag when calling IDXGISwapChain::Present. And for some reason I can't remember, you also have to use DXGI_SWAP_EFFECT_SEQUENTIAL for the swap chain, or you will get lag.
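For example (a sketch, assuming swapChain is an IDXGISwapChain created with DXGI_SWAP_EFFECT_SEQUENTIAL and vsync is wanted):

// SyncInterval 1 waits for vblank; DXGI_PRESENT_RESTART asks DXGI to drop frames
// that are queued but not yet displayed, so the newest frame is shown sooner.
swapChain->Present(1, DXGI_PRESENT_RESTART);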

 

And you can't get tearing-free rendering simply by using triple buffering. The only way to get rid of tearing is to use VSYNC, but the article doesn't mention whether they use VSYNC in their triple-buffering method.

 

I just tested these flags, with BufferCount 3 and no VSYNC - it still tears.


Edited by tonemgub, 21 October 2013 - 05:48 AM.


#4 zz2   Members   -  Reputation: 265


Posted 21 October 2013 - 05:57 AM

So why don't DirectX games use triple buffering when v-sync is enabled?

 

If I understand correctly the circular pattern of switching buffers in the chain described on MSDN, the circular flipping process introduces additional latency; it is not the method of bouncing between two back buffers (while the front buffer is sent to the monitor) that Anandtech described.

 

 

When a flipping chain contains a front buffer and more than one back buffer, the pointers are switched in a circular pattern, as shown in the following diagram.

[diagram omitted: front and back buffer pointers rotating in a circular pattern]

http://msdn.microsoft.com/en-us/library/windows/desktop/bb173393%28v=vs.85%29.aspx

http://msdn.microsoft.com/en-us/library/windows/hardware/ff570099%28v=vs.85%29.aspx

EDIT: v-sync should be enabled with triple buffering to prevent tearing

here is what I think the combinations are:
- double buffering & no v-sync = screen tearing, as the front buffer and back buffer can be switched before the monitor has finished displaying the front buffer

- double buffering & v-sync = no tearing, drop in FPS as the GPU has to wait for the monitor to finish displaying

- triple buffering & v-sync (DirectX's circular flipping) = no tearing, but introduces lag: the GPU can complete drawing into both back buffers (and then wait), but the buffers are sent to the monitor in a fixed order (in that case an old frame is displayed when there is already a completed newer frame waiting)

- triple buffering & v-sync (with bouncing back buffers as Anandtech describes) = no tearing, no additional lag (if the GPU completes drawing into both back buffers, it starts drawing the new frame instantly and overwrites the older back buffer, so there is always only the latest frame waiting to be sent to the monitor (flipped to the front buffer))

 

From what I am reading on MSDN, DirectX does not implement the last option - or is it only DX11 that does (and why has it taken them so long to implement it)?


Edited by zz2, 21 October 2013 - 06:49 AM.


#5 mhagain   Crossbones+   -  Reputation: 7833


Posted 21 October 2013 - 06:53 AM

So why don't DirectX games use triple buffering when v-sync is enabled?

 

If I understand correctly the circular pattern of switching buffers in the chain described on MSDN, the circular flipping process introduces additional latency; it is not the method of bouncing between two back buffers (while the front buffer is sent to the monitor) that Anandtech described.

 

 

When a flipping chain contains a front buffer and more than one back buffer, the pointers are switched in a circular pattern, as shown in the following diagram.

[diagram omitted: front and back buffer pointers rotating in a circular pattern]

http://msdn.microsoft.com/en-us/library/windows/desktop/bb173393%28v=vs.85%29.aspx

http://msdn.microsoft.com/en-us/library/windows/hardware/ff570099%28v=vs.85%29.aspx

EDIT: v-sync should be enabled with triple buffering to prevent tearing

here is what I think the combinations are:
- double buffering & no v-sync = screen tearing, as the front buffer and back buffer can be switched before the monitor has finished displaying the front buffer

- double buffering & v-sync = no tearing, drop in FPS as the GPU has to wait for the monitor to finish displaying

- triple buffering & v-sync (DirectX's circular flipping) = no tearing, but introduces lag: the GPU can complete drawing into both back buffers (and then wait), but the buffers are sent to the monitor in a fixed order (in that case an old frame is displayed when there is already a completed newer frame waiting)

- triple buffering & v-sync (with bouncing back buffers as Anandtech describes) = no tearing, no additional lag (if the GPU completes drawing into both back buffers, it starts drawing the new frame instantly and overwrites the older back buffer, so there is always only the latest frame waiting to be sent to the monitor (flipped to the front buffer))

 

From what I am reading on MSDN, DirectX does not implement the last option.

 

What you're describing here is D3DSWAPEFFECT_FLIP and that's not the only swap effect available; D3D does expose other ways of handling the buffer swap operation, such as discard or copy.  Flip is described as follows in the documentation:

 

 

The swap chain might include multiple back buffers and is best envisaged as a circular queue that includes the front buffer. Within this queue, the back buffers are always numbered sequentially from 0 to (n - 1), where n is the number of back buffers, so that 0 denotes the least recently presented buffer. When Present is invoked, the queue is "rotated" so that the front buffer becomes back buffer (n - 1), while the back buffer 0 becomes the new front buffer.

 

And if you read further down the page you're referencing (in your first link), you'll see the following note: "The discussion above applies to the commonly used case of a full-screen swap chain created with D3DSWAPEFFECT_FLIP."  You can't really extrapolate this to a general behaviour as it's explicitly documented as only applying in one particular case.

 

Your second link refers to DirectDraw surfaces and as such is seriously outdated information; DirectDraw surfaces used for buffer swapping haven't been relevant since D3D7.

 

At this point I'm really not sure what you're getting at.  You want to know why Microsoft made a decision (which they didn't actually make) - ask Microsoft.  You want to know why some games implement things a certain way - ask the developers of those games.  Otherwise your posts are reading to me as "a rant disguised as a question" I'm afraid.




#6 tonemgub   Members   -  Reputation: 916


Posted 21 October 2013 - 07:09 AM

I don't know of any games that disable triple buffering when VSYNC is used.

 

I didn't catch on to that part of the article, but I don't see how "bouncing" two back buffers onto a third back buffer is any different from sequencing two or three back buffers when it comes to performance. Buffering is just that: buffering. It has nothing to do with the way the hardware presents frames to the screen.

 

So I second mhagain's opinion that the article is mostly bogus.

 

If GPU lag is your concern, then use DXGI_SWAP_EFFECT_SEQUENTIAL and DXGI_PRESENT_RESTART - it eliminates the GPU wait when a new frame needs to be presented and the frame queue is full. In this case, you don't even need three buffers - one is sufficient.

 

As for flipping (in contrast to blitting) back buffers onto the screen - it was available in D3D9, was disabled in D3D10 & 11, and was only brought back in D3D11.1. Only MS knows why. So the only latency a D3D10/11 swap chain adds comes from the fact that it uses blitting instead of flipping.
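If anyone wants to try the D3D11.1 flip path, here is a hedged sketch (untested, requires Windows 8 / DXGI 1.2; factory2, device and hwnd are assumed to already exist as an IDXGIFactory2, a D3D11 device and a window handle):

DXGI_SWAP_CHAIN_DESC1 scd = {};
scd.BufferCount      = 2;                                 // flip model needs at least two buffers
scd.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
scd.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
scd.SampleDesc.Count = 1;                                 // flip-model swap chains cannot be multisampled
scd.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;  // flipping instead of blitting

IDXGISwapChain1* swapChain = nullptr;
HRESULT hr = factory2->CreateSwapChainForHwnd(device, hwnd, &scd,
                                              nullptr, nullptr, &swapChain);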



#7 zz2   Members   -  Reputation: 265


Posted 21 October 2013 - 08:37 AM

I am sorry about the outdated link; I didn't check the category. It was actually in the top two results on MSDN when I searched for triple buffering, and at the end of the document it said "Build date: 10/14/2013", so I thought the article was current.

 

I am still learning about the subject, and there is a lot of contradicting information and scarce official resources that go into detailed comparisons between techniques. I am asking this from two perspectives: as a user and as an indie developer.

 

As a user, I am trying to understand why so many games have the problem that, with v-sync enabled, the framerate is halved when the FPS drops below the monitor's refresh rate. What is the reason they don't use triple buffering? And why does technology such as Adaptive V-Sync even exist if there is triple buffering (is it purely a marketing gimmick)?

 

As a developer, I would first like to learn about this (pros and cons) and why it is so challenging to get right, before trying to implement it.

 

If GPU lag is your concern, then use DXGI_SWAP_EFFECT_SEQUENTIAL and DXGI_PRESENT_RESTART - it eliminates the GPU wait when a new frame needs to be presented and the frame queue is full. In this case, you don't even need three buffers - one is sufficient.

I thought a minimum of two buffers is needed: one front buffer and one back buffer. How would only one buffer work?

 

You want to know why Microsoft made a decision (which they didn't actually make) - ask Microsoft. You want to know why some games implement things a certain way - ask the developers of those games.

Considering I don't know anyone at Microsoft or any of those game developers, is it wrong to ask for your insight on this topic on this forum? Yes, I made some wrong assumptions and I am sorry about that. I don't know what else to say.

 

Can anyone explain in a little more detail how triple buffering is correctly set up in DirectX (with v-sync and without additional lag), preferably in DirectX 9 if possible? Is DXGI_PRESENT_RESTART the way to do it, and only for DX11?


Edited by zz2, 21 October 2013 - 08:43 AM.


#8 zz2   Members   -  Reputation: 265


Posted 21 October 2013 - 09:45 AM

I don't see how "bouncing" two back buffers onto a third back buffer is any different from sequencing two or three back buffers when it comes to performance. Buffering is just that: buffering. It has nothing to do with the way the hardware presents frames to the screen.


The difference comes from synchronization timings, I think. The first buffer in the sequence is the front buffer. That means the buffer sequence cannot be flipped in a circular fashion if the monitor has not finished presenting the front buffer. So rendering has to be halted when both back buffers are completed before the monitor refresh is completed. With bouncing, only the two back buffers can be flipped, without affecting the front buffer (which cannot be done when flipping all three buffers at once in a circular fashion).
This is how I think it works; I may be wrong. Many users report increased mouse lag when using triple buffering in some games - I guess this circular flipping is the reason?

Edited by zz2, 21 October 2013 - 09:47 AM.


#9 Matias Goldberg   Crossbones+   -  Reputation: 3183


Posted 21 October 2013 - 12:05 PM

Triple buffering exists for two reasons:
  • To avoid the FPS drop when VSync is enabled (unless both GPU & CPU take less than 16ms to render).
  • To let the CPU return immediately in case one frame was taking too long for the GPU to process.
In both cases, turning off VSync makes triple buffering irrelevant:
  • When VSync is off and the CPU/GPU took more than 16ms, the buffers are still flipped and tearing appears (unless the rendering time was a multiple of 16ms).
  • When VSync is off, the CPU still returns immediately from submitting the commands to the GPU; the screen is flipped and tearing happens.
Triple Buffer + No VSync makes no sense (you can try it, but it won't change much); Triple Buffer + VSync should deliver similar performance to Double Buffer + No VSync, except there is no tearing, at the cost of a frame of latency. There is no way to avoid the additional frame of lag.
 
What Anandtech is saying with its "alternative" is basically just a form of frame skipping: when the CPU is too fast (or the GPU is too slow), the GPU is still presenting the front buffer, the 1st back buffer is waiting to be presented, the 2nd back buffer is also waiting, and the CPU wants to issue a 4th frame.
So, instead of using quadruple buffering or waiting, Anandtech suggests dropping the contents of the 2nd back buffer and replacing them with the contents of that 4th frame.
So the monitor will end up showing frames 1, 2, 4 instead of 1, 2, 3 ... wait ... 4.
Forcing frame 4 to come right after 2 leads to stuttering.
 
This is a terrible technique; it only works for the cases where the 5th frame will be very CPU intensive, so dispatching the 4th now instead of waiting gives a lot of time to process that heavy 5th frame.
But it is pointless anyway, because if you have such bad FPS micro-spikes you have a bigger issue; and besides, it means the CPU is sending frames faster than the monitor's refresh rate, which is pointless and only leads to stutter.
The other extreme is when the GPU is taking too long. But if the GPU takes too long all the time, then triple buffering won't help much, and you should be thinking of plain old frame skipping (regardless of buffer counts).
 
Another issue with the technique is that if the 4th frame is too GPU heavy, it won't be ready by the time it should be presented (that is, after frame 2); however, had they waited, it could make it in time because there is one more frame in the queue (frame 3 comes after 2, then comes the 4th). Like I said, a terrible technique.
 
 
The Anandtech article says their technique is the real Triple Buffering, while DX implements a fake render-ahead queue that shouldn't be called Triple Buffering. Well, the article is wrong. Their technique is not "Triple Buffering"; it's a form of useless frame skipping that introduces stutter and increases CPU power consumption, though it is still able to avoid tearing and has as much visible lag as the "render ahead" method.

Edited by Matias Goldberg, 21 October 2013 - 12:06 PM.


#10 Matias Goldberg   Crossbones+   -  Reputation: 3183


Posted 21 October 2013 - 12:39 PM

Something I forgot to mention: if we read the 2nd page of the article, the author achieves an internal render loop of 3.3ms per frame with his frame-skipping triple buffer, thus minimizing latency (what the author wanted to achieve) and avoiding tearing (because of the VSync), but at the cost of stuttering (something the author forgets to mention).

 

The author compares double buffering against the best case of his frame-skipping triple buffer (when each frame takes exactly 3.33ms, which divides evenly into 16.66ms so no stuttering will happen; also, a frame time of 3.33ms is very hard to achieve in real games) without even comparing it against the render-ahead triple buffer.

 

The real way to achieve low latency input is to detach the logic & physics loop from graphics and put them in another thread. Thus input, logic & physics can run at 3.3ms while rendering happens at 16.66ms.

The method in the article just forces the GPU to internally draw at 3.33ms to achieve a similar effect.
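To make the decoupling concrete, here is a rough single-threaded sketch (untested; PollInput, UpdateSimulation and Render are placeholders for the application's own functions, and the threaded version splits the two loops across threads but follows the same principle as this fixed-timestep form):

#include <chrono>

void PollInput();                  // placeholder: read and buffer input
void UpdateSimulation(double dt);  // placeholder: logic & physics step
void Render();                     // placeholder: Present() inside blocks on vsync (~16.6ms)

void GameLoop()
{
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double> step(1.0 / 300.0); // ~3.3ms logic steps
    std::chrono::duration<double> accumulator(0.0);
    auto previous = clock::now();

    for (;;)
    {
        auto now = clock::now();
        accumulator += now - previous;
        previous = now;

        // Run input/logic/physics in fixed steps, independent of the render rate.
        while (accumulator >= step)
        {
            PollInput();
            UpdateSimulation(step.count());
            accumulator -= step;
        }

        Render();
    }
}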



#11 zz2   Members   -  Reputation: 265


Posted 21 October 2013 - 03:44 PM

Thank you for such an in-depth response. How long is the additional latency with triple buffering? Is it always one frame, or is one frame of lag the worst case scenario?

What happens in his case:
Front buffer: frame 1
Back buffer 1: frame 2
Back buffer 2: gpu processing frame 3

--monitor refresh-- (circular flip)

Front buffer: frame 2
Back buffer 1: gpu processing frame 3
Back buffer 2: empty?

Is the second state correct?

Also, wouldn't a detached input, logic & physics loop that is running faster than the graphics part give a similar result as frame skipping?
Input, logic & physics: 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
Graphics:               1, 2,    4, 5,    7, 8, ...

So effectively graphics is skipping frames 3 & 6. How can this form of frame skipping be better than Anand's frame skipping? I know I am asking a lot of questions, but please bear with me. I understand it can reduce input lag, but wouldn't skipping input, logic & physics frames produce the same stuttering effect as Anand's frame skipping?
Also now we are talking about synchronizing three loops:
1) Input, logic & physics
2) Graphics
3) Monitor refresh rate
 

the GPU is still presenting the front buffer, the 1st back buffer is waiting to be presented, the 2nd back buffer is also waiting, and the CPU wants to issue a 4th frame.
So, instead of using quadruple buffering or waiting, Anandtech suggests dropping the contents of the 2nd back buffer and replacing them with the contents of that 4th frame.

One mistake you made here, I think: I think Anandtech suggests dropping the contents of the 1st back buffer (because it holds the oldest frame) and starting to write the contents of that 4th frame into it. So the monitor will end up showing frames 1, 3, 4. And it is the GPU that must be ready with the 4th frame for that to happen, not just the CPU, as we are focusing on synchronizing graphics and the monitor's refresh rate. By dropping one frame we have given the GPU more time to process the 4th frame, right?


Edited by zz2, 21 October 2013 - 03:56 PM.


#12 Matias Goldberg   Crossbones+   -  Reputation: 3183


Posted 21 October 2013 - 08:46 PM

Thank you for such an in-depth response. How long is the additional latency with triple buffering? Is it always one frame, or is one frame of lag the worst case scenario?

What happens in his case:
Front buffer: frame 1
Back buffer 1: frame 2
Back buffer 2: gpu processing frame 3

--monitor refresh-- (circular flip)

Front buffer: frame 2
Back buffer 2: gpu processing frame 3
Back buffer 1: empty?

Is the second state correct?

I've corrected the numbers in bold for you. (After the circular flip, back buffer #1 becomes #2 and #2 becomes #1, assuming they're using the swap technique.)
And yes, that's correct. However, "empty" is inaccurate; a better term is "undefined". Most likely it will contain the contents it had before swapping, but this isn't guaranteed IIRC. The data could have been corrupted by now, or the driver could for some random reason use an unused chunk of VRAM from some other region and discard the old one. This is especially sensitive when dealing with rarer setups (e.g. SLI, CrossFire).

If the driver is using copy instead of flip/swap, the best guess is that buffers #1 & #2 now hold the same data (because #2 was copied into #1).
 

Also, wouldn't a detached input, logic & physics loop that is running faster than the graphics part give a similar result as frame skipping?

Short answer: mostly yes.
Longer answer: You're the one in control of which game-logic state you will be rendering, so there is finer granularity. But the actual reason is that rendering tends to take a lot of time (it is very rare to spend just 3.33ms; few games manage 16.66ms/60hz, while most need 33.33ms/30hz!), so if the CPU can run at 120hz, you will be limited to just 30hz, regardless of triple buffering or VSync.
By detaching, you can process at 120hz or more while still rendering at 60 or 30hz. This is covered in the excellent article "Fix Your Timestep!". You're looking at 30hz updates, but the game "feels" responsive (because complex key inputs are processed almost immediately).
It's like running towards a cliff to get to the other side with your eyes closed. Just because your eyes are closed doesn't mean you have to wait until they're open again to jump. If you've calculated the distance well enough, you press the space bar to jump at the right time.
With graphics locked to input & logic, your "space bar" jump could be processed either too early or too late.

In other words, the reason is the same as in Anandtech's article (reduce latency); it's just that the article makes a very optimistic assumption about how long it takes a GPU to render a frame (unless you're playing a very old game that current GPUs handle easily) and how stable that framerate is.
Aaaand that's why G-Sync is a cool gadget by the way.

 

One mistake you made here, I think: I think Anandtech suggests dropping the contents of the 1st back buffer (because it holds the oldest frame) and starting to write the contents of that 4th frame into it. So the monitor will end up showing frames 1, 3, 4. And it is the GPU that must be ready with the 4th frame for that to happen, not just the CPU, as we are focusing on synchronizing graphics and the monitor's refresh rate. By dropping one frame we have given the GPU more time to process the 4th frame, right?

Yes, my mistake.

#13 zz2   Members   -  Reputation: 265


Posted 22 October 2013 - 01:32 AM

I think you forgot about the front buffer in the circular flip example. The front buffer becomes back buffer 2, back buffer 1 becomes the front buffer, and back buffer 2 becomes back buffer 1. That is for the case of the circular flip technique, where only the pointers (or names/purposes of the buffers) change but the actual content stays in the same spot in memory.

 

I think I understand now.

  • Triple buffering is meant to be used with V-Sync, and the reason to use it is to prevent the frame-rate drop (as with double buffering) when V-Sync is enabled.
  • Triple buffering can increase lag by up to one frame (one frame of the monitor's refresh rate, right? ... so 120hz monitors can be an improvement when triple buffering is used?)
  • Anand's technique would only be an improvement (reduced lag) over the circular flip technique in old games where the GPU renders faster than the monitor's refresh rate. With a slow GPU and low FPS, there would not be any notable difference between the techniques (for both, there will be lag of up to one frame of the monitor's refresh rate). A much better technique to reduce lag is to separate the input, logic & physics loop from the graphics loop and move it to a separate thread, so the game's input stays responsive even when the frame rate drops below the monitor's refresh rate (useful even when not using triple buffering).
  • The title of this thread is wrong hehe


#14 tonemgub   Members   -  Reputation: 916


Posted 22 October 2013 - 12:26 PM


I thought a minimum of two buffers is needed: one front buffer and one back buffer. How would only one buffer work?

I know that, at least in blitting/copy mode, there is a minimum of only one back-buffer. There's no front buffer, or the front buffer is whatever video memory D3D maps internally to the screen.

One thing is for sure: when you create the swap chain for a D3D device, you only give it the count of back buffers to create, and you can always set that to 1, which is valid - this is what I was referring to when I said you need only one buffer. But I also think there may be restrictions on using a swap chain with only one back buffer in the "flip" presentation mode - maybe it requires the back buffer count to be at least 2, or maybe it creates a front buffer internally and uses it for flipping with the one back buffer you asked it to create - I don't know, but I'm sure it's documented somewhere how it works. I always thought of the "front buffer" as an internal buffer that only the D3D API knows and cares about, or maybe it's even only internal to the video driver (I think it's the same thing as that elusive part of video memory that Linux/OpenGL users like to call a "framebuffer").

 

There's also a comment about this at the bottom of the msdn article: http://msdn.microsoft.com/en-us/library/windows/desktop/bb173075(v=vs.85).aspx

 

Great! Now I'm confused too. (Just kidding.) Anyway, I'm happy to know that my swap chains work with just one buffer (BackBufferCount = 1) in copy presentation mode (but D3D11 complains when I give it a BackBufferCount of 0 or less)... I don't know what it accepts for the flipping presentation mode.

 

In fact, you're lucky, because the D3D9 documentation seems to be clearer about what the BackBufferCount can be: http://msdn.microsoft.com/en-us/library/windows/desktop/bb172588%28v=vs.85%29.aspx


Edited by tonemgub, 22 October 2013 - 12:52 PM.


#15 mhagain   Crossbones+   -  Reputation: 7833


Posted 22 October 2013 - 12:53 PM

The DXGI_SWAP_CHAIN_DESC documentation makes it implicit that - in windowed modes at least - D3D certainly does support rotating back-buffers.

 

 

in windowed mode, the desktop is the front buffer

 

If you think about this, it's obvious.  Your front buffer (i.e. the desktop itself) is just not going to be the same size as your back buffer(s), so there's absolutely no way that the front buffer (desktop) is going to be able to be swapped with a back buffer.

 

You'll see the same in D3D9 if you use IDirect3DDevice9::GetFrontBufferData - the surface data you retrieve will be sized to the desktop, not to your back buffer size.  Again, this is documented behaviour:

 

 the size of the destination surface should be the size of the desktop




#16 zz2   Members   -  Reputation: 265


Posted 22 October 2013 - 04:40 PM

In case anyone else is interested, here are some good articles I found while researching this subject.

 

Two buffers is the minimum: one front buffer and one back buffer. The front buffer is the on-screen buffer, to which we cannot write. Back buffers are off-screen surfaces to which we draw. When creating a swap chain we only specify the back buffer count (the front buffer always exists in one form or another; it may not be the same size as the back buffers if your game is in windowed mode, and you do not have direct control over the front buffer in D3D9).

Front buffer: A rectangle of memory that is translated by the graphics adapter and displayed on the monitor or other output device.

...

The front buffer is not directly exposed in Direct3D 9. As a result, applications cannot lock or render to the front buffer.

http://msdn.microsoft.com/en-us/library/windows/desktop/bb174607%28v=vs.85%29.aspx

Note that any surface other than the front buffer is called an off-screen surface because it is never directly viewed by the monitor. By using a back buffer, an application has the freedom to render a scene whenever the system is idle ... without having to consider the monitor's refresh rate. Back buffering brings an additional complication of how and when to move the back buffer to the front buffer.

http://msdn.microsoft.com/en-us/library/windows/desktop/bb153350%28v=vs.85%29.aspx

 

 

An MSDN article on OpenGL (it may be old but is still relevant) that has a clear explanation of the relationship between the terms framebuffer, front buffer, and back buffer:

The framebuffer consists of a set of logical buffers: color, depth, accumulation, and stencil buffers. The color buffer itself consists of a set of logical buffers; this set can include a front-left, a front-right, a back-left, a back-right, and some number of auxiliary buffers.

...
By default, drawing commands are directed to the back buffer (the off-screen buffer), while the front buffer is displayed on the screen.

http://msdn.microsoft.com/en-us/library/windows/desktop/dd318339%28v=vs.85%29.aspx

 

A more in-depth article that describes the differences between windowed and full-screen mode (DX10):

When DXGI makes the transition to full-screen mode, it attempts to exploit a flip operation in order to reduce bandwidth and gain vertical-sync synchronization. The following conditions can prevent the use of a flip operation:

  • The application did not re-allocate its back buffers in a way that they match the primary surface.

  • The driver specified that it will not scan-out the back buffer (for example, because the back buffer is rotated or is MSAA).

  • The application specified that it cannot accept the Direct3D runtime discarding of the back buffer's contents and requested only one buffer (total) in the chain. (In this case, DXGI allocates a back surface and a primary surface; however, DXGI uses the driver's PresentDXGI function with the Blt flag set.)

http://msdn.microsoft.com/en-us/library/windows/hardware/ff557525%28v=vs.85%29.aspx

 

Another interesting thing: it seems that the driver can create additional buffers on its own.

However, drivers are notorious for adding more buffering of their own. This is an unfortunate side effect of benchmark tools such as 3DMark. The more aggressively you buffer, the more you can smooth out perf variations, and the more parallelism can be guaranteed, so you achieve better overall throughput and visual smoothness. But this obviously messes up input response time, so it's not a good optimization for anything more interactive than a movie player! Unfortunately, though, there is no way for automated benchmarks like 3DMark to test input response times, so drivers tend to over-index on maximizing their benchmark scores at the cost of more realistic user scenarios.

This was particularly bad in the DX8 era, where drivers would sometimes buffer up 5, 6 or more frames, and games resorted to crazy tricks like drawing to a 1x1 rendertarget every frame, then calling GetData on it at the start of the next frame, in an attempt to force the driver to flush its buffers. I haven't seen any drivers with that kind of pathological behavior for a long time, though.

http://xboxforums.create.msdn.com/forums/p/58428/358113.aspx#358113
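In D3D9 terms, that old flush trick looks roughly like the following sketch (untested; device is assumed to be an IDirect3DDevice9*, and reading the tiny target back with GetRenderTargetData plays the role of the GetData call mentioned in the quote):

// Created once: a 1x1 render target plus a matching system-memory surface.
IDirect3DSurface9* tinyRT  = nullptr;
IDirect3DSurface9* tinySys = nullptr;
device->CreateRenderTarget(1, 1, D3DFMT_A8R8G8B8, D3DMULTISAMPLE_NONE, 0,
                           FALSE, &tinyRT, nullptr);
device->CreateOffscreenPlainSurface(1, 1, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM,
                                    &tinySys, nullptr);

// At the start of a frame (drawing something trivial into tinyRT each frame is
// omitted here): reading the previous frame's 1x1 target back stalls the CPU
// until the GPU has caught up, flushing the driver's queued frames.
device->GetRenderTargetData(tinyRT, tinySys);
D3DLOCKED_RECT lr;
if (SUCCEEDED(tinySys->LockRect(&lr, nullptr, D3DLOCK_READONLY)))
    tinySys->UnlockRect();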


Edited by zz2, 23 October 2013 - 02:26 AM.


#17 tonemgub   Members   -  Reputation: 916


Posted 23 October 2013 - 05:28 AM

You da man!

 

 


Unfortunately, though, there is no way for automated benchmarks like 3DMark to test input response times

I have to disagree with this: input response time can be calculated as the current time minus the input-event timestamp reported by functions like GetMessageTime. And if it is used after a call to timeBeginPeriod(1), the response time will be accurate down to 1 millisecond.
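A small Win32 sketch of that measurement (untested; GetMessageTime and timeGetTime share the same milliseconds-since-boot timebase, and the message handler here is only illustrative):

#include <windows.h>
#pragma comment(lib, "winmm.lib")   // timeBeginPeriod / timeGetTime

// Somewhere at startup: timeBeginPeriod(1);  // request 1 ms timer resolution

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (msg == WM_KEYDOWN)
    {
        DWORD queuedAt  = (DWORD)GetMessageTime(); // ms since boot, when the event was posted
        DWORD handledAt = timeGetTime();           // ms since boot, now
        DWORD latencyMs = handledAt - queuedAt;    // rough input-to-handler response time
        (void)latencyMs;                           // ... log or accumulate it ...
    }
    return DefWindowProc(hwnd, msg, wp, lp);
}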

Well, after re-thinking about it, it does make sense: there is no way for 3DMark to simulate physical input events.


Edited by tonemgub, 23 October 2013 - 05:43 AM.


#18 wintertime   Members   -  Reputation: 1647


Posted 27 October 2013 - 02:11 PM

There are ways to measure this. Just have a camera and then count the frames between you hitting the button and something appearing on screen:

http://cowboyprogramming.com/2008/05/27/programming-responsiveness/

http://cowboyprogramming.com/2008/05/30/measuring-responsiveness-in-video-games/

http://cowboyprogramming.com/2008/12/03/custom-responsiveness-measuring-device/

 

What I wondered about when reading the posts above (sorry I'm late, as I don't regularly read this subforum), especially the posts from Matias Goldberg, is the following:

Couldn't it be possible to have an intermediate method, where the GPU does not always wait when the back buffers are in use, but also doesn't throw away the half-rendered frame 3 to get a free buffer?

It could wait until the earlier of two things:

- until frame 3 is rendered completely while frame 2 is still queued for the next vsync, and then cycle only the back buffers to throw away frame 2 and queue the ready and newer frame 3 for the next vsync, or

- until the vsync frees a buffer when it switches from frame 1 to frame 2 by cycling the front and back buffers at once.

Depending on circumstances that would reduce latency for frame 3 by not showing a stale frame 2 first, and could allow the CPU and GPU to start rendering frame 4 earlier, which maybe lets frame 4 or 5 show up another vsync earlier.

That would trade lower latency for triple buffering with vsync against more complicated driver logic and higher CPU/GPU use when rendering is already going fast, I think.





