Offscreen Rendering to FBO, then Texture Gives Increase to Performance... Why?

Graphics and GPU Programming

Started by Dustin Hopper August 29, 2012 03:33 AM

4 comments, last by Hodgman

August 29, 2012 03:33 AM

Let's say for this example, I have a few standard meshes of around 200,000 polygons each.

Pseudo-pseudo code for the old rendering pipeline goes:

[source lang="cpp"]void display()
{
// setup movement
// define lighting properties
// draw multiple meshes
// draw extra objects (2D UI, etc.)
// both are rendered using glDrawElements with gl***Pointer
// store depth buffer of FB at viewport inside array
}

void grabDepth(float *depth_array)
{
// you can assume the proper mutexes exist for this situation to work concurrently
// copy depth buffer from local array into depth_array
}[/source]
Using GPGPU resources, this worked pretty great. Recently, I've switched to rendering everything into separate FBO/RBO objects.

Pseudo-pseudo code for the new rendering pipeline goes:
[source lang="cpp"]void display()
{
// setup movement
// bind OBJECTS fbo
// define lighting properties
// draw multiple meshes
// bind extra objects fbo
// draw extra objects (2D UI, etc.)
// render combined FBO/RBO combos as texture on quad to screen
}

void grabDepth(float* depth_array)
{
// you can assume the proper mutexes exist for this situation to work concurrently
// just grab RB depth attachment and copy into depth_array
}[/source]
All data arrays are malloc'd and stored on the GPU. Nothing is moved to or through the host.

I'm getting a performance increase (it runs faster, and the image looks crisper) just from rendering to an offscreen FBO instead of directly to the screen. I can't figure out why. Does anyone have any pointers or suggestions?

hopper.dustin@gmail.com

web383

August 29, 2012 04:12 PM

This is very interesting. Maybe because the graphics driver isn't hitting the frame buffer with anti-aliasing with your new pipeline?

Hodgman

August 29, 2012 05:02 PM

What does [font=courier new,courier,monospace]grabDepth[/font] do, really? The answer to your question probably depends on how you're doing this in both cases.

. 22 Racing Series .

Dustin Hopper

August 29, 2012 09:03 PM

[quote name="web383"]This is very interesting. Maybe because the graphics driver isn't hitting the frame buffer with anti-aliasing with your new pipeline?[/quote]

Anti-aliasing is enabled, so yes, that's possible.

[quote name="Hodgman"]What does [font=courier new,courier,monospace]grabDepth[/font] do, really? The answer to your question probably depends on how you're doing this in both cases.[/quote]

I doubt this. In fact, I've had to do a little more for the second round to make it possible.

Before:
[source lang="cpp"]void grabDepth(float *depth_array)
{
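glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_);
// with a pack buffer bound, the NULL argument is a byte offset into depthPBO_, and the
// pixels come from whatever framebuffer is bound to GL_READ_FRAMEBUFFER (here the default one)
glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);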
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
// use gpgpu api to memcpy array pointed to by depthPBO into depth_array
}[/source]
After:
[source lang="cpp"]void grabDepth(float *depth_array)
{
// read from the offscreen FBO's depth attachment instead of the default framebuffer
glBindFramebuffer(GL_READ_FRAMEBUFFER, depthFBO_);
glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_);
glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
// use gpgpu api to memcpy array pointed to by depthPBO into depth_array
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
}[/source]

D.V.D

August 30, 2012 01:44 AM

I'm not sure if this applies, but when I was reading the CUDA PDFs, they said that the less data you transfer between the GPU and the host, the faster the program runs, because memory transfers over the PCI bus are slow.
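To put a rough number on that, here is a back-of-the-envelope sketch of how much data the depth readback in grabDepth moves per frame. The 1920x1080 resolution and 60 fps are assumptions for illustration (the thread never states the window size), the function names are hypothetical, and whether those bytes actually cross the PCIe bus depends on the driver and the GL/GPGPU interop path, which isn't known here.

```cpp
#include <cassert>
#include <cstddef>

// GL_FLOAT depth readback writes one 32-bit GLfloat per pixel; this sketch
// assumes the host float is also 32 bits, as on mainstream platforms.
static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical helper: bytes written by one
// glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, ...).
// Rows of 4-byte floats are always 4-byte aligned, so the default
// GL_PACK_ALIGNMENT of 4 adds no row padding.
std::size_t depthReadbackBytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return width * height * sizeof(float);
}

// Hypothetical helper: bytes per second if that readback runs once per frame.
std::size_t depthReadbackBytesPerSecond(std::size_t width, std::size_t height,
                                        std::size_t fps) {
    assert(fps > 0);
    return depthReadbackBytes(width, height) * fps;
}
```

Under those assumptions, depthReadbackBytes(1920, 1080) is 8,294,400 bytes (about 8.3 MB) per frame, and depthReadbackBytesPerSecond(1920, 1080, 60) is 497,664,000 bytes per second, which is why it's worth knowing whether that copy stays on the device.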

Hodgman

August 30, 2012 02:55 AM

[quote name="Dustin Hopper"]I doubt this. In fact, I've had to do a little more for the second round to make it possible.[/quote]

Don't draw conclusions just because there are more calls in the 2nd version. Comment out the glReadPixels call in both versions to test whether it makes a difference.
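A minimal A/B harness for that experiment might look like the sketch below. averageFrameMillis is a hypothetical helper, not part of GL or any library. It only measures CPU wall-clock time, and GL calls are usually queued rather than executed immediately, so the frame body should end with something that waits for the GPU (for example glFinish), or you should use GPU timer queries instead, before trusting the numbers.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `frame` `iterations` times and returns the average
// wall-clock time per call in milliseconds. Only CPU-side time is observed.
double averageFrameMillis(const std::function<void()>& frame, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        frame();  // e.g. display() followed by grabDepth(...), with or without glReadPixels
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

You would time a body that calls display() and grabDepth() once as-is and once with glReadPixels commented out, in both the old and new pipelines. If the averages barely move, the readback isn't what changed between the two versions.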

All the details of how GL actually works are hidden by your driver, but as a hypothesis, the driver could be behaving like this:
[list]
[*]With the 1st version, the driver has to lock the default output buffer for the duration of the memory transfer, which stalls the next frame.
[*]With the 2nd version, assuming that you clear your FBO, the driver is free to allocate two storage areas for your texture, so that one can be locked for a memory transfer while the other receives the next frame's rendering.
[/list]
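To make that hypothesis concrete, here is a toy timing model. It is an illustration of single versus double buffering under made-up costs, not a description of what any real driver does, and simulateFrames is a hypothetical name. Each frame is rendered into one of numBuffers storage areas and then read back from it; a storage area can't be rendered into again until the readback of the frame that last used it has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: returns the time at which the last frame's readback completes.
// Rendering happens in frame order, readbacks happen in frame order, and frame i
// reuses the storage area last used by frame i - numBuffers.
long long simulateFrames(int frames, long long renderCost, long long readbackCost,
                         int numBuffers) {
    assert(frames > 0 && numBuffers > 0);
    std::vector<long long> renderEnd(frames), readbackEnd(frames);
    for (int i = 0; i < frames; ++i) {
        // The GPU renders frames one after another...
        long long renderStart = (i > 0) ? renderEnd[i - 1] : 0;
        // ...but must wait until the storage area it reuses has been read back.
        if (i >= numBuffers) {
            renderStart = std::max(renderStart, readbackEnd[i - numBuffers]);
        }
        renderEnd[i] = renderStart + renderCost;
        // A readback starts once its frame is rendered and the previous readback is done.
        long long readbackStart = renderEnd[i];
        if (i > 0) {
            readbackStart = std::max(readbackStart, readbackEnd[i - 1]);
        }
        readbackEnd[i] = readbackStart + readbackCost;
    }
    return readbackEnd[frames - 1];
}
```

With a render cost of 5 and a readback cost of 3, simulateFrames(10, 5, 3, 1) returns 80 because every frame pays render plus readback back to back, while simulateFrames(10, 5, 3, 2) returns 53 because each readback overlaps the next frame's rendering. That overlap is the kind of stall removal the second bullet describes.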

This topic is closed to new replies.