Dustin Hopper

Offscreen Rendering to FBO, then Texture, Gives a Performance Increase... Why?


Let's say for this example, I have a few standard meshes of around 200,000 polygons each.

Pseudo-pseudo code for the old rendering pipeline goes:

[source lang="cpp"]void display()
{
    // setup movement
    // define lighting properties
    // draw multiple meshes
    // draw extra objects (2D UI, etc.)
    // both are rendered using glDrawElements with gl***Pointer
    // store depth buffer of FB at viewport inside array
}

void grabDepth(float *depth_array)
{
    // you can assume the proper mutexes exist for this situation to work concurrently
    // copy depth buffer from local array into depth_array
}[/source]
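To be concrete, the mesh drawing here is the old client-array path, roughly like this (a sketch; meshVBO_, meshIBO_, and indexCount_ are made-up names, not my exact code):
[source lang="cpp"]// bind the vertex buffer and point the fixed-function pipeline at it
glBindBuffer(GL_ARRAY_BUFFER, meshVBO_);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, NULL); // positions read straight from the VBO

// draw indexed triangles from the element buffer
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, meshIBO_);
glDrawElements(GL_TRIANGLES, indexCount_, GL_UNSIGNED_INT, NULL);

glDisableClientState(GL_VERTEX_ARRAY);[/source]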
Using GPGPU resources, this worked pretty great. Recently, I've switched to rendering everything into separate FBO/RBO objects.

Pseudo-pseudo code for the new rendering pipeline goes:
[source lang="cpp"]void display()
{
    // setup movement
    // bind OBJECTS fbo
    // define lighting properties
    // draw multiple meshes
    // bind extra objects fbo
    // draw extra objects (2D UI, etc.)
    // render combined FBO/RBO combos as texture on quad to screen
}

void grabDepth(float *depth_array)
{
    // you can assume the proper mutexes exist for this situation to work concurrently
    // just grab RB depth attachment and copy into depth_array
}[/source]
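Each FBO/RBO pair is created along the usual lines (a sketch with illustrative names like objectsFBO_, not my exact code):
[source lang="cpp"]glGenFramebuffers(1, &objectsFBO_);
glBindFramebuffer(GL_FRAMEBUFFER, objectsFBO_);

// color attachment: a texture that later gets drawn on the fullscreen quad
glGenTextures(1, &objectsTex_);
glBindTexture(GL_TEXTURE_2D, objectsTex_);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, objectsTex_, 0);

// depth attachment: a renderbuffer whose contents grabDepth reads back later
glGenRenderbuffers(1, &objectsDepthRB_);
glBindRenderbuffer(GL_RENDERBUFFER, objectsDepthRB_);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, objectsDepthRB_);

// sanity check before rendering into it
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
    ; // handle the error
glBindFramebuffer(GL_FRAMEBUFFER, 0);[/source]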
All data arrays are malloc'd and stored on the GPU; nothing is moved or transferred to or through the host.

I'm seeing a performance increase (it runs faster and the output looks crisper) just from rendering to an offscreen FBO instead of directly to the default framebuffer. I can't figure out why. Anyone have any pointers or suggestions?

This is very interesting. Maybe the graphics driver isn't applying anti-aliasing to the framebuffer in your new pipeline?
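If it helps to verify that, you can query the sample state of whatever framebuffer is currently bound (standard GL queries; a sketch):
[source lang="cpp"]GLint sampleBuffers = 0, samples = 0;
glGetIntegerv(GL_SAMPLE_BUFFERS, &sampleBuffers); // 1 if multisampled, 0 otherwise
glGetIntegerv(GL_SAMPLES, &samples);              // number of MSAA samples
// query once with the default framebuffer bound, then again with your FBO bound[/source]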

What does [font=courier new,courier,monospace]grabDepth[/font] do, really? The answer to your question probably depends on how you're doing this in both cases.


[quote]This is very interesting. Maybe the graphics driver isn't applying anti-aliasing to the framebuffer in your new pipeline?[/quote]
It is enabled, so that's possible, yes.



[quote]What does [font=courier new,courier,monospace]grabDepth[/font] do, really? The answer to your question probably depends on how you're doing this in both cases.[/quote]
I doubt it. In fact, I had to do a little more work in the second version to make it possible.

Before:
[source lang="cpp"]void grabDepth(float *depth_array)
{
    // read the default framebuffer's depth into the pack PBO;
    // NULL is an offset into the bound buffer, so the data stays on the GPU
    glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    // use gpgpu api to memcpy array pointed to by depthPBO into depth_array
}[/source]
After:
[source lang="cpp"]void grabDepth(float *depth_array)
{
    // read from the offscreen FBO's depth attachment instead of the default framebuffer
    glBindFramebuffer(GL_READ_FRAMEBUFFER, depthFBO_);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    // use gpgpu api to memcpy array pointed to by depthPBO into depth_array
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
}[/source]
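The "gpgpu api" memcpy is the same in both versions; with CUDA's GL interop it looks roughly like this (a sketch; cudaRes_ is a hypothetical handle from a one-time cudaGraphicsGLRegisterBuffer call on depthPBO_):
[source lang="cpp"]// one-time setup, after creating depthPBO_:
// cudaGraphicsGLRegisterBuffer(&cudaRes_, depthPBO_, cudaGraphicsRegisterFlagsReadOnly);

void copyDepthToArray(float *depth_array) // hypothetical helper
{
    float *pboPtr = NULL;
    size_t size = 0;
    cudaGraphicsMapResources(1, &cudaRes_, 0);
    cudaGraphicsResourceGetMappedPointer((void **)&pboPtr, &size, cudaRes_);
    // device-to-device copy: the depth data never crosses the PCIe bus
    cudaMemcpy(depth_array, pboPtr, size, cudaMemcpyDeviceToDevice);
    cudaGraphicsUnmapResources(1, &cudaRes_, 0);
}[/source]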

I'm not sure if this applies, but when I was reading the CUDA PDFs, they said that the less data is transferred between the GPU and the host, the faster the program runs, because memory transfers over the PCIe bus are slow.

[quote]I doubt it. In fact, I had to do a little more work in the second version to make it possible.[/quote]
Don't assume based on the fact that there are more calls in the 2nd version. Comment out both of the glReadPixels calls to test whether it makes a difference.

All the details of how GL actually works are hidden by your driver, but to hypothesise, the driver could be acting like this:
With the 1st one, the driver has to lock the default output buffer for the duration of the memory transfer, which stalls the next frame.
With the 2nd one, assuming that you clear your FBO, the driver is free to allocate two storage areas for your texture, so that one can be locked for a memory transfer while the other is receiving the next frame's rendering.
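You can get the same effect explicitly by double-buffering the readback with two PBOs, so frame N's transfer overlaps frame N+1's rendering (a sketch, not your code; it maps to host memory for simplicity, but the same ping-pong works with a GPGPU-side copy):
[source lang="cpp"]// hypothetical: depthPBO_[2], each created with GL_STREAM_READ storage
void grabDepthAsync(float *depth_array)
{
    static int frame = 0;
    int writeIdx = frame % 2;        // PBO receiving this frame's depth
    int readIdx  = (frame + 1) % 2;  // PBO holding last frame's depth

    // kick off an asynchronous GPU->PBO transfer for the current frame
    glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_[writeIdx]);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);

    // consume the previous frame's transfer, which has had a frame to finish
    glBindBuffer(GL_PIXEL_PACK_BUFFER, depthPBO_[readIdx]);
    float *src = (float *)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (src)
    {
        memcpy(depth_array, src, width * height * sizeof(float));
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}[/source]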
