Using pixel buffer objects with glReadPixels

8 comments, last by Kalidor 18 years, 4 months ago
Hi everyone, I have a question regarding pixel buffer objects. I am running a P4 1.5 with a GeForce 5200 FX (AGP of course), and the 80+ Forceware drivers. I am trying to use PBOs to increase the speed of glReadPixels. I've implemented a class for doing just that, and alternatively, doing a normal glReadPixels (for comparison). I am reading both the color and depth components. My normal glReadPixels looks like:

void readPixelsNormal()
{
  // init mem (one 32-bit value per pixel in each buffer)
  unsigned int * m_pBufferColor = new unsigned int[GetWidth() * GetHeight()];
  unsigned int * m_pBufferDepth = new unsigned int[GetWidth() * GetHeight()];

  glReadPixels(0, 0, GetWidth(), GetHeight(), GL_RGBA, GL_UNSIGNED_BYTE,
               m_pBufferColor);
  //... do stuff to the pixels ...

  glReadPixels(0, 0, GetWidth(), GetHeight(), GL_DEPTH_COMPONENT,
               GL_UNSIGNED_INT, m_pBufferDepth);
  //... do stuff to the pixels ...

  // free the buffers so they don't leak on every call
  delete [] m_pBufferColor;
  delete [] m_pBufferDepth;
}

Getting about 30 FPS using the above approach. Now next is my PBO implementation:

// with a PBO bound, glReadPixels treats its pointer argument as a
// byte offset into that buffer; this macro builds the offset
#define BUFFER_OFFSET(i) ((char *)NULL + (i))

// PBO generated IDs
GLuint m_pPBO[2] = {0, 0};
unsigned int * m_pBuffer;

void initPBO()
{
  // init the PBOs
  glGenBuffersARB(2, m_pPBO);
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, m_pPBO[0]);
  glBufferDataARB(GL_PIXEL_PACK_BUFFER_EXT, 
                  (GetWidth() * GetHeight() * sizeof(unsigned int)), 
                  NULL, 
                  GL_STREAM_READ);

  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, m_pPBO[1]);
  glBufferDataARB(GL_PIXEL_PACK_BUFFER_EXT, 
                  (GetWidth() * GetHeight() * sizeof(unsigned int)), 
                  NULL, 
                  GL_STREAM_READ);
  
  // bind it to nothing so other stuff doesn't
  // think it should use the PBOs
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0);
}

void readPixelsPBO()
{
  // bind buffer #1
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, m_pPBO[0]);

  // read pixels
  glReadPixels(0, 0, GetWidth(), GetHeight(), GL_RGBA,
               GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));

  // map memory from card
  m_pBuffer = static_cast<unsigned int *>(glMapBufferARB(GL_PIXEL_PACK_BUFFER_EXT, GL_READ_ONLY_ARB));

  //... do stuff to pixels ...
  
  // unmap the memory
  if (!glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_EXT))
  {
    //  handle the error
  }

  // bind buffer #2
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, m_pPBO[1]);

  // read pixels
  glReadPixels(0, 0, GetWidth(), GetHeight(), GL_RGBA,
               GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));

  // map memory from card
  m_pBuffer = static_cast<unsigned int *>(glMapBufferARB(GL_PIXEL_PACK_BUFFER_EXT, GL_READ_ONLY_ARB));

  //... do stuff to pixels ...

  // unmap the memory
  if (!glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_EXT))
  {
    //  handle the error
  }

  // bind it to nothing so other stuff doesn't
  // think it should use the PBOs
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0);
}

void killPBO()
{
  // kill the PBOs
  glDeleteBuffersARB(2, m_pPBO);
  glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0);  
}


It's weird. This second approach yields ~27 FPS. If I'm bypassing the normal readback pipeline, and directly accessing card memory, shouldn't I be getting some kick-butt framerate? In addition, since I'm using STREAM data, shouldn't the glReadPixels be returning immediately and behave asynchronously? Am I doing something wrong? What do you gurus think?
The glReadPixels should be returning immediately, but you are mapping the data right after it which needs to wait until all the data is there. What you should do is bind the first buffer and read into that, then bind the next buffer and read into that. Then you can bind the first buffer again and map that which will wait until the first glReadPixels is completed. Then you can do whatever you want to the data from the first glReadPixels while the second glReadPixels is completing. When you are done you unmap the first buffer, then map the second buffer and if the second glReadPixels isn't completed by that time it will wait until it is finished. Then you can do what you want to the data from the second glReadPixels. Then remember to unmap the second buffer as well.
 - bind first buffer
 - glReadPixels on first buffer
 - bind second buffer
 - glReadPixels on second buffer
 - bind first buffer and map it
 - use first buffer's data
 - unmap first buffer
 - bind second buffer and map it
 - use second buffer's data
 - unmap second buffer
EDIT: There is an asynchronous glReadPixels example in the GL_ARB_pixel_buffer_object spec. Go there and search for "Example 3"
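For reference, the ordering in the list above can be sketched as follows. This is only a sketch: RecordingGl is a hypothetical stand-in (not part of any GL header) that records call order so the code compiles and runs without a GL context, and readBothThenMap is an illustrative name. In a real program the four member functions would presumably forward to glBindBufferARB, glReadPixels, glMapBufferARB and glUnmapBufferARB as used in the first post.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for the GL entry points used in this thread.
// It only records the order of calls; no real GL work happens here.
struct RecordingGl {
    std::vector<std::string> calls;
    std::vector<unsigned char> fakeStorage = std::vector<unsigned char>(16, 0);

    void bindPackBuffer(unsigned id) { calls.push_back("bind " + std::to_string(id)); }
    void readPixelsIntoBoundBuffer() { calls.push_back("read"); }
    const void* mapBoundBuffer() { calls.push_back("map"); return fakeStorage.data(); }
    bool unmapBoundBuffer() { calls.push_back("unmap"); return true; }
};

// Queue both reads first, then map, use and unmap each buffer in turn.
// `use` is called with the buffer id and the mapped pointer.
template <typename Gl, typename UseFn>
bool readBothThenMap(Gl& gl, unsigned first, unsigned second, UseFn use) {
    assert(first != second);  // two distinct buffers are needed

    gl.bindPackBuffer(first);
    gl.readPixelsIntoBoundBuffer();
    gl.bindPackBuffer(second);
    gl.readPixelsIntoBoundBuffer();

    bool ok = true;
    for (unsigned id : {first, second}) {
        gl.bindPackBuffer(id);
        if (const void* pixels = gl.mapBoundBuffer()) {
            use(id, pixels);
            ok = gl.unmapBoundBuffer() && ok;  // unmap can report failure
        } else {
            ok = false;  // mapping failed, nothing to unmap
        }
    }

    gl.bindPackBuffer(0);  // leave no pack buffer bound
    return ok;
}
```

The point of the ordering is that no map call sits between the two reads, so the second read can be queued before anything waits on the first.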
Well, I gave it a try Kalidor, and there was 1 FPS improvement. The traditional glReadPixels still kicks its butt though.

I'd suspect that perhaps my video card wasn't performing well, BUT I also had someone try it on a P4 3.0 w/ a GeForce 6800 Ultra 512 MB, and glReadPixels still kicked PBO's hiney.

I've got to be doing something wrong.
Hmm, I'm not sure then. I don't have too much experience with PBOs so I don't completely understand the ins-and-outs of using them, but doing it the way I described should at least be somewhat faster than the traditional way. Weird... Maybe someone else with more PBO experience will come along to help out. Good luck.
It is also possible that PBOs are not implemented in a performant way in the drivers yet (they still could be using the standard glReadPixels path, rather than performing copies asynchronously).

Regardless, Kalidor's suggestion is the proper way to approach this problem. The best way to improve on that approach is to go ahead and do some more work before MapBuffer:

i.e.:

glReadPixels into buffer 1
glReadPixels into buffer 2
// do something else for a while
map buffer 1
map buffer 2

This gives the driver more time to perform an asynchronous copy to system memory.
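To make the reasoning concrete, here is a toy timing model of the three orderings discussed so far. Everything in it is an assumption for illustration: ToyTimeline, the function names and the tick counts are made up, and it models a driver that really does copy asynchronously on a single transfer queue, which the drivers in question may not do. It says nothing about real hardware; it only counts how long the CPU would wait inside the map calls under those assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Toy model: one transfer queue, one CPU, time measured in abstract ticks.
struct ToyTimeline {
    long now = 0;        // current CPU time
    long queueFree = 0;  // time at which the transfer queue becomes idle
    long stalled = 0;    // total ticks the CPU spent waiting in map()

    // Queue a readback costing `cost` ticks; returns its completion time.
    long startRead(long cost) {
        assert(cost >= 0);
        long begin = std::max(now, queueFree);
        queueFree = begin + cost;
        return queueFree;
    }
    // CPU does unrelated or per-pixel work for `ticks`.
    void cpuWork(long ticks) { now += ticks; }
    // Map blocks until the transfer that finishes at `readyAt` is done.
    void map(long readyAt) {
        if (readyAt > now) { stalled += readyAt - now; now = readyAt; }
    }
};

// Pattern from the first post: read, map at once, process, repeat.
long stallMapImmediately(long cost, long work) {
    ToyTimeline t;
    for (int i = 0; i < 2; ++i) {
        long ready = t.startRead(cost);
        t.map(ready);
        t.cpuWork(work);
    }
    return t.stalled;
}

// Kalidor's ordering: queue both reads, then map and process each.
long stallReadBothFirst(long cost, long work) {
    ToyTimeline t;
    long r1 = t.startRead(cost);
    long r2 = t.startRead(cost);
    t.map(r1); t.cpuWork(work);
    t.map(r2); t.cpuWork(work);
    return t.stalled;
}

// RichardS's refinement: do other work between the reads and the maps.
long stallReadBothThenOtherWork(long cost, long work, long otherWork) {
    ToyTimeline t;
    long r1 = t.startRead(cost);
    long r2 = t.startRead(cost);
    t.cpuWork(otherWork);
    t.map(r1); t.cpuWork(work);
    t.map(r2); t.cpuWork(work);
    return t.stalled;
}
```

With a read cost of 10 ticks and 6 ticks of per-buffer processing, the model gives 20 stalled ticks for mapping immediately, 14 for issuing both reads first, and 0 once 20 ticks of other work are placed before the maps. If the driver does not actually overlap the copy, all three orderings would stall about the same, which would be consistent with the small differences measured earlier in this thread.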
RichardS: if I'm understanding you correctly:

Quote:
"This gives the driver more time to perform an asynchronous copy to system memory."

Then I'm not really getting direct memory access to the buffer?
Probably not. If your app wants to access data from a glReadPixels, you are still going to be doing a readback from the card to system memory. I may be wrong, but I don't believe that VRAM can be efficiently exposed for reading, at least when using AGP. You definitely can efficiently *write* directly to VRAM though.

By using PBOs in this way, you can start the memcpy, then go off and do other things while the data is DMA'd around, assuming the driver actually optimizes this. Without PBO, you'll block on the glReadPixels until 1) your scene finishes rendering, and 2) the copy can complete. This is essentially putting a glFinish() in the middle of your code.

However, PBOs merely extend the API such that the driver gains the flexibility of optimizing this. It may not (yet). The spec doesn't require any particular performance characteristics; it requires only correctness.

If the PBO route is only slightly slower, I would use it anyway, because it does allow the drivers a lot more room to maneuver in the future (even if they do not already).

That being said, I have no idea what they're doing today...
Well, I can understand what's been said about PBOs, but I have one burning question:

Why isn't PBO readback at least equal in performance to a regular glReadPixels?
That is a very good question.

It should be at least equal. You should file a bug with Nvidia.
Here is a good paper on using PBOs to get efficient pixel transfers. It may not help in increasing your performance too much since the previous suggestion didn't help, but it's still a good read and worth the few minutes it'll take.
