Max & Min in shader

Are there any efficient methods for retrieving the max and min values being rendered to the framebuffer in a shader? From discussions here on the forums, reading every value back from the framebuffer doesn't seem very efficient. The values are going to be used for color normalization.

We've looked into that, and glGetMinmax seems to only check pixels being drawn with glDrawPixels. If we're going to do that we need to do a framebuffer readback, and that's exactly what we don't want to do.

I've never used GL minmax, but you can find more here: http://www.gamedev.net/community/forums/topic.asp?topic_id=293968

Well, since shaders run per vertex or per pixel, no single invocation sees the whole image, so you have no other option than reading some value back. However, you can do the read asynchronously (ARB_pixel_buffer_object) to speed it up.

Yep, we found that thread too, but it doesn't return any pixel values if we're not using glDrawPixels.

A traditional framebuffer readback cuts our fps by 50%, so the async read looks interesting =)

This is what our code looks like now, however, it doesn't work at all. We get a null-pointer exception:

----

GLuint imageBuffers[2];
glGenBuffers(2, imageBuffers);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[0]);
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 512 / 2 * sizeof(float), NULL, GL_STREAM_READ);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[1]);
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 512 / 2 * sizeof(float), NULL, GL_STREAM_READ);

drawSomething();

//Yep, we're using fbo's
glReadBuffer(GL_COLOR_ATTACHMENT2_EXT);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[0]);
glReadPixels(0, 0, 512, 512/2, GL_BGRA, GL_FLOAT, BUFFER_OFFSET(0));

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[1]);
glReadPixels(0, 512/2, 512, 512/2, GL_BGRA, GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));

float *pixels1 = new float[512*512*2];
float *pixels2 = new float[512*512*2];

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[0]);
pixels1 = (float*)glMapBuffer(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);
findMinMax(pixels1);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[1]);
pixels2 = (float*)glMapBuffer(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);
findMinMax(pixels2);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[0]);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER_ARB);
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[1]);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER_ARB);

delete[] pixels2;
delete[] pixels1;

-------

As you can see, we're rendering to an FBO. Is that compatible with this technique?

No one has an idea?

We've tried texture readbacks, but those are also very slow. Isn't there some efficient way of keeping track of what's being written to the framebuffer? It would be really great if ARB_imaging worked for rasterized fragments too...

The pointer that glMapBuffer returns to you points directly into the buffer object's own storage, which belongs to the driver (and may well be video memory).

You do not need to allocate space for it (remove the "new float[512*512*2];") and attempting to free that pointer is also a Very Bad Idea(tm) (remove the deletes).

Once you call glUnmapBuffer, the pointer is simply no longer mapped. There is no other cleanup required.
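
To make that concrete, here is a minimal sketch of how the mapped pointer might be consumed. The thread never shows findMinMax itself, so the signature, the MinMax struct and the count parameter below are assumptions for illustration only. The point is that the function just reads through the pointer it is given and never allocates or frees anything, and that it copes with glMapBuffer returning NULL when the mapping fails.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical result type: valid is false when there was nothing to scan.
struct MinMax {
    float min;
    float max;
    bool valid;
};

// data:  pointer returned by glMapBuffer. It is owned by GL, so it must
//        never be passed to delete[]. It is null if the mapping failed.
// count: number of floats in the mapped range (all components, alpha included).
MinMax findMinMax(const float* data, std::size_t count) {
    assert(count > 0);  // an empty range would be a caller bug
    if (data == nullptr) {
        return {0.0f, 0.0f, false};  // mapping failed, nothing to read
    }
    MinMax result{data[0], data[0], true};
    for (std::size_t i = 1; i < count; ++i) {
        result.min = std::min(result.min, data[i]);
        result.max = std::max(result.max, data[i]);
    }
    return result;
}
```

In the posted code this would be called between glMapBuffer and glUnmapBuffer for each buffer, with the new/delete lines removed.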

Instead of calling glReadPixels, why not just render into the FBO directly? It seems like you're trying to do an overly large amount of work for something rather simple. :)

Quote:
Original post by Bucky32
It runs, but we only seem to access 256 values, and they don't contain any info. The buffer we're trying to read from is 512x512.

One thing I notice that's odd is you are reading back half the FBO as floats and the other half as unsigned bytes.

Another problem is you are creating the two buffers with enough room for only 256 floats.
Quote:
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[0]);
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 512 / 2 * sizeof(float), NULL, GL_STREAM_READ);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, imageBuffers[1]);
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 512 / 2 * sizeof(float), NULL, GL_STREAM_READ);

You need room for 512 * (512/2) * 4 floats, i.e. 512 * (512/2) * 4 * sizeof(float) bytes, if you are reading back BGRA as floats. And unless you are rendering to a floating-point FBO, it's probably a better idea to read the data as unsigned bytes (and size the buffers in unsigned bytes instead of floats when you create them).
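
The size arithmetic can be written down as a small helper. This is only a sketch: readbackBytes is a made-up name, and it assumes tightly packed rows, which holds here because a 512-pixel row of 4-component pixels is a multiple of the default GL_PACK_ALIGNMENT of 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes that a glReadPixels call writes for a
// width x height region, assuming rows need no alignment padding.
// components:        4 for GL_BGRA / GL_RGBA
// bytesPerComponent: sizeof(float) for GL_FLOAT, 1 for GL_UNSIGNED_BYTE
std::size_t readbackBytes(std::size_t width, std::size_t height,
                          std::size_t components,
                          std::size_t bytesPerComponent) {
    assert(components >= 1 && components <= 4);
    return width * height * components * bytesPerComponent;
}
```

For one half of the 512x512 target, readbackBytes(512, 512 / 2, 4, sizeof(float)) comes to 2097152 bytes (with 4-byte floats), against the 1024 bytes the original glBufferData calls reserve. Read as unsigned bytes, the same region needs 524288 bytes.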

Thanks Kalidor!

We saw the mistake with float/ubytes, but we missed the size of the buffers =) Now it's working fine. Instead of 24fps, as with a traditional readback, we get 30 with the async method. Is this as fast as it can get? With no readback at all, we have 50fps so the loss is still 40%

Quote:
Original post by Bucky32
Thanks Kalidor!

We saw the mistake with float/ubytes, but we missed the size of the buffers =) Now it's working fine. Instead of 24fps, as with a traditional readback, we get 30 with the async method. Is this as fast as it can get? With no readback at all, we have 50fps so the loss is still 40%

Doing it this way, you should still expect a speed hit (always expect some speed hit for any readback). Modern video cards can buffer up several frames of GL commands, but a readback has to finish all the queued commands first and then move the data from video memory to system memory. What you're doing now just lets the second half of the buffer start transferring while you work on the first half. But when you map the first buffer, there may still be a few frames' worth of commands queued, and all of those have to finish before the data can be mapped. So although you aren't waiting as long as with a regular glReadPixels, and you can do some work while waiting for the second half, you are still waiting. You may be able to split the screen into more buffers for a further speed increase, but at some point more splitting will make things slower (and that point may arrive very quickly).

If you can get away with your min/max results lagging a few frames behind, another (possibly better) approach is to have a few PBOs and at the end of the first frame send a glReadPixels call into the first PBO, at the end of the second glReadPixels into the second PBO, etc. Then after a few frames of this you can map into the first frame's PBO and use that data for the current frame, then glReadPixels into that PBO again. Then continue for the next frame/PBO, etc. Depending on what else you're doing this technique may be better or worse for you. You'll just have to test it out.
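
The bookkeeping for that scheme might look roughly like the sketch below. DelayedReadback, endFrame, consume and issue are hypothetical names, not anything from GL; the actual GL calls are left to the two callbacks and only described in comments, so this shows the slot rotation and nothing more.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper that rotates through a fixed set of PBO slots.
class DelayedReadback {
public:
    explicit DelayedReadback(std::size_t slotCount) : slotCount_(slotCount) {
        assert(slotCount > 0);
    }

    // Call once at the end of every frame.
    // consume(slot): bind that slot's PBO, glMapBuffer it, scan the data,
    //                then glUnmapBuffer.
    // issue(slot):   bind that slot's PBO and call glReadPixels into it.
    void endFrame(const std::function<void(std::size_t)>& consume,
                  const std::function<void(std::size_t)>& issue) {
        const std::size_t slot = frame_ % slotCount_;
        if (frame_ >= slotCount_) {
            consume(slot);  // holds the frame issued slotCount_ frames ago
        }
        issue(slot);  // refill the slot with the current frame
        ++frame_;
    }

private:
    std::size_t slotCount_;
    std::size_t frame_ = 0;
};
```

With three slots, the first three frames only issue reads; from the fourth frame on, each frame maps the oldest slot before refilling it, so the min/max values lag three frames behind. How many slots are worth having is something to measure rather than assume.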

Here is a pretty good paper about general pixel transfer optimizations including how to best use PBOs.

Wait a minute.

What method does ARB_imaging use to find the max and min of a set of vectors (which is essentially what you want to do)? This has a huge bearing on the quality and speed at which it can operate. If you want to do this yourself, I suggest you look up some vector ordering schemes (partial ordering, reduced ordering, conditional ordering and marginal ordering, to name a few). Some are really quite heavy and complex and have no place in realtime applications, but I'm sure a few can be made realtime (such as a pairwise vector ordering scheme that is being presented in a few weeks at an IEE conference).
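
Of the schemes named above, marginal ordering is probably the simplest to sketch: each channel is ordered on its own, so the result is a per-channel min and max that need not come from the same pixel. The code below is only an illustration of that idea, with made-up names (ChannelRange, marginalMinMax), and it makes no claim about what ARB_imaging does internally. A reduced ordering would instead map each pixel to one scalar key, such as its magnitude, and compare the keys.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: one min and one max per channel.
struct ChannelRange {
    std::array<std::uint8_t, 4> min;
    std::array<std::uint8_t, 4> max;
};

// pixels: interleaved 4-component pixels (e.g. BGRA read back as bytes).
// With pixelCount == 0 the result keeps its start values (min 255, max 0).
ChannelRange marginalMinMax(const std::uint8_t* pixels, std::size_t pixelCount) {
    assert(pixels != nullptr || pixelCount == 0);
    ChannelRange range;
    range.min.fill(255);
    range.max.fill(0);
    for (std::size_t p = 0; p < pixelCount; ++p) {
        for (std::size_t c = 0; c < 4; ++c) {
            const std::uint8_t value = pixels[p * 4 + c];
            range.min[c] = std::min(range.min[c], value);
            range.max[c] = std::max(range.max[c], value);
        }
    }
    return range;
}
```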

Quote:
Original post by Kalidor

If you can get away with your min/max results lagging a few frames behind, another (possibly better) approach is to have a few PBOs and at the end of the first frame send a glReadPixels call into the first PBO, at the end of the second glReadPixels into the second PBO, etc. Then after a few frames of this you can map into the first frame's PBO and use that data for the current frame, then glReadPixels into that PBO again. Then continue for the next frame/PBO, etc. Depending on what else you're doing this technique may be better or worse for you. You'll just have to test it out.



That looks like a good idea to try. Using the min/max as we're doing right now produces some unwanted artefacts, so we probably will need some sort of lag anyway.

thanks!
