FBO - feedback loop / NPOT textures

7 comments, last by jeinor 12 years, 4 months ago
I have two questions about FBOs.

First: I'm doing multipass rendering. On my 8600GT it works fine even with a feedback loop (I use the same texture for sampling in the shader and as the render target). But the FBO specification says such a loop can result in undefined behaviour. I tried using two FBOs and switching between them (ping-pong), but that has a severe impact on performance (about 40% fewer FPS). Is there some extension that allows feedback loops, which I should check for before deciding to use one, or is it just luck that it works?

Second: Is there any way to convince the ATI Radeon 9800 Pro to use NPOT textures as color attachments for the FBO? Everything works fine as long as I use 512x512 or 1024x1024 textures, but with 640x480 etc. I get FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT.

Thanks in advance for all answers.
It's just luck that it works. The GPU processes fragments in parallel, so you have no way of knowing whether you're reading the original texel or one that has already been processed.

As for the second question, the Radeon 9800 doesn't support NPOT textures, so you simply can't make an NPOT-sized FBO. Get a modern card, such as an HD4xxx or a GeForce 8xxx, and you'll have no trouble.
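
A workaround sometimes used on hardware without NPOT support (an assumption on my part, nobody in this thread suggested it) is to allocate the next power-of-two texture and render only into a 640x480 sub-rectangle, for example by setting the viewport accordingly. Below is a minimal C sketch of the size calculation only. The helper names `is_pow2` and `next_pow2` are hypothetical, and no OpenGL calls are shown.

```c
#include <assert.h>

/* Returns 1 if n is a power of two (1, 2, 4, ...), 0 otherwise. */
static int is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n.
   Hypothetical helper; n must be in the range 1 .. 2^31. */
static unsigned int next_pow2(unsigned int n)
{
    unsigned int p = 1;
    assert(n >= 1 && n <= 0x80000000u);
    while (p < n)
        p <<= 1;
    return p;
}

/* Largest texture coordinate that still addresses the used region when a
   "used"-sized image is stored in a texture padded to next_pow2(used). */
static double max_texcoord(unsigned int used)
{
    return (double)used / (double)next_pow2(used);
}
```

For a 640x480 target this gives a 1024x512 texture, and the used region ends at texture coordinates `max_texcoord(640)` and `max_texcoord(480)`. The cost is wasted memory in the padded area.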
Quote:Original post by GL1zdA
Second: Is there any way to convince the ATI Radeon 9800 Pro to use NPOT textures as color attachments for the FBO? Everything works fine as long as I use 512x512 or 1024x1024 textures, but with 640x480 etc. I get FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT.


Try not using mipmaps: set the min and mag filters to GL_LINEAR or GL_NEAREST.

I'm surprised that performance would be impacted so much just by adding another FBO. Try using a single FBO, and make sure the attached textures have the same dimensions.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);

Would it be OK to render to the same texture I'm sampling from if I only sample from exactly the same texel I render to? In that case I know the texel hasn't been processed yet, since nothing other than the current calculation alters its state.
With this extension you can, but I think you can't render to the same location
http://www.opengl.org/registry/specs/NV/texture_barrier.txt

"This extension relaxes the restrictions on rendering to a currently bound texture and provides a mechanism to avoid read-after-write hazards."

Thanks for the tip!

Why do you think you can't render to the same location? As far as I can see, the extension actually requires that you do exactly that: "There is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using "texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);")."

You can (with and without that extension). You are allowed to read each texel exactly once before writing to the same one. This always works, also in absence of that extension (although strictly according to the spec, it is undefined).

What does not work without this extension is doing the same thing several times in a row, because there is a chance you read data in the second or third pass that has not been written out from cache yet. This is what the extension provides. Any shader running after glTextureBarrierNV will see anything a shader before that call has written.

What also does not work is reading several texels when writing to the same texture, since you do not know which ones are processed and written to at what time, and there is no synchronisation.
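
To illustrate the difference between the two cases, here is a small CPU-side model in C. It is only a sketch under stated assumptions: a 1D array stands in for the texture, a loop direction flag stands in for the unknown order in which the GPU processes fragments, and the function names are hypothetical. It does not model caches or real GPU scheduling.

```c
#include <assert.h>
#include <stddef.h>

/* Same-texel update: each element is read once and written once by the
   "invocation" that owns it, so the processing order cannot matter. */
static void update_self(float *tex, size_t n, int reverse)
{
    size_t k;
    for (k = 0; k < n; ++k) {
        size_t i = reverse ? n - 1 - k : k;
        tex[i] = tex[i] * 0.5f + 1.0f;
    }
}

/* In-place neighbour update: element i also reads element i-1, which may or
   may not have been overwritten already, depending on the processing order. */
static void update_neighbour(float *tex, size_t n, int reverse)
{
    size_t k;
    assert(n > 0);
    for (k = 0; k < n; ++k) {
        size_t i = reverse ? n - 1 - k : k;
        float left = (i > 0) ? tex[i - 1] : tex[i];
        tex[i] = 0.5f * (tex[i] + left);
    }
}
```

Running `update_self` forwards or backwards over the same data gives identical results, while `update_neighbour` does not: the result depends on which neighbours had already been overwritten, which is the hazard described above.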

Thanks for the clarification. I actually started a new topic over at opengl.org for my particular case before this thread came alive again, and got some answers there as well (http://www.opengl.or...t&Number=306941). As I understood them, it would be impossible to solve my problem without using extensions or rethinking my approach.

However, if I can read the texel exactly once in the same fragment shader as you say, I think that would solve my problem. I googled around some more, and found this page: http://www.opengl.or...he_Same_Texture. From that page: "You still don't get to read and write to the same location in a texture at the same time unless there is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel." - that's exactly what I need to do (if I understand them correctly).

How I want it to work (drawing 2D only):
  1. Send down some triangles using glDrawArrays() or glVertex3f() followed by glEnd().
  2. Have the fragment shader execute, and let it read the current value of the texel it is about to write exactly once, then write that texel once.
  3. Send down some more triangles using glDrawArrays() or glVertex3f(), again followed by glEnd().
  4. Have the fragment shader execute again, possibly reading the same texel and writing it again (but separated by glEnd()).
  5. (and so on)

When you say exactly once, what do you mean? Would the above steps have written one or more times to the same texel (considering the glEnd() in between)?

Thinking about it, what I really want is to implement a fully custom glBlendFunc. If the above technique is allowed, I could even reimplement the functionality that glBlendFunc itself provides. Maybe that makes what I want to achieve clearer.

Thanks :)
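
The per-texel read-modify-write described in this post can be modelled on the CPU as follows. This is only a sketch of the data flow, written in C under loud assumptions: the names `blend_fn`, `blend_difference` and `draw_span` are hypothetical, a 1D float array stands in for the render target, and the absolute-difference blend is chosen purely as an illustration. Because the loop runs sequentially, it says nothing about whether a real GPU makes the first draw's writes visible to the second draw.

```c
#include <assert.h>
#include <stddef.h>

/* A custom blend takes the incoming (source) value and the value already
   stored in the target (destination) and returns the value to store. */
typedef float (*blend_fn)(float src, float dst);

/* Illustrative custom blend: absolute difference of source and destination. */
static float blend_difference(float src, float dst)
{
    float d = src - dst;
    return d < 0.0f ? -d : d;
}

/* One "draw call" covering texels [first, first + count): every covered
   texel is read exactly once and written exactly once. */
static void draw_span(float *target, size_t n, size_t first, size_t count,
                      float src, blend_fn blend)
{
    size_t i;
    assert(first + count <= n);
    for (i = first; i < first + count; ++i) {
        float dst = target[i];        /* the single read  */
        target[i] = blend(src, dst);  /* the single write */
    }
}
```

Two consecutive `draw_span` calls that overlap correspond to steps 1 to 4 in the list above: within each call a texel is touched once, but the same texel may be touched again by the next call.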
I tried using 2 FBOs and switching between them (ping-pong), but it has severe impact on performance (about 40% less FPS).

That's bogus. Don't measure in FPS, measure in milliseconds. If, say, you were running at 2000 FPS and now you're running at 1000 FPS, that's 50% less, but it's actually an extremely cheap operation (0.5 ms per frame).
If you were running at 10 FPS and now run at 5 FPS, that's 50% too, yet it's a hell of an expensive operation (100 ms per frame).
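
The conversion behind this advice is simple arithmetic: frame time in milliseconds is 1000 divided by the frame rate, and the cost of a change is the difference between two frame times. A minimal C sketch, with hypothetical function names:

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate (fps must be > 0). */
static double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds spent per frame when the frame rate drops from
   fps_before to fps_after. A negative result means the frame got cheaper. */
static double added_cost_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_after) - frame_time_ms(fps_before);
}
```

By this calculation, `added_cost_ms(2000.0, 1000.0)` is 0.5 ms and `added_cost_ms(10.0, 5.0)` is 100 ms, even though both are a 50% drop in FPS.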

You can (with and without that extension). You are allowed to read each texel exactly once before writing to the same one. This always works, also in absence of that extension (although strictly according to the spec, it is undefined).
While it may work on all hardware known to date (does it?), I want to emphasize that, as you've just said, it is strictly speaking undefined behavior. That means nothing prevents a future driver from simply refusing to execute your GL calls when it detects that you're attempting a feedback loop.
It's undefined, which means a vendor is allowed to reject the operation or start producing garbage from there on, even if the hardware could draw it perfectly fine.

Cheers
Dark Sylinc

Thanks for making that clear. I'm convinced.

I've read that some people use a ping-pong technique for rendering to texture, meaning they render to one texture while sampling from another, then swap the two for the next render loop. The reason that works is that the GPU finalizes all pixels when switching textures, right? Otherwise the target texture could not be used as the sampling texture in the next render loop (the same problem as with sampling from and rendering to the same texture: we cannot be certain that all pixels have been updated before the new render loop begins).
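
The ping-pong scheme described in this post can be sketched on the CPU like this. It is a model of the data flow only, written in C with hypothetical names (`TEX_SIZE`, `blur_pass`, `run_ping_pong`) and a tiny 1D array in place of each texture. It does not answer the synchronisation question, because a sequential loop has no pending writes to wait for.

```c
#include <assert.h>
#include <stddef.h>

#define TEX_SIZE 4

/* One pass: every output texel reads itself and its left neighbour from src
   only, so writes to dst can never be observed during the pass. */
static void blur_pass(const float *src, float *dst, size_t n)
{
    size_t i;
    assert(n > 0);
    for (i = 0; i < n; ++i) {
        float left = (i > 0) ? src[i - 1] : src[i];
        dst[i] = 0.5f * (src[i] + left);
    }
}

/* Runs the given number of passes, swapping the source and target roles
   after each one. Returns the index (0 or 1) of the buffer that holds the
   final result. buffers[0] is the initial input. */
static int run_ping_pong(float buffers[2][TEX_SIZE], int passes)
{
    int src = 0;
    int p;
    for (p = 0; p < passes; ++p) {
        int dst = 1 - src;
        blur_pass(buffers[src], buffers[dst], TEX_SIZE);
        src = dst;  /* the texture just written becomes the next input */
    }
    return src;
}
```

Unlike an in-place update, each pass here reads only data that the previous pass finished writing, so the result does not depend on the order in which texels are processed within a pass.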

This topic is closed to new replies.
