Fast summation of a texture
Members - Reputation: 420
Posted 22 May 2006 - 05:15 AM
Members - Reputation: 251
Posted 22 May 2006 - 07:32 AM
For real-time usage I would choose option 2. The DirectX SDK sample HDRLighting uses this method to compute the average scene luminance as part of a tone mapping operation.
Senior Moderators - Reputation: 1773
Posted 22 May 2006 - 07:37 AM
Members - Reputation: 166
Posted 22 May 2006 - 07:48 AM
I haven't done anything like this, so there may be a "best practice" method I don't know about. This is just off the top of my head.
You could combine the last two. Draw N 1x1 points (point sprites) into a 1x1 render target. How many texture reads you get per pass depends on your hardware, but say you have 16: each point could then cover an 8x8 block of the texture, because reading exactly halfway between the texel positions lets hardware filtering average a 2x2 group per read. Each point would have an index (which block it is); the block size and texture width can be constants in the shader. So the bandwidth cost is just 4*N bytes (one 4-byte index per point), where N is the number of blocks you split your texture into.
So in each shader invocation you sample an 8x8 square area of the texture and output its average color (the 16 filtered reads summed and divided by 16), which gets additively blended into the 1x1 render target with a weight of 1/N. If a 1x1 target isn't supported you may need a larger one; just draw the points over the pixel at (0,0). Additive blending may not be supported for floats on your hardware, though, so you may not be able to use the blending. In that case you may find a fixed-point format (like 16 bits) that you can blend with, which actually has better precision.
Newer hardware can do more texture reads so you'll be able to do a whole ton of samples for each point (maybe even the entire texture for smaller ones).
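To make the arithmetic of the point-per-block idea concrete, here is a rough CPU emulation in Python. It is only a sketch under stated assumptions: a budget of 16 reads per point arranged as 4x4 taps, each tap emulating a bilinear fetch centred between four texels (so one point covers 8x8 texels), and texture dimensions that are multiples of the block size. The names `bilinear_tap`, `block_average` and `texture_sum` are made up for illustration and do not correspond to any API.

```python
TAPS_PER_SIDE = 4            # 4 x 4 = 16 texture reads per point (assumed budget)
BLOCK = TAPS_PER_SIDE * 2    # each bilinear tap covers 2x2 texels -> 8x8 block


def bilinear_tap(tex, x, y):
    """Emulate a bilinear fetch centred between four texels: their plain mean."""
    return (tex[y][x] + tex[y][x + 1] + tex[y + 1][x] + tex[y + 1][x + 1]) / 4.0


def block_average(tex, bx, by):
    """What one point's pixel shader would output: the mean of its 16 taps."""
    total = 0.0
    for ty in range(TAPS_PER_SIDE):
        for tx in range(TAPS_PER_SIDE):
            total += bilinear_tap(tex, bx * BLOCK + 2 * tx, by * BLOCK + 2 * ty)
    return total / (TAPS_PER_SIDE * TAPS_PER_SIDE)


def texture_sum(tex):
    """Blend every block average into a 1x1 'render target', then rescale."""
    h, w = len(tex), len(tex[0])
    if w % BLOCK or h % BLOCK:
        raise ValueError("texture size must be a multiple of the block size")
    blocks_x, blocks_y = w // BLOCK, h // BLOCK
    n = blocks_x * blocks_y          # N points, one per block
    target = 0.0                     # the single pixel of the render target
    for by in range(blocks_y):
        for bx in range(blocks_x):
            target += block_average(tex, bx, by) / n   # additive blend, weight 1/N
    return target * w * h            # average of the texture times texel count
```

After blending, the target holds the average of the whole texture, so multiplying by the texel count recovers the sum. On real hardware the precision of that final value depends on the render target format, which this float emulation does not model.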
Members - Reputation: 420
Posted 22 May 2006 - 08:18 AM
Thanks for the replies guys, good advice there.
@DonnieDarko: You're right :). I've actually written the code for this now, for a few reasons. First, to give me a baseline timing to test the other method(s) against. Second, to test the hardware versions for correctness, since the values in the texture will probably be quite small and I need to preserve them. And third, as a fallback if I need this to run on older machines not gifted with suitably hefty hardware.
@Sneftel: I've never really looked into hardware-generated mipmaps, although I've been aware of them for quite a while. I'm going to open up the DX docs and have a look on Google after I finish typing this. :)
@sebastiansylvan: I really love this idea - I was wondering what I could do to leverage more pixel units, rather than have the vertex units do everything. I think I'll have a crack at implementing this just because it seems very *neat* :D.
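For reference, a CPU baseline of the kind described above could be as small as the following Python sketch. It is not the code from this thread: `reference_sum` and `matches_reference` are hypothetical names, the texture is assumed to be a list of rows of floats, and using `math.fsum` (which avoids accumulating rounding error over many small values) is my own choice rather than anything stated here.

```python
import math


def reference_sum(texture):
    """Sum every texel on the CPU, using math.fsum to limit rounding error."""
    return math.fsum(value for row in texture for value in row)


def matches_reference(gpu_result, texture, rel_tol=1e-6):
    """Check a hardware result against the CPU reference within a tolerance."""
    return math.isclose(gpu_result, reference_sum(texture), rel_tol=rel_tol)
```

The tolerance is a placeholder; a sensible value depends on the texture format and on how small the stored values really are.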
Members - Reputation: 1294
Posted 22 May 2006 - 08:57 AM
Original post by Sneftel
A quick and easy way is to have the GPU generate a mipmap chain, then just multiply the value of the single pixel in the last mipmap level by the size of the original texture. Precision is limited to the texture's datatype and the GPU needs to be willing to generate mipmaps for the given format.
Last time I tried using auto mipmap generation it was somewhat slow (or rather, somewhat slower than I'd expect). This was a couple of years ago though, so it might be more practical now.
The other problem is that the quality of auto mipmaps usually sucks, but that doesn't matter here of course. [grin]
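The mipmap trick quoted above comes down to "average times texel count". A small Python emulation shows the idea, assuming a square power-of-two texture and a plain 2x2 box filter for each mip step; actual hardware may filter differently and will round each level to the texture's datatype, which this float sketch ignores. The function names are invented for illustration.

```python
def downsample_2x2(level):
    """Average each 2x2 group of texels, like one box-filtered mip step."""
    size = len(level)
    return [
        [
            (level[y][x] + level[y][x + 1]
             + level[y + 1][x] + level[y + 1][x + 1]) / 4.0
            for x in range(0, size, 2)
        ]
        for y in range(0, size, 2)
    ]


def sum_via_mip_chain(texture):
    """Reduce to the 1x1 mip level, then scale by the original texel count."""
    size = len(texture)
    if size & (size - 1) or any(len(row) != size for row in texture):
        raise ValueError("expects a square power-of-two texture")
    level = texture
    while len(level) > 1:
        level = downsample_2x2(level)
    return level[0][0] * (size * size)
```

For a 4x4 texture holding the values 0 to 15, the last level holds 7.5 and the recovered sum is 7.5 * 16 = 120.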
Members - Reputation: 214
Posted 22 May 2006 - 11:39 AM
Of course, if there is no real-time requirement then there's no huge advantage to using this technique (except maybe that you don't have to worry about converting file formats).