Fast summation of a texture

Hi,

Here's what I want to do: take a texture, add up each of its texels, and then divide by the number of texels in the texture. I'd like to do this pretty fast. Here are my initial thoughts on the different ways to do this:

1) Go the CPU route and lock/read the texture and do it manually.

2) Write a simple shader to downsample the texture repeatedly, adding the elements as it goes, until the texture size is 1x1, and grab the value. This would need to use a 16-bit RGBA float format and I'd need to be careful I'm sampling correctly.

3) (And this might be slightly silly, I'm not sure it's even possible - just thought of it now :)) For each texel in the original texture, make a point (or point sprite...) that references that one texel for its color. Take a 1x1 R32F render target and render all the points into it, with additive blending.

This isn't for a real-time process, but I'd like to make it a bit speedy. Any better ideas or recommendations would be greatly appreciated :).

Cheers,
T
Since it isn't real-time, I would choose option 1, as it's the fastest to implement.

For real-time usage I would choose option 2. The DirectX SDK sample HDRLighting uses this method to compute the average scene luminance as part of a tone mapping operation.
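
If it helps, here's roughly what option 1 looks like in Direct3D 9. This is only a minimal sketch (the function name is mine): it assumes a single-channel D3DFMT_R32F texture sitting in a lockable pool, and a render-target texture would first have to be copied into system memory with GetRenderTargetData() before it could be locked like this.

```cpp
#include <d3d9.h>

// Sketch of the CPU route: lock the top level, sum every texel, divide.
// Assumes a lockable D3DFMT_R32F texture (e.g. SYSTEMMEM or MANAGED pool).
float AverageTexelsOnCPU(IDirect3DTexture9* pTex)
{
    D3DSURFACE_DESC desc;
    pTex->GetLevelDesc(0, &desc);

    D3DLOCKED_RECT lr;
    pTex->LockRect(0, &lr, NULL, D3DLOCK_READONLY);

    double sum = 0.0;  // accumulate in double so small values aren't lost
    for (UINT y = 0; y < desc.Height; ++y)
    {
        const float* row = (const float*)((const BYTE*)lr.pBits + y * lr.Pitch);
        for (UINT x = 0; x < desc.Width; ++x)
            sum += row[x];
    }

    pTex->UnlockRect(0);
    return (float)(sum / ((double)desc.Width * (double)desc.Height));
}
```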
A quick and easy way is to have the GPU generate a mipmap chain, then just multiply the value of the single pixel in the last mipmap level by the size of the original texture. Precision is limited to the texture's datatype and the GPU needs to be willing to generate mipmaps for the given format.
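
In case a concrete example is useful, here's a minimal sketch of that idea (the function name is mine). It assumes a D3DFMT_R32F texture created with a full mip chain in a lockable pool and uses D3DXFilterTexture() with a box filter to fill the chain; hardware auto-generation via D3DUSAGE_AUTOGENMIPMAP works too, but its sublevels can't be locked directly, so you'd have to copy the 1x1 level out another way.

```cpp
#include <d3d9.h>
#include <d3dx9.h>

// Sketch only: a box filter makes each mip level the average of the level
// above it, so the final 1x1 level holds the average of the whole texture.
float SumViaMipChain(IDirect3DTexture9* pTex)
{
    D3DSURFACE_DESC topDesc;
    pTex->GetLevelDesc(0, &topDesc);

    // Fill the mip chain from level 0 down with a box filter.
    D3DXFilterTexture(pTex, NULL, 0, D3DX_FILTER_BOX);

    UINT lastLevel = pTex->GetLevelCount() - 1;  // the 1x1 level

    D3DLOCKED_RECT lr;
    pTex->LockRect(lastLevel, &lr, NULL, D3DLOCK_READONLY);
    float average = *(const float*)lr.pBits;
    pTex->UnlockRect(lastLevel);

    // Multiply the average back up by the texel count to recover the sum.
    return average * (float)(topDesc.Width * topDesc.Height);
}
```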

I haven't done anything like this, so there may be a "best practice" method, but here's an idea off the top of my head.

You could combine the last two. Draw N points (point sprites) into a 1x1 render target, where N is the number of blocks you split your texture into. Depending on your hardware you can do a different number of texture reads per pass; say you have 16, and you place each read exactly halfway between the pixel centers to leverage hardware bilinear filtering, then each read averages a 2x2 group of texels and each point covers an 8x8 block. Each point carries an index (which block it is); the block size and texture width can be constants in the shader. So the bandwidth cost is just 4*N bytes (one index per point).

So in each pixel shader you sample one block's square area of the texture and output its average (the sum divided by the number of texels in the block), which gets blended additively, with the source scaled by alpha = 1/N, into the 1x1 render target (if a 1x1 target isn't supported you may need a larger one; just draw all the points over the pixel at (0,0)). Blending may not be supported for float formats on your hardware, though, so you may not be able to use it; in that case you might find a fixed-point format (say 16 bits per channel) that works, which actually gives better precision.

Newer hardware can do more texture reads so you'll be able to do a whole ton of samples for each point (maybe even the entire texture for smaller ones).
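
To make that concrete, here's roughly what the host side might look like in Direct3D 9. Everything below is a sketch and the names are mine; the pixel shader and vertex declaration are omitted, and it assumes the hardware can blend into an R32F render target.

```cpp
#include <d3d9.h>

// Sketch of the "one point per block" accumulation. The pixel shader
// (not shown) samples its block of the source texture, averages it, and
// writes the result with alpha = 1.0f / numBlocks.
void AccumulateBlockAverages(IDirect3DDevice9* pDev,
                             IDirect3DVertexBuffer9* pPoints,  // one vertex per block
                             UINT numBlocks)
{
    // 1x1 float render target that will receive the accumulated average.
    IDirect3DTexture9* pTarget = NULL;
    pDev->CreateTexture(1, 1, 1, D3DUSAGE_RENDERTARGET, D3DFMT_R32F,
                        D3DPOOL_DEFAULT, &pTarget, NULL);

    IDirect3DSurface9* pSurf = NULL;
    pTarget->GetSurfaceLevel(0, &pSurf);
    pDev->SetRenderTarget(0, pSurf);
    pDev->Clear(0, NULL, D3DCLEAR_TARGET, 0, 1.0f, 0);

    // Accumulate: dest = src * srcAlpha + dest. Needs float blending support.
    pDev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    pDev->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
    pDev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);

    pDev->BeginScene();
    pDev->SetStreamSource(0, pPoints, 0, sizeof(float));  // one float index per point
    pDev->DrawPrimitive(D3DPT_POINTLIST, 0, numBlocks);
    pDev->EndScene();

    // Read the single pixel back (e.g. via GetRenderTargetData), then clean up.
    pSurf->Release();
    pTarget->Release();
}
```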
Hello!

Thanks for the replies guys, good advice there.

@DonnieDarko: You're right :). I've actually written the code for this now, for a few reasons: first, to give me a baseline timing to test the other method(s) against; second, to test the hardware versions for correctness, since the values in the texture will probably be quite small and I need to preserve them; and third, as a fallback if I need this to run on older machines not gifted with suitably hefty hardware.

@Sneftel: I've never really looked into hardware-generated mipmaps, although I've been aware of them for quite a while. I'm going to open up the DX docs and have a look on Google after I finish typing this. :)

@sebastiansylvan: I really love this idea - I was wondering what I could do to leverage more pixel units, rather than have the vertex units do everything. I think I'll have a crack at implementing this just because it seems very *neat* :D.

Cheers muchly,
T
Quote:Original post by Sneftel
A quick and easy way is to have the GPU generate a mipmap chain, then just multiply the value of the single pixel in the last mipmap level by the size of the original texture. Precision is limited to the texture's datatype and the GPU needs to be willing to generate mipmaps for the given format.


Last time I tried using auto mipmap generation it was somewhat slow (or rather, somewhat slower than I'd expect). This was a couple of years ago though, so it might be more practical now.

The other problem is that the quality of auto mipmaps usually sucks, but that doesn't matter here of course. [grin]
I've not done it myself, but I've seen method 2 used a lot in demos, etc. as a means to do this in real time. It's commonly used in HDR techniques to calculate the luminance range of the scene currently being rendered (so the final pixel value is fed back into your tone-mapping code when you render the scene).

Of course, if there's no real-time requirement then there's no huge advantage to using this technique (except that maybe you don't have to worry about converting file formats).
FYI, if you have the DirectX SDK installed, the HDRLighting sample uses method 2. Look at the function MeasureLuminance in C:\Program Files\Microsoft DirectX SDK (April 2006)\Samples\C++\Direct3D\HDRLighting\HDRLighting.cpp
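
The sample does its downsampling with pixel shaders that take several taps per pass, but just to illustrate the repeated-downsample idea without any shader code, here's a rough sketch (names mine) that halves a render-target surface with StretchRect() and linear filtering until it's 1x1. It assumes power-of-two dimensions and that the driver supports filtered StretchRect for the surface format, which isn't guaranteed for float formats.

```cpp
#include <d3d9.h>

// Shader-free sketch of repeated 2x downsampling. Each destination texel
// is a bilinear sample of a 2x2 source block, so the final 1x1 surface
// holds (approximately) the average of the original.
IDirect3DSurface9* DownsampleToOnePixel(IDirect3DDevice9* pDev,
                                        IDirect3DSurface9* pSrc)
{
    D3DSURFACE_DESC desc;
    pSrc->GetDesc(&desc);

    IDirect3DSurface9* pCur = pSrc;
    pCur->AddRef();

    while (desc.Width > 1 || desc.Height > 1)
    {
        UINT w = desc.Width  > 1 ? desc.Width  / 2 : 1;
        UINT h = desc.Height > 1 ? desc.Height / 2 : 1;

        IDirect3DSurface9* pNext = NULL;
        pDev->CreateRenderTarget(w, h, desc.Format, D3DMULTISAMPLE_NONE,
                                 0, FALSE, &pNext, NULL);

        pDev->StretchRect(pCur, NULL, pNext, NULL, D3DTEXF_LINEAR);

        pCur->Release();
        pCur = pNext;
        pCur->GetDesc(&desc);
    }
    return pCur;  // 1x1 surface; read it back with GetRenderTargetData()
}
```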

