Mmmmmmmmmmmm

Published August 21, 2005
Technically minded folks: You can skip the next two paragraphs on the shenanigans of my personal life if you want. Except the bit about the burger, because you need to know that.

Left at about 1:30 yesterday afternoon to go into Bicester (where I grabbed a burger from the van on Sheep Street - best fucking burgers in the country, I swear, swimming in grease and you can feel your heart screaming "ARE YOU TRYIN TA KILL ME?" as it goes down but they're so good with the melted cheese and the onions all fried in, mmmmmmm) and then on to Milton Keynes where I met up with a friend to go to the cinema. We saw Charlie and the Chocolate Factory - my second time, but her first, and there's not much else on anyway. Then crashed back at her place for the night, then headed back here a few hours ago.

We even found time to do a little clothes shopping in MK. We're celebrating my birthday next Saturday, and afterwards she and I plan to head down to London, where she will take me to Slimelight for my very first time. They have a relatively strict dress code though - "If it's not black, fuck off" - so I'm pulling together a goth disguise. Over the next week I need to try and find a loose, light, black t-shirt and some black shoes (preferably boots). Should be easy enough to pop into some shops round Oxford; it'll just be a question of finding the right places.

Technically minded folks: OK, stop skipping now.

I'm still pulling together my little screensavery thing, and currently implementing HDR using this article as a guide. However, I think I've found a way to calculate the image key on the GPU, avoiding any readbacks and thus any stalls while the CPU waits for the GPU to finish rendering.

The image key formula in the article is:

image_key = exp( (1/number_of_pixels) * sum_of( log( luminance( each_pixel ) ) ) )
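
For reference, here's the same calculation done naively on the CPU (a minimal sketch; the Rec.709 luminance weights and the small delta to dodge log(0) are my own choices, not necessarily the article's):

    #include <cmath>
    #include <cstddef>

    // CPU reference for the image key (log-average luminance).
    // 'pixels' holds RGB float triples; 'count' is the pixel count.
    float ImageKey(const float* pixels, std::size_t count)
    {
        const float delta = 0.0001f; // dodges log(0) on black pixels
        double sumLogLum = 0.0;
        for (std::size_t i = 0; i < count; ++i)
        {
            const float* p = &pixels[i * 3];
            // Rec.709 RGB -> luminance weights
            float lum = 0.2126f * p[0] + 0.7152f * p[1] + 0.0722f * p[2];
            sumLogLum += std::log(delta + lum);
        }
        return static_cast<float>(std::exp(sumLogLum / double(count)));
    }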


That summation is the hard part. Once we've got the summed value it's pretty easy - a few instructions to divide it by the number of pixels and exp() the result. Here's what I'm going to try (a setup sketch follows the list):


  1. Create a 1x1 R16FG16FB16FA16F render target. Ideally this would be R32F but you'll understand why it can't be in a moment...
  2. Given an MxN HDR image, set up vertex buffers (I might use indexing) to render M*N triangles that cover the whole viewport, with their texture coords set to map to each individual texel of the source image, i.e. a single texel will be used for the entire primitive.
  3. Enable alpha-blending with both SRC and DEST blending set to ONE. (This is why I have to use R16FG16FB16FA16F - AFAICT it's the only floating-point format to support alpha blending. R32F would be nicer because it uses half as much memory and gives a lot more precision).
  4. Install a pixel shader to sample the source image using the interpolated (i.e. constant) texcoords, and write out the log luminance.
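
In D3D9 terms, steps 1 and 3 would look roughly like this (a sketch, untested; error handling and the step 4 shader hookup are omitted):

    #include <d3d9.h>

    // Create the 1x1 accumulator target and enable additive (ONE/ONE)
    // blending so each primitive's log-luminance sums into one pixel.
    void SetupLogLumAccumulator(IDirect3DDevice9* device,
                                IDirect3DSurface9** outTarget)
    {
        // Step 1: 1x1 target in the only blendable floating-point format.
        device->CreateRenderTarget(1, 1, D3DFMT_A16B16G16R16F,
                                   D3DMULTISAMPLE_NONE, 0, FALSE,
                                   outTarget, NULL);
        device->SetRenderTarget(0, *outTarget);

        // Step 3: additive alpha blending.
        device->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
        device->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ONE);
        device->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    }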


Each primitive will calculate the log luminance of one texel in the source image and add it to the value stored in the 1x1 render target. When it's finished, the value in the render target should be the required summation. I can then simply set that as a texture, sample it when doing the actual tone mapping (probably in the vertex shader, so that I'm doing 4 reads instead of 1280x1024 reads), perform the divide and exp() on it, and voila, I have an image key ready to be used for the per-pixel HDR->LDR tone mapping. Should all execute asynchronously, meaning no blocking. I'll let you know how it turns out, anyway.
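
For the record, the per-pixel mapping I have in mind boils down to something like this (a CPU sketch of the math only; I'm assuming the simple Reinhard-style operator here, and middleGrey is just a tweakable exposure knob, not something from the article):

    #include <cmath>

    // HDR -> LDR luminance mapping using the image key.
    // The real version lives in the pixel shader; this is just the math.
    float ToneMapLuminance(float hdrLum, float imageKey)
    {
        const float middleGrey = 0.18f;                  // exposure control
        float scaled = (middleGrey / imageKey) * hdrLum; // key-relative exposure
        return scaled / (1.0f + scaled);                 // compress to [0,1)
    }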

Comments

SimmerD
Hey, superpig.

I think it will be very slow to do one triangle per pixel. It would be slightly faster to use point sprites, but either way you will be setup bound.

On gf6 cards, that means you will go at half the clock rate, best case. The more attributes you have, the slower. It may still be fast enough for your purpose, however.

So, why are you not using a pixel shader to downsample the scene for exposure? I believe (I haven't tested this) that the fastest possible way is to do a very long shader program to sum the whole screen. The loops might kill you, so maybe it's faster to do it in horizontal strips of unrolled filter code.
August 21, 2005 11:17 AM
jollyjeffers
Quote: implementing HDR using this article as a guide.

That's a good article. My bad for not checking the front-page articles often enough [headshake]

Can't find the link right now, but there was another HDRI article on GDNet that I found quite useful in my travels.

Jack
August 21, 2005 11:39 AM
Toxic Hippo
So much technical stuff...

ARE YOU TRYIN TA KILL ME?
August 21, 2005 11:53 AM
superpig
Hmm... I take your point about setup costs, SimmerD, but wouldn't that force the whole thing to use only one pixel processing unit instead of all of them?

I guess the optimum solution might be to combine the two, so each primitive sums a small block of texels and writes the result to the render target. The values would be less susceptible to losing precision due to the 16-bit render target format, and you'd be able to have the single pixel rasterized from each primitive use a separate pixel processing unit.

Or can the shader units only concurrently process pixels from a single primitive? In that case I could write to a sqrt(16) = 4x4 render target and have each rasterized pixel loop through a chunk of the source image (16 pixel pipelines?), then run another shader at the end to sum the 16 values into one.
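
In CPU terms that two-pass idea would shake out something like this (a sketch; the 16-block chunking and the function name are purely illustrative):

    #include <cstddef>

    // Two-pass reduction sketch: 16 "pipelines" each sum a contiguous
    // chunk of log-luminances, then a final pass folds the partials.
    float SumLogLumTwoPass(const float* logLum, std::size_t count)
    {
        const std::size_t kBlocks = 16; // the 4x4 render target
        double partial[kBlocks] = {};

        // Pass 1: each block sums its chunk of the source image.
        for (std::size_t b = 0; b < kBlocks; ++b)
        {
            std::size_t begin = b * count / kBlocks;
            std::size_t end = (b + 1) * count / kBlocks;
            for (std::size_t i = begin; i < end; ++i)
                partial[b] += logLum[i];
        }

        // Pass 2: a final shader pass sums the 16 values into one.
        double total = 0.0;
        for (std::size_t b = 0; b < kBlocks; ++b)
            total += partial[b];
        return static_cast<float>(total);
    }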
August 21, 2005 12:34 PM
Ysaneya
Cool trick, I wonder if it works. What I don't get is: why the need for MxN vertices at all? Why can't you simply use a single quad and let the pixel shader interpolate the texture coordinates of the input HDRI texture?

No ATI card supports floating-point alpha blending as far as I know, so your trick would only work on NVidia 6800s and up.
August 21, 2005 12:42 PM
superpig
Quote: Cool trick, I wonder if it works. What I don't get is: why the need for MxN vertices at all? Why can't you simply use a single quad and let the pixel shader interpolate the texture coordinates of the input HDRI texture?
The summation trick relies on the same pixel being written to repeatedly, which is why I have to use a whole load of separate primitives. If I set up a quad to cover the whole image, the rasterizer would churn it up into a single pixel with a single pair of texture coordinates, so I'd only get a single sample.

And yeah, I'll admit that one of my main goals with this is to give my GF7800GTX something to chew on. [grin]
August 21, 2005 01:23 PM
Ysaneya
Ah, I get it now... indeed you will need a lot of primitives :( Another solution would be to perform P passes (with 2^P being the next power of two above M or N) and to emulate what automatic mipmap generation does, but with a pixel shader to sum up the pixel values.
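
In CPU terms the repeated-halving idea looks something like this (a sketch; it assumes a non-empty power-of-two input, and it sums neighbour pairs rather than averaging them as real mipmapping would):

    #include <vector>
    #include <cstddef>

    // Mipmap-style reduction: repeatedly halve the buffer, summing
    // neighbour pairs, until one value remains. A real version would
    // first pad the MxN image up to 2^P in each dimension.
    float SumByHalving(std::vector<float> values)
    {
        while (values.size() > 1)
        {
            std::vector<float> next(values.size() / 2);
            for (std::size_t i = 0; i < next.size(); ++i)
                next[i] = values[2 * i] + values[2 * i + 1];
            values.swap(next);
        }
        return values[0];
    }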
August 21, 2005 02:31 PM
superpig
Aye, which would roll out as being another form of the same solution: sum X samples per rendered pixel, where the total number of primitives is Y and thus the total number of processed samples is X*Y (which is the same as the number of pixels in the source image). The trick is, I think, to pick X and Y such that you minimize your setup costs and maximize the extent to which you parallelize (make use of the separate pixel pipelines). Is it just me, or does this sound like a simple calculus problem?

Setup/shutdown cost S
Texel fetch + loop cost T
Total cost Z
Total number of pixels in source image C

Z = S * Y + T * X * Y
and X * Y = C

.: X = C / Y
.: Z = S * Y + T * Y * (C/Y)
.: Z = S * Y + T * C
.: dZ/dY = S
.: fuck this is wrong and there is no minimum cost O_O
August 21, 2005 06:14 PM
SimmerD
At first blush, the quickest way to digest the screen would be a single 6x4 pixel render target, with each pixel digesting 1/24 of the screen area.

That way all pixel units are used.

Then, you can use 24 runs of a single pixel shader to sum up that result.
August 21, 2005 10:15 PM