Left at about 1:30 yesterday afternoon to go into Bicester (where I grabbed a burger from the van on Sheep Street - best fucking burgers in the country, I swear, swimming in grease and you can feel your heart screaming "ARE YOU TRYIN TA KILL ME?" as it goes down but they're so good with the melted cheese and the onions all fried in, mmmmmmm) and then on to Milton Keynes where I met up with a friend to go to the cinema. We saw Charlie and the Chocolate Factory - my second time, but her first, and there's not much else on anyway. Then crashed back at her place for the night, then headed back here a few hours ago.
We even found time to do a little clothes shopping in MK. We're celebrating my birthday next Saturday, and afterwards she and I plan to head down to London, where she will take me to Slimelight for my very first time. They have a relatively strict dress code though - "If it's not black, fuck off" - so I'm pulling together a goth disguise. Over the next week I need to try and find a loose, light, black t-shirt, and some black shoes (preferably boots). Should be easy enough to pop into some shops round Oxford, it'll just be a question of finding the right places.
Technically minded folks: OK, stop skipping now.
I'm still pulling together my little screensavery thing, and currently implementing HDR using this article as a guide. However, I think I've found a way to calculate the image key on the GPU, avoiding any readbacks and thus any stalls while the CPU waits for the GPU to finish rendering.
The image key formula in the article is:
image_key = exp( (1/number_of_pixels) * sum_of( log( delta + luminance( each_pixel ) ) ) )
(delta is a tiny constant so that pure-black pixels don't blow up the log().)
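For reference, here's what that formula computes when done the naive way on the CPU - a minimal Python sketch (the `delta` default is my own choice, just something small to keep log(0) out of the sum):

```python
import math

def image_key(luminances, delta=1e-4):
    # Log-average (geometric mean) luminance: exp of the mean
    # of the per-pixel log luminances. delta guards against
    # log(0) on pure-black pixels.
    n = len(luminances)
    log_sum = sum(math.log(delta + lum) for lum in luminances)
    return math.exp(log_sum / n)

# A uniform mid-grey image should give back (roughly) that grey:
image_key([0.5] * 16)  # ~0.5
```

The geometric mean is used rather than the plain average because luminance perception is roughly logarithmic - a few very bright pixels shouldn't drag the key up the way they would with an arithmetic mean.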
That summation is the hard part. Once we've got the summed value it's pretty easy - a few instructions to divide it by the number of pixels and exp() it. Here's what I'm going to try:
- Create a 1x1 R16FG16FB16FA16F render target. Ideally this would be R32F but you'll understand why it can't be in a moment...
- Given an MxN HDR image, set up vertex buffers (I might use indexing) to render M*N triangles, each one covering the whole 1x1 viewport, with its texture coords set to map to one individual texel of the source image. i.e. a single texel will be used for the entire primitive.
- Enable alpha-blending with both SRC and DEST blending set to ONE. (This is why I have to use R16FG16FB16FA16F - AFAICT it's the only floating-point format to support alpha blending. R32F would be nicer because it uses half as much memory and gives a lot more precision).
- Install a pixel shader to sample the source image using the interpolated (i.e. constant) texcoords, and write out the log luminance.
Each primitive will calculate the log luminance of one texel in the source image and add it to the value stored in the 1x1 render target. When it's finished, the value in the render target should be the required summation. I can then simply set that as a texture, sample it when doing the actual tone mapping (probably in the vertex shader, so that I'm doing 4 reads instead of 1280x1024 reads), perform the divide and the exp() on it, and voila, I have an image key ready to be used for the per-pixel HDR->LDR tone mapping. It should all execute asynchronously, meaning no blocking. I'll let you know how it turns out, anyway.
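To make the scheme concrete, here's a CPU emulation of what those passes compute - not actual D3D code, just the arithmetic the blend unit and final pass would perform (the `delta` guard against log(0) is my own addition):

```python
import math

def gpu_style_image_key(source_texels, delta=1e-4):
    # One "primitive" per source texel: the pixel shader outputs
    # log(delta + luminance), and ONE/ONE alpha blending adds it
    # into the single texel of the 1x1 render target.
    accumulator = 0.0  # the 1x1 render target, cleared to zero
    for lum in source_texels:
        accumulator += math.log(delta + lum)  # additive blend
    # Final tone-mapping pass: sample the 1x1 target, divide by
    # the pixel count, and exp() to recover the image key.
    return math.exp(accumulator / len(source_texels))
```

One thing this emulation glosses over: the real accumulator is FP16, with only a 10-bit mantissa, so over a million-odd blended additions the sum may lose noticeable precision - worth checking once it's running.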
I think it will be very slow to do one triangle per pixel. It would be slightly faster to use point sprites, but either way you will be setup bound.
On gf6 cards, that means you will go at half the clock rate, best case. The more attributes you have, the slower. It may still be fast enough for your purpose, however.
So, why are you not using a pixel shader to downsample the scene for exposure? I believe (I haven't tested this) that the fastest possible way is to do a very long shader program to sum the whole screen. The loops might kill you, so maybe it's faster to do it in horizontal strips of unrolled filter code.
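The downsampling route the comment alludes to is the usual alternative: write out log luminance once, then repeatedly average 2x2 blocks through a chain of half-size render targets until you're left with a single texel, and exp() that. A rough Python sketch of the idea, assuming a square power-of-two image (the function name and `delta` guard are mine, for illustration):

```python
import math

def downsample_image_key(image, delta=1e-4):
    # image: square 2D list of luminances, side a power of two.
    # First pass: per-pixel log luminance.
    level = [[math.log(delta + lum) for lum in row] for row in image]
    # Repeatedly average 2x2 blocks until one value remains,
    # mirroring a chain of half-size render targets on the GPU.
    while len(level) > 1:
        half = len(level) // 2
        level = [[(level[2*y][2*x] + level[2*y][2*x+1] +
                   level[2*y+1][2*x] + level[2*y+1][2*x+1]) / 4.0
                  for x in range(half)]
                 for y in range(half)]
    return math.exp(level[0][0])
```

Averaging averages of equal-sized blocks gives the same result as one big mean, so this computes the same image key as the single summation - in log2(N) full-rate pixel-shader passes instead of one setup-bound pass of M*N primitives.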