I realized that my implementation of the bloom effect was wrong. A lot of the literature on the net, white papers and even GPU presentations, says that you have to run a bright-pass filter on the raw fp16 buffer, blur it a few times, add it back to the scene, and then apply tone mapping.
And then I was wondering why, in some of my scenes, very bright areas didn't bloom.
Well, that's logical: the bloom effect should come after tone mapping. The more I think about it, the more it makes sense: by tone mapping, you're basically remapping the brightness range from HDR to [0-1] so that it's displayable on your monitor. The areas on your screen that bloom are the areas that end up with values higher than 1 after tone mapping, not before.
An example: let's say I have a very dark scene, most values being in the [0-0.1] range, but with a small area that has a 0.2 color. After tone mapping, this area will end up very bright and should bloom. If blooming is applied before tone mapping, the bright-pass filter will discard a value as small as 0.2, and there will be no bloom.
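The example above can be put in numbers. This is just an illustrative sketch: the exposure-style tone mapping operator and the "key" value are assumptions I'm making for the demonstration, not the operator actually used in the engine.

```python
# Sketch of why the bright pass must run after tone mapping.
# The scale-by-average-luminance operator and the 'key' value are
# illustrative assumptions, not the engine's actual tone mapper.

def tone_map(value, avg_luminance, key=0.5):
    """Remap an HDR value by exposure: dark scenes get boosted."""
    return value * (key / avg_luminance)

def bright_pass(value, threshold=1.0):
    """Keep only what ends up brighter than the threshold."""
    return max(value - threshold, 0.0)

avg = 0.05        # very dark scene, most values in [0, 0.1]
hot_spot = 0.2    # small area that looks bright in context

# Bright pass on the raw HDR buffer: 0.2 is way below the
# threshold, so the spot is discarded and never blooms.
before = bright_pass(hot_spot)                  # 0.0

# Tone map first: the dark scene gets boosted ( 0.2 -> 2.0 here ),
# the spot now exceeds 1.0, and it blooms as expected.
after = bright_pass(tone_map(hot_spot, avg))    # 1.0
```

Same input, opposite outcome, purely depending on where the bright pass sits in the pipeline.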
So, I fixed the problem simply by adjusting the pixel shaders and reorganizing the code a bit.
I have integrated the "prototype" code as an HDRI render pipe into the engine.
ATI cards do not support the automatic shadow mapping depth comparison test in a pixel shader; you have to implement it yourself with a CMP/SGE instruction. NVidia cards, on the other hand, do support this shadow map test, and even have hardware PCF. I found myself having to comment/uncomment lines in my pixel shaders every time I switched between a system with an ATI card and one with an NVidia card ( and I do that often, at least twice a day ).
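For the record, here is what the manual comparison amounts to, sketched in Python for readability ( the actual code lives in shader asm; the 2x2 tap pattern for PCF is an illustrative assumption ):

```python
# Sketch of the manual shadow-map test ATI cards need, plus the
# percentage-closer filtering (PCF) that NVidia does in hardware.
# The 2x2 tap pattern is an illustrative assumption.

def sge(a, b):
    """Shader asm SGE: 1.0 if a >= b, else 0.0."""
    return 1.0 if a >= b else 0.0

def shadow_test(stored_depth, fragment_depth):
    # Lit if the depth stored in the shadow map is at least as far
    # as the fragment: nothing closer to the light occludes it.
    return sge(stored_depth, fragment_depth)

def pcf_2x2(neighbour_depths, fragment_depth):
    # Average four neighbouring comparisons to soften shadow edges;
    # this is what hardware PCF gives you for free on NVidia.
    total = sum(shadow_test(d, fragment_depth) for d in neighbour_depths)
    return total / len(neighbour_depths)
```

The single-tap test is the CMP/SGE path; the averaged version shows why hardware PCF is worth having.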
Of course, I could duplicate the shader ( ie. have one version for ATI and another one for NVidia ), but that's overkill for only 2 lines of difference.
Instead, I chose to implement a shader preprocessor. It's similar to the compiler preprocessor you have in C/C++: I have added #ifdef / #ifndef / #else / #endif instructions that are independent of the type of vertex or pixel shader to compile ( ie. it's handled by the engine while loading the source code file ). The conditional variables can be set in the engine config; for example, when initializing OpenGL, I detect if the vendor string contains "ATI" or "NVidia", and this later allows me to write:
#ifdef ATI
... code for ATI ...
#else
... code for anything else ...
#endif
... in my shaders. Simple and very useful.
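The core of such a preprocessor is small. Here is a minimal sketch in Python for illustration ( the engine presumably does this in C++ ); nesting is handled with a stack of "keep" flags, and error checking is omitted:

```python
# Minimal sketch of a shader preprocessor supporting
# #ifdef / #ifndef / #else / #endif, driven by the set of
# variables the engine defines at init time (e.g. "ATI").

def preprocess(source, defined):
    output, keep = [], [True]
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            name = stripped.split()[1]
            keep.append(keep[-1] and name in defined)
        elif stripped.startswith("#ifndef"):
            name = stripped.split()[1]
            keep.append(keep[-1] and name not in defined)
        elif stripped.startswith("#else"):
            # Flip the current branch, but stay dead if the
            # enclosing scope was already excluded.
            keep[-1] = keep[-2] and not keep[-1]
        elif stripped.startswith("#endif"):
            keep.pop()
        elif keep[-1]:
            output.append(line)
    return "\n".join(output)

# Hypothetical two-line difference between the vendors' paths:
src = """#ifdef ATI
MOV r0, ati_path;
#else
MOV r0, other_path;
#endif"""

print(preprocess(src, {"ATI"}))   # -> MOV r0, ati_path;
```

Since it runs on the raw source text before compilation, the same mechanism works for vertex and pixel shaders alike.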
A rant on OpenGL:
Yes, it's that time of the year: my yearly rant on OpenGL.
I've had enough of OpenGL, and will progressively switch to DirectX9/10.
OpenGL is nice and all for portability or for its "clean" core API when you're a beginner. But having more than 300 extensions is a nightmare, especially when those extensions don't interact very well with each other.
The ARB ( which writes the specifications/extensions for OpenGL; I particularly blame ATI and NVidia here ) has a philosophy of "let's make an extension that covers the minimum common feature set supported on our video cards" and "anything else, we'll bother with later by adding new extensions".
What does this lead to ? An FBO extension that doesn't even support anti-aliasing, ie. that is useless for your average game developer who is not ready to sacrifice AA for a few render-to-texture effects.
The hardware supports it. DirectX9 supports it. Hell, DirectX9 was released in 2002. Four years ago. Oh yes, a GL_EXT_framebuffer_multisample extension has recently been released, but nobody supports it yet.. see what I mean ? Instead of releasing an FBO extension that contains everything ( and if the hardware doesn't support a feature, you get an error ), you've got one extension for FBO, then one for FBO + antialiasing, and then one for FBO + antialiasing + FP16 buffers. What's next ?
In the meantime, the old PBuffers extension still works, but unfortunately, ATI X1800 cards do not report any antialiased pixel format with fp16, even though their hardware and DX9 can do it.
In my quest for optimizing performance ( and hopefully getting anti-aliasing to work with HDR ), I experimented with the RGBE pixel format. The idea is to use a standard fixed-point 32-bits RGBA buffer instead of a 64-bits fp16 one, and to encode the high dynamic range color and brightness into the RGBA channels. It takes 3 asm instructions to convert RGBE to RGB inside a pixel shader, but the encoding is a bit more complex ( currently 11 asm instructions ). One problem is the "ceil()" instruction, which is not available in asm, so I had to emulate it with "floor()" instead ( ceil(x) = -floor(-x) ). I wonder if in GLSL this instruction is native or uses a similar trick.
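To make the scheme concrete, here is a sketch of RGBE encode/decode in Python. The shared exponent with a bias of 128 follows the classic Radiance RGBE layout; this is an assumption for illustration, the engine's exact packing may differ. It also shows the ceil-via-floor trick:

```python
import math

# Sketch of RGBE encoding/decoding: a shared exponent in the alpha
# channel, Radiance-style with a bias of 128 (an assumption here).

def ceil_via_floor(x):
    # asm has no ceil, so emulate it: ceil(x) = -floor(-x)
    return -math.floor(-x)

def encode_rgbe(r, g, b):
    m = max(r, g, b)
    if m < 1e-32:
        return (0, 0, 0, 0)
    # Smallest power of two >= m, so mantissas fit in [0, 255].
    e = ceil_via_floor(math.log2(m))
    scale = 255.0 / (2.0 ** e)
    return (round(r * scale), round(g * scale), round(b * scale), e + 128)

def decode_rgbe(r, g, b, a):
    # The cheap direction: one scale shared by all three channels.
    if a == 0:
        return (0.0, 0.0, 0.0)
    scale = (2.0 ** (a - 128)) / 255.0
    return (r * scale, g * scale, b * scale)
```

Decoding is just a shared multiply by an exponentiated alpha, which is why it maps to so few asm instructions, while encoding needs the log, the ceil emulation and the per-channel quantization.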
I'm of course losing some performance due to encoding/decoding at every stage of the HDRI pipeline, but on the other hand I'm saving 50% of the bandwidth and video memory, and I can ( potentially ) get antialiasing since I'm rendering to a standard fixed-point texture.
Zedzeek mentioned the LogLuv format and I had a look at it, but unless I'm missing something, the encoding/decoding requires a lot more instructions than RGBE, so its interest looks pretty limited to me.