# HDRR on SM2 vs. SM3

OK, I thought I understood this topic, but I guess I was wrong. I am currently planning an implementation of HDR. In many newer games (Oblivion, Splinter Cell, etc.) HDR is only available on cards which support SM3. According to the Wikipedia page on HDRR, most games only use it on SM3 because SM3 supports FP16 blending. This confuses me. At first, I thought that SM2 hardware didn't support 16-bit floating point frame buffers/texture buffers. However, this is not true! When testing on an X850, A16B16G16R16F ran perfectly, even though the card is only SM2. The only other answer I have been getting is that SM3 allows support for floating point math as well as integer.

So that brings me to a question: What is FP16 blending? What does the "blending" mean? I just can't seem to figure out what SM3 has that SM2 does not when it comes to HDR. I feel like I am missing something everyone else is getting, because anyone with an SM2 card knows that HDR doesn't work in most new games.

Also, what is the practical difference between I16 and FP16? I know I16 means 16-bit integer, but how does that change things for the shader author? I feel so stupid asking, because when I write HLSL shaders, the colors are always float values, so why do people say that SM2 cards only support integer operations? By integer, do people mean fixed precision? In HLSL, all color values are generally in the [0,1] range, and if they were tied only to integers, then it seems that the only color values that could be expressed would be 0 and 1, because 0.5 is not an integer.

I do have some ideas which may be correct. If integer (in this case) means fixed precision, that would make sense, since in HDR really bright values and really dark values need roughly the same relative precision, as they both end up in the same displayed image (HDR after tonemapping). So I see how integer operations would lead to inaccuracy, but it seems to me that SM2 cards support floating point operations, because they appear to have no trouble with floating point buffers. So what is the problem?

##### Share on other sites
To understand why you need FP16 blending, you need to understand the HDR pipeline and what's required to produce a final tonemapped image.

Throughout the process, several render targets are created to produce various post-processing effects. For example, a luminance buffer, a bright pass, bloom pass, lens flares, stars, etc. Ideally these are all done in FP16 to maintain full precision, and are finally all blended back into the final frame. This is what FP16 blending is - hardware support for additive blending FP16 textures.

Another critical feature is FP16 filtering - which is used in several downpasses to save on render time. For example bloom: it is common practice to downsample the buffer to 1/16th the size before applying a blurring shader, and upscaling it. FP16 filtering allows bilinear scaling of the image so that this can occur.
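As a rough CPU-side illustration of that downsampling step, here is a minimal sketch. It assumes a single-channel float image and models one bilinear tap placed between four texels as a plain 2x2 average; the `Image` struct and both function names are made up for this example and are not part of any SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel floating-point image, stored row by row.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels;
};

// Halve width and height by averaging each 2x2 block. This is the value a
// bilinear fetch returns when the sample point sits between four texels.
Image downsampleHalf(const Image& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    assert(src.pixels.size() == src.width * src.height);

    Image dst{src.width / 2, src.height / 2, {}};
    dst.pixels.resize(dst.width * dst.height);

    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t sx = x * 2;
            const std::size_t sy = y * 2;
            const float sum = src.pixels[sy * src.width + sx] +
                              src.pixels[sy * src.width + sx + 1] +
                              src.pixels[(sy + 1) * src.width + sx] +
                              src.pixels[(sy + 1) * src.width + sx + 1];
            dst.pixels[y * dst.width + x] = sum * 0.25f;
        }
    }
    return dst;
}

// Two halvings leave a buffer with 1/16th of the original pixel count.
Image downsampleSixteenth(const Image& src) {
    return downsampleHalf(downsampleHalf(src));
}
```

Without hardware filtering, the shader has to do those four fetches and the average itself for every output pixel, which is where the extra cost comes from.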

Neither of these has to be built into the hardware to get HDR to work. For example, Masaki Kawase's famous Real Time HDR Image-Based Lighting demo (rthdribl for short) used SM2-level shaders to emulate FP16 filtering and blending. It's just that HDR only really came into use with the advent of hardware FP16 filtering and blending (available in SM3-class hardware), probably because it would have been too slow otherwise.
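To give an idea of what emulating blending looks like, here is a small CPU-side sketch of one common workaround: ping-ponging between two buffers, so each pass reads the previous result as a texture and writes the sum to the other buffer. This is only an illustration of the general idea, not necessarily how rthdribl does it, and `Buffer`, `additivePass` and `accumulateLayers` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;

// Stand-in for one full-screen pass. The "shader" samples the previous result
// as a texture, adds the new layer, and writes into a separate target, because
// a pass cannot read the render target it is currently writing.
void additivePass(const Buffer& previous, const Buffer& layer, Buffer& target) {
    assert(previous.size() == layer.size());
    assert(layer.size() == target.size());
    for (std::size_t i = 0; i < target.size(); ++i) {
        target[i] = previous[i] + layer[i];
    }
}

// Accumulate all layers by swapping the two buffers after every pass.
Buffer accumulateLayers(const std::vector<Buffer>& layers, std::size_t pixelCount) {
    Buffer front(pixelCount, 0.0f);  // holds the result so far
    Buffer back(pixelCount, 0.0f);   // target of the next pass
    for (const Buffer& layer : layers) {
        additivePass(front, layer, back);
        std::swap(front, back);
    }
    return front;
}
```

With hardware blending, the second buffer and the extra texture read per pass go away, since the GPU combines the shader output with the target for you.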

The difference between FP16 and I16 is range and how the precision is distributed. I16 works like any other integer format: 8-bit RGB can hold a value of 0-255 in each channel, and I16 can hold a value of 0-65535 in each channel, in evenly spaced steps. FP16 splits its bits into a sign, an exponent and a mantissa, so it reaches a maximum of about 65000 while keeping roughly the same relative precision for dark and bright values. But, yes, to the shader author it is transparent. A value of 65535 in an I16 texture is converted by the hardware to 1.0f for use in the shader, so to store HDR values you have to pick a scale yourself. For example, if you treat 1.0f as a brightness of 257.0, you keep the same 1/255 step size as an 8-bit target, and the maximum brightness you can hold is ~255 times less than FP16.

EDIT: To understand this better, remember that when fetching from an 8-bit integer texture, the hardware converts it to a floating point value for you, for use in the pixel shader. For example, a value of 255 is converted to 1.0f. In a similar fashion, a 16-bit value of 65535 is converted to 1.0f. The hardware automatically does the divide by the format's maximum value for you, so you can work with floats in the shader.
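The conversion on fetch can be written out as a short sketch. It assumes the integer formats are unsigned-normalized (the raw value is divided by the format's maximum, so the result lands in [0,1]); the function names are invented for illustration, and `decodeScaledHdr` is a hypothetical helper showing how a shader might stretch a 16-bit integer texel over a chosen HDR range.

```cpp
#include <cassert>
#include <cstdint>

// Decode an unsigned-normalized 8-bit texel: 0..255 maps to 0.0..1.0.
float decodeUnorm8(std::uint8_t texel) {
    return static_cast<float>(texel) / 255.0f;
}

// Decode an unsigned-normalized 16-bit texel: 0..65535 maps to 0.0..1.0.
float decodeUnorm16(std::uint16_t texel) {
    return static_cast<float>(texel) / 65535.0f;
}

// Hypothetical helper: the shader multiplies the normalized value by a scale
// it chose itself, spreading the 65536 evenly spaced steps over
// [0, maxLuminance].
float decodeScaledHdr(std::uint16_t texel, float maxLuminance) {
    assert(maxLuminance > 0.0f);
    return decodeUnorm16(texel) * maxLuminance;
}
```

For example, `decodeScaledHdr(texel, 257.0f)` gives a range of 0 to 257 with the same step size as an 8-bit [0,1] target. A larger scale buys more range at the cost of coarser steps everywhere, which is the trade-off a floating point format avoids.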

##### Share on other sites

Well then, that clears up a lot of things, but there are still a few things I am confused about. So, how exactly is the implementation (for the renderer and engine programmer) different when using I16 HDR vs FP16? I mean, it seems like SM2 hardware all supports 16-bit FP frame buffers (the old A16B16G16R16F), and if I were to implement it right now, I'd just use a bunch of render-to-texture operations and read them together in post-processing. I believe that this is the alternative approach which you spoke of, but how would I go about doing it the right way?

It seems there are some FP16 shortcuts, but most of the demos I can find (like the HDR pipeline in the DX9 SDK) appear to use the SM2 way, and since HDR is now pretty much old news, it doesn't look like many new tutorials are coming on the topic. So, in terms of D3D API calls, which specific FP16 blending operations are the SM3-only ones required to achieve true FP16 HDR? Everyone seems to say there are 4 ways to do it (2x MRT, RGBE, rthdribl, and FP16), but I never really understood the latter, which now appears to be the most common (apparently for good reason).

##### Share on other sites
A few things:

-Shader Model 2.0 requires a minimum of 24-bit floating-point precision for shader calculations. ATI used 24-bit FP in its Radeon 9500-9800 and X800-class parts, while Nvidia offered 32-bit in the FX series. The ATI X1000 series moved to 32-bit, as did the Nvidia 6000 and 7000 series. Shaders on all of these cards do their lighting calculations in floating point; native integer support is a recent (DX10) addition to the specification. So you don't need to worry about FP support or precision in the actual shaders, even in SM2.0.

-There's no "right" way to do HDR, simply different approaches with different tradeoffs. Many use an FP16 + post-processing approach because it's intuitive and decouples the actual tone-mapping from the rendering of your objects. However, an approach such as the one Valve took, where tone-mapping is performed in the shader, is perfectly viable. It has the benefits of reduced bandwidth and of not needing FP blending or MSAA on floating-point targets. Some things, such as when and how bloom should be applied, will never be agreed upon by everybody! IMO you shouldn't be so worried about which approach is the "correct" way, and should instead focus on what's best for your engine, level of skill, etc.

-When we talk about FP16 blending we're talking about post-pixel-shader alpha-blending. This is the capability of a GPU to take the output of a shader and combine it with the existing contents of an FP16 buffer according to your blend states. You can test for it using CheckDeviceFormat, like this:

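    // Ask whether the HAL device supports post-pixel-shader blending
    // on FP16 textures.
    if (SUCCEEDED(d3dObject->CheckDeviceFormat(
            D3DADAPTER_DEFAULT,
            D3DDEVTYPE_HAL,
            D3DFMT_X8R8G8B8,                          // adapter (display) format
            D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING,  // the capability being queried
            D3DRTYPE_TEXTURE,
            D3DFMT_A16B16G16R16F)))                   // format being tested
        fpBlending = TRUE;
    else
        fpBlending = FALSE;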

This sort of blending can be required for alpha-blended geometry, multi-pass lighting (including deferred rendering), or just combining post-processing passes. Of course it's always possible to get around a lack of blending support; it just may require more passes and extra buffers, and could complicate your renderer.
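To see why the format of the blended-into buffer matters, here is a small CPU-side sketch of additive blending (the equivalent of setting both D3DRS_SRCBLEND and D3DRS_DESTBLEND to D3DBLEND_ONE). It makes two simplifying assumptions: the 8-bit target clamps to [0,1] and quantizes to 255 steps, and the FP16 target is modeled with a plain `float`, which has more precision than a real half. The function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Additive blend into an 8-bit normalized target: the sum is clamped to [0,1]
// and then rounded to one of 256 levels, so anything brighter than 1.0 is lost.
float blendAddUnorm8(float src, float dst) {
    assert(src >= 0.0f && dst >= 0.0f);
    const float sum = std::min(std::max(src + dst, 0.0f), 1.0f);
    const auto stored = static_cast<std::uint8_t>(std::lround(sum * 255.0f));
    return static_cast<float>(stored) / 255.0f;
}

// Additive blend into a floating-point target (modeled with float here):
// the sum is kept as is, so values above 1.0 survive for later tone mapping.
float blendAddFloat(float src, float dst) {
    assert(src >= 0.0f && dst >= 0.0f);
    return src + dst;
}
```

Two lights contributing 0.75 each end up as 1.0 in the 8-bit target but 1.5 in the floating-point one, and that extra range is what the tone mapper needs to work with.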

##### Share on other sites
Quote:
Original post by generaleskimo
So, in terms of D3D API calls, which specific FP16 blending operations are the SM3-only ones required to achieve true FP16 HDR?

It works just like any other blending. Make a call to SetRenderState(), and set your blend states before rendering your quads.