Alpha Blend for monocolor (or min/max depth buffer)


Hey Guys,

Currently I am working on a small project that needs an extended 'depth buffer' which stores not only the closest depth but also the furthest depth (as far as I know, no such thing exists in DirectX), so I turned to alpha blending: I store the pixel depth in both the red channel and the alpha channel, and configure the blend desc to keep the max color and the min alpha when blending happens (rough sketch below). With depth testing off, it works like a charm.
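For reference, the blend setup I'm describing looks roughly like this. A minimal, untested D3D11 sketch: device and context stand for the usual ID3D11Device / ID3D11DeviceContext, and the render target is assumed to be an RGBA float format such as DXGI_FORMAT_R16G16B16A16_FLOAT.

#include <d3d11.h>

// Red keeps the furthest depth (MAX), alpha keeps the closest depth (MIN).
// The pixel shader writes the same depth value to both channels.
D3D11_BLEND_DESC bd = {};
bd.RenderTarget[0].BlendEnable           = TRUE;
bd.RenderTarget[0].SrcBlend              = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlend             = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_MAX;
bd.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_MIN;
bd.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;

ID3D11BlendState* blendState = nullptr;
device->CreateBlendState(&bd, &blendState);
context->OMSetBlendState(blendState, nullptr, 0xFFFFFFFF);

// Clear the target so red starts at 0.0 and alpha starts at 1.0 before rendering.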

However, I didn't use the other three channels at all, so I was wondering: can we enable alpha blending for a format like DXGI_FORMAT_R16G16_FLOAT, which essentially has only one color channel and one alpha channel?

Or is there a better (more efficient) way to get each pixel's min/max depth?

Thanks

Peng


You can use multiple-render-targets:

- Bind 2 DXGI_FORMAT_R16_FLOAT render-targets.

- Bind a blend desc with IndependentBlendEnable == true, where one RT uses the MAX blend operator and the other uses MIN (see the sketch below).

- In your shader output the same value to both render-targets.

Populating the min/max buffers this way is probably more efficient, since it consumes less bandwidth. The drawback is that when you read from the buffers you need two sample operations, but even that consumes less bandwidth than an RGBA texture.
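Something along these lines (an untested D3D11 sketch; assumes the two R16_FLOAT targets are already bound as RT0 and RT1, and device is your ID3D11Device):

#include <d3d11.h>

D3D11_BLEND_DESC bd = {};
bd.IndependentBlendEnable = TRUE;

// RT0 keeps the maximum depth.
bd.RenderTarget[0].BlendEnable           = TRUE;
bd.RenderTarget[0].SrcBlend              = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlend             = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_MAX;
bd.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_MAX;
bd.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_RED;

// RT1 keeps the minimum depth.
bd.RenderTarget[1] = bd.RenderTarget[0];
bd.RenderTarget[1].BlendOp      = D3D11_BLEND_OP_MIN;
bd.RenderTarget[1].BlendOpAlpha = D3D11_BLEND_OP_MIN;

ID3D11BlendState* minMaxBlend = nullptr;
device->CreateBlendState(&bd, &minMaxBlend);

// In the pixel shader, output the same depth value to SV_Target0 and SV_Target1.
// Clear RT0 to 0.0 and RT1 to 1.0 before rendering.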


Thanks N.I.B, that's a good idea using MRTs, but I feel like the overhead of rendering to separate RTs (is writing one four-channel pixel faster than writing two separate one-channel pixels? anybody?) plus alpha blending two separate render targets may end up slower than my original method (though I have to benchmark it...)

But this definitely helps. Could you explain 'populating the min-max buffers' a little bit? How does that work? Tons of thanks~


Well, you should benchmark it. But in terms of bandwidth, writing four 16-bit values is double the bandwidth of writing two 16-bit values. The same goes for fetching the data: you'll read two fewer floats (though the compiler might realize that you only use the red and alpha channels and optimize it).

Another option to reduce bandwidth is to use the blend-state write mask and mask out the green and blue channels.

That's only helpful if you are bandwidth limited. If you are compute limited, then it probably isn't worth the trouble.
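If you go the write-mask route, it's a one-line change (untested sketch; bd is the D3D11_BLEND_DESC from your original single-RGBA-target setup):

bd.RenderTarget[0].RenderTargetWriteMask =
    D3D11_COLOR_WRITE_ENABLE_RED | D3D11_COLOR_WRITE_ENABLE_ALPHA;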

But this definitely helps. Could you explain 'populating the min-max buffers' a little bit? How does that work?

If you are referring to how to configure the pipeline: when you create the blend state you can set different blend operators for each render target. MSDN has more info.


Oh, sorry, I thought the "populating the min-max buffers" you mentioned was a totally different method, so I just wanted to ask for details :-)

But thanks, I will benchmark it and get back to this thread later with data.

Thanks N.I.B, that's a good idea using MRTs, but I feel like the overhead of rendering to separate RTs (is writing one four-channel pixel faster than writing two separate one-channel pixels? anybody?) plus alpha blending two separate render targets may end up slower than my original method (though I have to benchmark it...)

It vastly depends on the HW.

On some HW the total cost is just the sum:
cost( MRT0 ) + cost( MRT1 ) + ... + cost( MRTN ) = total_cost

On other HW the total cost is N times the cost of the most expensive target:
max_cost = max( cost(MRT0), cost(MRT1), ..., cost(MRTN) )
total_cost = max_cost x N

Source: Deferred Shading Optimizations 2011

On GCN the export cost is a bit different. See GCN Perf Tweet #6.


Thanks Matias. One more question: I feel like knowing the min/max depth for each pixel is a very common need, especially when we need to do efficient raycasting for volumes (so we don't have to march all the way through the bounding box for sparse volumes). What's the standard way the industry uses for sparse volume rendering?

Thanks

Thanks Matias. One more question: I feel like knowing the min/max depth for each pixel is a very common need (...)

Actually, it's not common. It is common, though, to compute the min/max of a block of pixels (e.g. an 8x8 block), which can be done as a postprocess once you're done rendering opaque objects.
On consoles you could get that information from the internal workings of the Z buffer (either Hi-Z or Z compression), but it's not available in standard desktop APIs. AFAIK not even in D3D12 & Vulkan.

...especially when we need to do efficient raycasting for volumes (so we don't have to march all the way through the bounding box for sparse volumes). What's the standard way the industry uses for sparse volume rendering?

There isn't one. This is a heavily researched topic now that we have the horsepower to do it at acceptable framerates, and so far the techniques vary depending on what you want to render and how the researcher approached it (i.e. clouds, volumetric particle FX, god rays, global illumination, AO).

You could also test rendering twice, once with LESS and once with GREATER depth testing.
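Something like this, for example (untested D3D11 sketch; render the scene once with each state into its own depth buffer, device being your ID3D11Device):

#include <d3d11.h>

D3D11_DEPTH_STENCIL_DESC ds = {};
ds.DepthEnable    = TRUE;
ds.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ALL;

ID3D11DepthStencilState* dssClosest  = nullptr;
ID3D11DepthStencilState* dssFurthest = nullptr;

ds.DepthFunc = D3D11_COMPARISON_LESS;     // pass 1: keeps the closest depth
device->CreateDepthStencilState(&ds, &dssClosest);

ds.DepthFunc = D3D11_COMPARISON_GREATER;  // pass 2: keeps the furthest depth
device->CreateDepthStencilState(&ds, &dssFurthest);

// Clear the first depth buffer to 1.0 as usual, but clear the second one to 0.0
// so the GREATER test accumulates the furthest depth.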

However, I didn't use the other three channels at all, so I was wondering: can we enable alpha blending for a format like DXGI_FORMAT_R16G16_FLOAT, which essentially has only one color channel and one alpha channel?

Or is there a better (more efficient) way to get each pixel's min/max depth?

As well as the MRT suggestion, you could output r = z and g = 1 - z, and then just use MIN blending (and remember later that G contains 1 - z, not z :wink:).
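A rough sketch of that single-target variant (untested; assumes a DXGI_FORMAT_R16G16_FLOAT render target cleared to (1, 1), with the pixel shader writing z to red and 1 - z to green):

#include <d3d11.h>

D3D11_BLEND_DESC bd = {};
bd.RenderTarget[0].BlendEnable           = TRUE;
bd.RenderTarget[0].SrcBlend              = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlend             = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_MIN;   // min(z) lands in R, min(1 - z) in G
bd.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_ONE;
bd.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_MIN;
bd.RenderTarget[0].RenderTargetWriteMask =
    D3D11_COLOR_WRITE_ENABLE_RED | D3D11_COLOR_WRITE_ENABLE_GREEN;

// Afterwards: min depth = R, max depth = 1 - G.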

