DoF in Graphics Gems from CryENGINE 3 (Siggraph 2013)

5 comments, last by RealtimeSlave 5 years, 2 months ago

Hello,

I found a discussion about the DoF implementation

https://de.slideshare.net/TiagoAlexSousa/graphics-gems-from-cryengine-3-siggraph-2013

in CryENGINE 3 here

https://www.gamedev.net/forums/topic/675214-dof-near-field-bleeding/

but unfortunately the thread was already closed.

I have some questions, and hopefully someone is still around who can answer them :)

1. Slide 36:

I guess this describes how to compute the 49 sample offsets for the 7x7 kernel? I.e. how to compute the final morphed radius and angle for a sample relative to the current center pixel? So this could be done once on the CPU, compute radius and angle for all 49 samples and upload a uniform array with the x,y offsets for each position derived from the angle and radius?

2. Slide 37:

How exactly does the flood-fill pass work? What are the offsets for taking the 3x3 samples, or do we just sample the 3x3 pixels around each current center pixel? And do we take the maximum/brightest color of the 9 samples to approximate a boolean union? If yes, how do we compute which of the 9 RGB colors is "the brightest"?

3. Slide 41:

Honestly I don't get what this "tile count" stuff is? What tiles are they talking about? Where does the tile count come from? I guess we have some R8G8 renderbuffer which we want to fill with something. Our input renderbuffer contains the fullres COC as A8 or float texture (don't know what format is proposed to store the COC values)? What exactly are we doing when downscaling tilecount-times? What is this downscale shader doing during each tilecount-pass? Does it start with fullres or halfres input which is mentioned in the slides?

4. Slide 42:

What does this custom bilateral filter look like? Is there an implementation available? Does this filter downscale only color or also COC?

Does "weight samples with far COC" mean each sample color is multiplied with the COC of the sample (taken from half res input)?

What does "pre-multiply layer with far COC" mean? Is "layer" the half res color input? Or the full res input before the bilateral downscale? Do we AGAIN multiply colors with COC although we do this already when we "weight samples"?

5. Slide 44:

How does the upscale bilateral filter work? For each fullres pixel we take 4 taps from halfres and do some magic based on the COC values? We do this "somehow" in a bicubic quality manner? And then blend based on far COC? Is far COC between 0.0 and 1.0?

 

OK, a lot of questions, I know ;( But any help is really appreciated!

  1. Yeah, that's the equation you would use to compute the offset that you would use for each sample. You can pre-compute it if you'd like, but if you unroll the loop in your shader then most of the math will get baked down by the optimizer. 
  2. Yeah, I believe they just used max() but it's not super-clear from their slides. I've always just used avg() in the past, which softens the edges but avoids getting a "blocky" look. For max() you would probably want to compute the brightness of the pixel, which you can do by computing dot(color, 0.333f) or by dot'ing the color with the RGB luminance weights.
  3. They're basically saying that they break the screen up into tiles, where each tile is NxN pixels in size. So for instance if you were using 16x16 tiles, you would end up computing the min/max CoC for each 16x16 tile in the screen. You can do this by repeated 2x2 downscaling where you compute the min/max for each 2x2 region, or you can also use a compute shader and shared memory (or wave ops if they're available) to compute the min/max in a single pass using a reduction.
  4. In this case I assume that their bilateral filter just weights the pixels by abs(CoCSize). This way you only pull in the "blurry" pixels and leave out the in-focus pixels.
  5. Similar to what I said above: weight by CoC size.
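
For point 1, precomputing the 49 offsets on the CPU could look roughly like the sketch below. It does not reproduce the equation from slide 36 (which is not quoted in this thread). As a stand-in it uses the Shirley-Chiu concentric mapping from a square grid to a unit disc, and the names `concentric_map` and `kernel_offsets` are made up for illustration.

```python
import math

def concentric_map(u, v):
    """Map a point in [-1, 1]^2 onto the unit disc (Shirley-Chiu mapping).

    Assumption: this stands in for the slide's morphing equation, which
    may differ (e.g. it may morph the disc further into an n-gon).
    """
    if u == 0.0 and v == 0.0:
        return 0.0, 0.0
    if abs(u) > abs(v):
        r = u
        phi = (math.pi / 4.0) * (v / u)
    else:
        r = v
        phi = (math.pi / 2.0) - (math.pi / 4.0) * (u / v)
    return r * math.cos(phi), r * math.sin(phi)

def kernel_offsets(n=7):
    """Return n*n offsets inside the unit disc, in row-major order."""
    offsets = []
    for j in range(n):
        for i in range(n):
            # Spread the grid evenly over [-1, 1] in both axes.
            u = 2.0 * i / (n - 1) - 1.0
            v = 2.0 * j / (n - 1) - 1.0
            offsets.append(concentric_map(u, v))
    return offsets

offsets = kernel_offsets(7)
```

The resulting list could be uploaded as a uniform array and scaled by the per-pixel CoC radius in the shader, which is the setup the original question describes.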

FYI I have a DOF implementation in one of my open source projects that was roughly based on the techniques from this presentation, if you want to have a look.

Sorry for the late reply; I wasn't able to be here over the last few days.

On 2/3/2019 at 11:37 PM, MJP said:

Yeah, I believe they just used max() but it's not super-clear from their slides. I've always just used avg() in the past, which softens the edges but avoids getting a "blocky" look. For max() you would probably want to compute the brightness of the pixel, which you can do by computing dot(color, 0.333f) or by dot'ing the color with the RGB luminance weights.

At least in the referenced McIntosh12 paper, MAX seems to be used for the union. But I still don't know how they compute the sample offsets for the 3x3 flood-fill pass. Do they just sample all neighbours around each center pixel? Or is there again some sample offset array and a scaling by COC factor involved?

On 2/3/2019 at 11:37 PM, MJP said:

They're basically saying that they break the screen up into tiles, where each tile is NxN pixels in size. So for instance if you were using 16x16 tiles, you would end up computing the min/max CoC for each 16x16 tile in the screen. You can do this by repeated 2x2 downscaling where you compute the min/max for each 2x2 region, or you can also use a compute shader and shared memory (or wave ops if they're available) to compute the min/max in a single pass using a reduction

Are you really sure this is the case? Because the wording they use does not really fit the algorithm above. They say "Downscale COC target k times (k = tile count)". If we downscale a renderbuffer of arbitrary dimensions "k times", how can we have "k tiles" in the end? Maybe the wording is just wrong.

How is k actually chosen?

Does this also mean that later, when looking up min/max COC from this downscaled buffer, the shader has to map current half res x,y position to the downscaled x,y position?

On 2/3/2019 at 11:37 PM, MJP said:

In this case I assume that their bilateral filter just weights the pixels by abs(CoCSize). This way you only pull in the "blurry" pixels and leave out the in-focus pixels.

Afaik the purpose of a bilateral filter is to preserve edges so why should they give "blurry" pixels more weight when generating the half res input? Honestly it does not make any sense to me. I would guess they really just want a good quality downscaled half res input and they use a bilateral filter that preserves edges to avoid intensity leakage artifacts.

On 2/3/2019 at 11:37 PM, MJP said:

Similar to what I said above: weight by CoC size.

As above, not sure this is correct =/

On 2/3/2019 at 11:37 PM, MJP said:

FYI I have a DOF implementation in one of my open source projects that was roughly based on the techniques from this presentation, if you want to have a look.

Cool, man! :) I might take a look once I have more details on the original algorithm.

5 hours ago, RealtimeSlave said:

Are you really sure this is the case?

@MJP is a graphics god; a lot of people in the game graphics realm refer to his posts/blogs on Twitter all the time. If he says that is how it is, you're pretty safe in assuming that is how it is :) He's also been a moderator on these forums for quite a while now, which should add more weight to his words. Check out his blog: https://mynameismjp.wordpress.com


10 hours ago, RealtimeSlave said:

At least in the referenced McIntosh12 paper, MAX seems to be used for the union. But I still don't know how they compute the sample offsets for the 3x3 flood-fill pass. Do they just sample all neighbours around each center pixel? Or is there again some sample offset array and a scaling by COC factor involved?

I interpreted it as sampling the 8 neighbors around the pixel in a 3x3 pattern. I'm not sure if they were doing anything else that would weight the samples, but they describe the pixels as containing values pre-weighted by CoC size, which means that the more in-focus pixels will naturally contribute less.
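
Under that reading, the flood-fill pass could be sketched as follows: for each pixel, look at its 3x3 neighborhood and keep the brightest color as an approximation of the union. This is only an interpretation, not something the slides confirm. The Rec. 709 luma weights, the clamp-to-edge addressing, and the names `luminance` and `flood_fill_max` are assumptions for illustration.

```python
# Rec. 709 luma weights (an assumed choice; equal weights would also work).
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)

def luminance(rgb):
    """Brightness of an (r, g, b) color as a weighted sum of its channels."""
    return sum(c * w for c, w in zip(rgb, LUMA_WEIGHTS))

def flood_fill_max(image):
    """Replace each pixel with the brightest color in its 3x3 neighborhood.

    image: list of rows, each row a list of (r, g, b) tuples.
    Neighbors outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            # Pick the whole color of the brightest tap, not a per-channel max.
            row.append(max(neighbors, key=luminance))
        out.append(row)
    return out
```

Selecting the whole color of the brightest tap keeps hues intact; a per-channel max would be an alternative that can shift colors where differently tinted bokeh shapes overlap.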

10 hours ago, RealtimeSlave said:

Are you really sure this is the case? Because the wording they use does not really fit the algorithm above. They say "Downscale COC target k times (k = tile count)". If we downscale a renderbuffer of arbitrary dimensions "k times", how can we have "k tiles" in the end? Maybe the wording is just wrong.

How is k actually chosen?

Does this also mean that later, when looking up min/max COC from this downscaled buffer, the shader has to map current half res x,y position to the downscaled x,y position?

So when I initially read this, I assumed that they were basically using the same concept that was introduced in Morgan McGuire's motion blur paper: you want to compute the maximum "scattering distance" within a tile, that way each pixel in that tile knows how far out they need to gather to make sure that they sample the scattered pixels. For motion blur the scattering is along a line, while in DOF it's in 2D. But it's the same concept, really. For DOF you'll want to make sure that the tile size is big enough to account for your maximum scattering distance, which is effectively going to be equal to your maximum sample radius in your bokeh-shaped gather kernel. The CryEngine presentation even uses the same exact language on slide 49 when talking about McGuire's motion blur (downscale velocity buffer k times), so it seems to be referring to the same concept. 
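
As a rough sketch of that idea (again an interpretation, not something the slides spell out): repeatedly halve the CoC buffer, keeping the min and max of every 2x2 block, so that after k passes each texel summarizes a 2^k x 2^k tile. The function names and the assumption that the buffer dimensions are divisible by 2^k are hypothetical.

```python
def downscale_min_max(buf):
    """One 2x2 reduction pass over rows of (min_coc, max_coc) pairs."""
    h, w = len(buf) // 2, len(buf[0]) // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [buf[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)]
            row.append((min(b[0] for b in block), max(b[1] for b in block)))
        out.append(row)
    return out

def tile_min_max(coc, k):
    """Reduce a CoC buffer k times; each result texel covers 2^k x 2^k pixels.

    coc: rows of floats. Assumes width and height are divisible by 2^k.
    """
    # Seed every pixel with (min, max) = (coc, coc) before reducing.
    buf = [[(c, c) for c in row] for row in coc]
    for _ in range(k):
        buf = downscale_min_max(buf)
    return buf
```

For example, with 16x16 tiles k would be 4, and the output would have one (min, max) pair per tile, which matches the R8G8 target guessed at in the original question.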

10 hours ago, RealtimeSlave said:

Afaik the purpose of a bilateral filter is to preserve edges so why should they give "blurry" pixels more weight when generating the half res input? Honestly it does not make any sense to me. I would guess they really just want a good quality downscaled half res input and they use a bilateral filter that preserves edges to avoid intensity leakage artifacts.

In this case it's basically preserving the edges of the in-focus pixels by ensuring that they don't bleed into the out-of-focus background pixels (basically they want to prevent the issue shown in the image on slide 42). If you weight the in-focus pixels with a value of 0 during the downscale, the in-focus pixels won't contribute at all when they're sampled by the bokeh gathering kernel, which prevents the bleeding.
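
A minimal sketch of that weighting, assuming far CoC is stored in [0, 1] with 0 meaning fully in focus: each half-res pixel is the average of its four full-res source pixels weighted by their far CoC, so in-focus pixels drop out. Storing the max of the four CoCs as the half-res CoC is an assumption, as is the name `downscale_far_weighted`; the slides do not confirm either.

```python
def downscale_far_weighted(color, far_coc, eps=1e-6):
    """Halve resolution, weighting each 2x2 tap by its far CoC.

    color:   rows of (r, g, b) tuples, even width and height (assumed).
    far_coc: rows of floats, assumed to lie in [0, 1] with 0 = in focus.
    Returns (half_color, half_coc).
    """
    h, w = len(color) // 2, len(color[0]) // 2
    half_color, half_coc = [], []
    for y in range(h):
        color_row, coc_row = [], []
        for x in range(w):
            taps = [(2 * y + dy, 2 * x + dx) for dy in (0, 1) for dx in (0, 1)]
            weights = [far_coc[ty][tx] for ty, tx in taps]
            total = sum(weights)
            if total > eps:
                # Weighted average: taps with CoC 0 contribute nothing.
                rgb = tuple(
                    sum(wt * color[ty][tx][c] for wt, (ty, tx) in zip(weights, taps)) / total
                    for c in range(3)
                )
            else:
                # All four taps are in focus: nothing belongs to the far layer.
                rgb = (0.0, 0.0, 0.0)
            color_row.append(rgb)
            coc_row.append(max(weights))  # assumed choice for half-res CoC
        half_color.append(color_row)
        half_coc.append(coc_row)
    return half_color, half_coc
```

With one in-focus red pixel next to three fully blurred blue pixels, the output is pure blue: the red pixel never enters the far layer, which is the bleeding the weighting is meant to prevent.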

10 hours ago, RealtimeSlave said:

As above, not sure this is correct =/

I don't blame you, this stuff is confusing and complicated and those slides leave a lot of room for interpretation. If you really want to find out what they meant, perhaps you should try emailing Tiago or messaging him on Twitter. I would imagine his most recent email at id is in one of his more recent presentations from SIGGRAPH. But if you do find out any info, please come back and share it here!

You should also check out this presentation from Activision, which references the CryEngine presentation and suggests several improvements.

5 hours ago, CrazyCdn said:

@MJP is a graphics god; a lot of people in the game graphics realm refer to his posts/blogs on Twitter all the time. If he says that is how it is, you're pretty safe in assuming that is how it is :) He's also been a moderator on these forums for quite a while now, which should add more weight to his words. Check out his blog: https://mynameismjp.wordpress.com

While I appreciate the vote of confidence, I'm just a human who's wrong about things all of the time! I was just sharing my interpretation of what was shared in that presentation, and it's very possible that I missed something or misunderstood things. So I think the OP is totally warranted in questioning my explanations, and digging deeper to make sure that they understand things correctly.

On 2/11/2019 at 2:02 AM, MJP said:

I interpreted it as sampling the 8 neighbors around the pixel in a 3x3 pattern. I'm not sure if they were doing anything else that would weight the samples, but they describe the pixels as containing values pre-weighted by CoC size, which means that the more in-focus pixels will naturally contribute less.

OK, in this case there should probably be some maximum COC radius during the first pass, so that the bokeh's "dot pattern" (slide 38, left image) does not drift too far apart for the 3x3 sampling in the second pass. I have not seen any max radius mentioned in the slides, though.

On 2/11/2019 at 2:02 AM, MJP said:

So when I initially read this, I assumed that they were basically using the same concept that was introduced in Morgan McGuire's motion blur paper: you want to compute the maximum "scattering distance" within a tile, that way each pixel in that tile knows how far out they need to gather to make sure that they sample the scattered pixels. For motion blur the scattering is along a line, while in DOF it's in 2D. But it's the same concept, really. For DOF you'll want to make sure that the tile size is big enough to account for your maximum scattering distance, which is effectively going to be equal to your maximum sample radius in your bokeh-shaped gather kernel. The CryEngine presentation even uses the same exact language on slide 49 when talking about McGuire's motion blur (downscale velocity buffer k times), so it seems to be referring to the same concept.

Thanks! I will take a look at this paper! So k is basically chosen based on half res dimensions and this max COC radius.
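
To make that concrete, here is a small sketch of how k and the tile lookup might fall out, assuming the tile should be at least as wide as the maximum CoC radius in half-res pixels and that each downscale pass halves the resolution. Both function names are hypothetical, and the slides do not state this rule explicitly.

```python
def passes_for_radius(max_coc_radius_px):
    """Smallest k such that a 2^k-pixel tile is at least as wide as the radius.

    Assumption: tile size is tied directly to the maximum CoC radius,
    measured in pixels of the (half-res) buffer being downscaled.
    """
    k = 0
    while (1 << k) < max_coc_radius_px:
        k += 1
    return k

def tile_coord(px, py, k):
    """Map a half-res pixel position to its texel in the k-times downscaled buffer."""
    return px >> k, py >> k

k = passes_for_radius(11)       # 11-pixel radius needs 16-pixel tiles
tile = tile_coord(37, 5, k)     # half-res pixel (37, 5) falls into this tile
```

So the gather shader would indeed have to map its half-res position to the downscaled position, which here is just a right shift by k (or a divide by the tile size in normalized coordinates).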

On 2/11/2019 at 2:02 AM, MJP said:

In this case it's basically preserving the edges of the in-focus pixels by ensuring that they don't bleed into the out-of-focus background pixels (basically they want to prevent the issue shown in the image on slide 42). If you weight the in-focus pixels with a value of 0 during the downscale, the in-focus pixels won't contribute at all when they're sampled by the bokeh gathering kernel, which prevents the bleeding.

I see. In this case the implementation is straightforward. I was a bit afraid there is another "diploma thesis" behind their "custom bilateral filter" ;-)

On 2/11/2019 at 2:02 AM, MJP said:

I don't blame you, this stuff is confusing and complicated and those slides leave a lot of room for interpretation. If you really want to find out what they meant, perhaps you should try emailing Tiago or messaging him on Twitter. I would imagine his most recent email at id is in one of his more recent presentations from SIGGRAPH. But if you do find out any info, please come back and share it here!

You should also check out this presentation from Activision, which references the CryEngine presentation and suggests several improvements.

Yeah you're so right, a LOT of room for interpretation. If I find out more I will definitely share it here.

On 2/11/2019 at 2:02 AM, MJP said:

So I think the OP is totally warranted in questioning my explanations, and digging deeper to make sure that they understand things correctly.

Yeah, I just want to understand everything in as much detail as possible and really appreciate the discussion. I am not here to question anybody's knowledge or skill.

Thanks!

