low sample count screen space reflections


The basic idea behind SSR is clear to me. Although there is little useful information around, the basic idea is to simply march along a ray in either view or screen space. Personally I do it in screen space as I think this is better, but I can't say for sure due to the lack of information around.

Whatever the case, the common approach seems to be to do a linear stepping along the ray and then a bisecting search to refine the result. The bisecting search is clear and, depending on the step size, takes around 5-6 steps for a large screen and a ray running across a large part of it. The problematic part is the step size.
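To make this concrete, here is a minimal CPU-side C++ sketch of the linear march plus bisection refinement (a sketch only: all names are illustrative, the 20+6 step counts match the numbers discussed here, and depth is interpolated linearly along the ray for simplicity where a perspective-correct version would interpolate 1/z):

```cpp
#include <algorithm>
#include <vector>

struct Vec2 { float x, y; };

struct DepthBuffer {
    int width, height;
    std::vector<float> depth;    // linear view-space depth per pixel
    float at(int x, int y) const { return depth[y * width + x]; }
};

// Scene depth under a (fractional) screen position, nearest sample.
static float sceneDepth(const DepthBuffer& db, Vec2 p) {
    int x = std::min(std::max(int(p.x), 0), db.width  - 1);
    int y = std::min(std::max(int(p.y), 0), db.height - 1);
    return db.at(x, y);
}

// Coarse linear march from 'start' to 'end'; on the first step where the ray
// falls behind the depth buffer, refine the crossing with a bisection search.
bool traceSSR(const DepthBuffer& db, Vec2 start, float startZ,
              Vec2 end, float endZ, Vec2* hit) {
    const int LINEAR_STEPS = 20;  // step size = ray length / 20
    const int REFINE_STEPS = 6;   // ~log2(step size in pixels) iterations suffice

    float prevT = 0.0f;
    for (int i = 1; i <= LINEAR_STEPS; ++i) {
        float t = float(i) / float(LINEAR_STEPS);
        Vec2  p = { start.x + (end.x - start.x) * t,
                    start.y + (end.y - start.y) * t };
        float rayZ = startZ + (endZ - startZ) * t;

        if (rayZ >= sceneDepth(db, p)) {   // ray passed behind geometry
            float lo = prevT, hi = t;      // bracket the crossing
            for (int j = 0; j < REFINE_STEPS; ++j) {
                float mid = 0.5f * (lo + hi);
                Vec2  m = { start.x + (end.x - start.x) * mid,
                            start.y + (end.y - start.y) * mid };
                if (startZ + (endZ - startZ) * mid >= sceneDepth(db, m)) hi = mid;
                else lo = mid;
            }
            hit->x = start.x + (end.x - start.x) * hi;
            hit->y = start.y + (end.y - start.y) * hi;
            return true;
        }
        prevT = t;
    }
    return false;   // no intersection: fall back to the env-map
}
```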

I made tests with 20 steps (not counting the refinement). In this case, for a large screen (1680x1050 as an example), a moderately long ray bouncing from one side of the screen to the other, let's say 1000 pixels long, gives a step size of 1000/20 = 50 pixels. This is quite large and steps right across thinner geometry, for example the edges of the boxes in the test-bed I put together, attached below (smaller than 1680x1050 as it's from the editor). Furthermore it leads to incorrect sampling, as seen on the right side.

[attachment=17069:test1b.jpg]

Now I've seen other people claiming they do only 16 samples (on the same large screen or larger), even for long rays running across the screen. 16 samples is even less than the 20 I used in the test, which already misses geometry by a large deal. Nobody ever stated, though, how these under-sampling issues work out with such a low sample count. In my tests I required 80-100 samples to keep these undersampling issues somewhat at bay (the speed is gruesome).

So the question is:

1) How can 16 samples for the linear search possibly work without these undersampling issues?

Another issue is stuff like a chair or table resting on the ground. All rays passing underneath would work with an exhaustive search across the entire ray. With the linear test, though, the search enters the bisecting phase at the first step where the ray crosses geometry like the table or chair. The bisecting test then finds no solution and thus leaks the env-map through. Some others seem not to be affected by this problem, but what happens there? Do they continue stepping along the ray if the bisecting fails? This, though, would increase the sample count beyond 20+6 and kill the worst case. So another question is:

2) With rays passing underneath geometry at the first linear search hit, where bisecting fails to return a result, what do you do? Continue along the ray with a worse worst-case sample count, or fail out?

3) How do you detect these cases properly to fade out? Blurring or something more intelligent?

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine


Usually you use a sub-resolution buffer for the raymarching; this not only reduces the number of rays but also shortens the max distance. Another trick is to use a randomized jitter pattern and blur the resulting reflection buffer slightly (more vertical blur than horizontal). Then use bilateral upsampling when you combine the reflections into the main scene. You can also skip pixels that are too far away, all rough materials, screen edges (not enough data) and pixels with normals that point towards the camera. CryEngine 3 uses half-precision depth for all samples but the center sample. With all these the ray count is reduced quite radically, so you can use a little bit more samples if needed.
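As an illustration of those early-out tests, a hedged C++ sketch (thresholds and names are made up; the actual heuristics in CryEngine 3 are not publicly specified in this detail):

```cpp
struct Vec3 { float x, y, z; };

static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// r = d - 2*dot(d,n)*n: reflect the view direction d around the normal n.
static Vec3 reflect3(Vec3 d, Vec3 n) {
    float k = 2.0f * dot3(d, n);
    return { d.x - k*n.x, d.y - k*n.y, d.z - k*n.z };
}

// Decide whether a (sub-resolution) pixel is worth tracing a ray for at all.
// viewDir points from the camera to the pixel; +z points away from the camera.
bool shouldTraceReflection(Vec3 viewNormal, Vec3 viewDir, float viewZ,
                           float roughness, float u, float v) {
    const float MAX_DISTANCE  = 100.0f;  // skip pixels too far away (illustrative)
    const float MAX_ROUGHNESS = 0.6f;    // skip very rough materials
    const float EDGE_MARGIN   = 0.05f;   // skip screen edges: not enough data there

    if (viewZ > MAX_DISTANCE || roughness > MAX_ROUGHNESS) return false;
    if (u < EDGE_MARGIN || u > 1.0f - EDGE_MARGIN ||
        v < EDGE_MARGIN || v > 1.0f - EDGE_MARGIN) return false;

    // Normals pointing back at the camera reflect the ray toward the viewer,
    // where the depth buffer has no data.
    return reflect3(viewDir, viewNormal).z > 0.0f;
}
```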


I thought about the smaller resolution buffer too, but there is something that doesn't connect for me with that idea, and that's the blur. Let's say I go 2 levels lower; that is a factor of 4 in both directions, thus 1 pixel covers 16 pixels (4x4). So let's say I calculate a result for this 16 pixel block. Do all pixels then obtain this hit point texture coordinate and jitter around it? This would be like applying a 5x5 blur to the input image.

Furthermore, with subtly rough materials like slightly rippled leather, the normals are slightly different for all pixels in the 16 pixel group. Yet with this down-sampling approach all pixels end up on the same texture coordinate.

Last but not least, using the sample case above, what about the edges of the boxes? If one group of 16 pixels scores a hit just on the edge but the next 16 pixel group sitting right next to it does not, would this not introduce a 4x4 pixel wide stair-step artifact?

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

Simple answer: 2 levels lower might be too low a resolution. Start from half resolution and see if that works.

One level lower is a factor of two, as you said. With 20 steps per ray, though, the step size is 50 pixels, as mentioned in my first post. This gives log2(50) ≈ 5.6, which is roughly 5 levels. I just experimented with a smaller screen, but the results are full of holes, since blocks of 4 (2x2) pixels score incorrect hits, entering the bisection in incorrect situations. That's not a possible solution to get down to 16 samples.

Somebody once claimed CryEngine uses something with 20x20 pixel blocks? I have a hard time believing this, as this would be 400 pixels mapped to one single test ray. The results would be brutally crappy.

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

Your best option is to use a min-max hierarchy to accelerate your ray march.

The idea is to store a mip-chain that contains the minimum and maximum depths of the current surrounding pixels. Then you just check your ray to see if it's inside a tile that has an intersection (compare the min/max depths) and if it's not, move to the boundary of the tile (box-intersection) and do it again. If it's inside you go down a level and test it again. You do this until you've reached a maximum number of samples or until you reach the lowest level.

In practice this hugely reduces the ray marching cost because you end up skipping all of the empty spaces that you'd normally be marching along, so you don't need as many steps to get good results.
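A rough C++ sketch of that traversal over a precomputed min/max depth pyramid (a sketch under assumptions: level 0 is full resolution, the cell layout is invented, and a complete implementation would also climb back up a level after skipping out of a cell):

```cpp
#include <algorithm>
#include <vector>

struct MinMax { float zmin, zmax; };

// mips[l] is (baseWidth>>l) x (baseHeight>>l), row-major; each entry holds the
// min and max depth of the 2^l x 2^l pixel block it covers.
struct DepthPyramid {
    int baseWidth, baseHeight;
    std::vector<std::vector<MinMax>> mips;
    MinMax cell(int level, int x, int y) const {
        int w = std::max(baseWidth  >> level, 1);
        int h = std::max(baseHeight >> level, 1);
        return mips[level][std::clamp(y, 0, h - 1) * w + std::clamp(x, 0, w - 1)];
    }
};

// March the screen ray (ox,oy) + (dx,dy)*t with linear depth oz + dz*t.
// Returns the ray parameter t of a hit, or -1 on a miss.
float traceHiZ(const DepthPyramid& pyr, float ox, float oy, float oz,
               float dx, float dy, float dz, float tMax, int maxIters) {
    int level = int(pyr.mips.size()) - 1;   // start at the coarsest level
    float t = 0.0f;
    for (int i = 0; i < maxIters && t < tMax; ++i) {
        float cs = float(1 << level);       // cell size in pixels at this level
        int cx = int((ox + dx * t) / cs);
        int cy = int((oy + dy * t) / cs);
        MinMax mm = pyr.cell(level, cx, cy);

        // Ray parameter at which the ray exits this cell (box intersection).
        float tx = dx != 0.0f ? (float(dx > 0 ? cx + 1 : cx) * cs - ox) / dx : tMax;
        float ty = dy != 0.0f ? (float(dy > 0 ? cy + 1 : cy) * cs - oy) / dy : tMax;
        float tExit = std::min(std::min(tx, ty), tMax);

        // Depth interval the ray spans while inside the cell.
        float z0 = oz + dz * t, z1 = oz + dz * tExit;
        bool overlap = std::max(z0, z1) >= mm.zmin && std::min(z0, z1) <= mm.zmax;

        if (!overlap)        t = tExit + 1e-4f;  // empty space: skip the whole cell
        else if (level == 0) return t;           // finest level: report the hit
        else                 --level;            // possible hit: descend and re-test
    }
    return -1.0f;
}
```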

hey there,

What I do is 8+32 samples: 8 for getting closer to the surface in big steps (terminating earlier if the surface is reached), and 32 in a binary search. This gives me perfect results without all the jittering, blurring and downsampling. Although if you'd like to have rough (blurred) reflections, then you'll need to do downsampling. I found that 16 samples is good enough in terms of performance, but in some cases Moire patterns appear.

Best regards,

Yours3lf

Your best option is to use a min-max hierarchy to accelerate your ray march. [...]

I experimented with that already beforehand, but it fails for various reasons.

For one, the min-max criterion is incorrect. If you have for example a box parallel to the screen, all pixels in the mip-mapped level have the same depth value; the minimum and maximum are thus the same. Box-testing against this pixel scores no hit, since the ray test considers it a thin piece of paper instead of a thick box. The same happens with walls. So the min-max is of no use, since wrong information is stored. The results are broad and long stripes of black across the entire image (not shown here, but you can imagine).

EDIT: To be more precise, the test fails because rays can pass behind geometry without scoring a hit. A box would typically be missed entirely, since the ray is tested against a thin box and thus only scores an intersection if the ray runs through the thin box in the very same pixel. If it enters and leaves the pixel slightly behind the thin box, though, no hit is scored and the result fails.

Another reason is the initial-sampling problem. The starting point (or rather one next to it) is always included in the first mip-map level you sample. Let's say this is level 5. Due to this, for the first sample you always run down to level 0 for testing just to find no hit (since the first testing pixel is in every mip-map level above, and thus min-max will always score a hit no matter what). You thus lose 6 of your total 16 samples just for this inevitable first case.

Another problem is that higher-level groups contain pixels around the ray that the ray never crosses. If a pixel in such a group is closer than the z-value of the ray, the group is tested due to the min-max criterion, resulting in an early hit and a bisection over a group of pixels that cannot possibly score a hit. This results in large black stripes again.

Last but not least, the recursive nature of this approach (even if programmed iteratively) is a huge slow-down. 30 samples with an iterative implementation of min-max on my Radeon HD 4870 is as slow as doing 100 samples with the current flawed approach, while producing crappy results.

EDIT: I did some more testing. Maybe it's just the Radeon 4870. I noticed that if I do "loop1 then loop2", where loop2 runs only once, I get good speed. If I do "loop2 inside loop1", though, the speed drops to 50% or less. Chances are the loops mixed with the if-else branches required for a more clever stepping solution are too much for the Radeon 4870 to muster. That doesn't change the mathematical problems with this approach, but it would explain the slow speed.

There is, though, maybe a possibility to modify the problem a bit so it might work. I have to experiment with that.

What I do is 8+32 samples: 8 for getting closer to the surface in big steps (terminating earlier if the surface is reached), and 32 in a binary search. [...]

8 samples is gruesome: on a 1680x1050 screen this is a block size of 1680/8 = 210 pixels. Stepping over 210 pixels should miss even large-scale objects (12.5% of the screen width). What kind of scene do you use? Also, why 32 samples for the bisection test? As soon as you get to a step size below 1 pixel there is no gain in quality anymore. For 210 pixels this is floor(log2(210))+1 = 8 samples; 32-8 = 24 samples wasted. Mind elaborating on the reason for the 32 samples? Or are you using, after the first hit, a starting step size larger than 210/2 = 105 pixels?

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine


8 samples is gruesome: on a 1680x1050 screen this is a block size of 1680/8 = 210 pixels. [...]


I'm not doing this in screen space because, as you wrote, things get rather bad at high resolutions. Instead I move around in view space, where I have the near z at 1.0 and the far z at 1000. My step size for the initial pass is 1/4th of the view-space z length. I do the initial pass until I go 'beyond' the view-space z at the given pixel (so this pass may terminate before 8 steps are done). Then I do the binary search, halving the step size at each step (32 or 16 steps, fixed).
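A minimal sketch of that scheme as described (not the actual code; the depthAt callback, assumed here, would project the view-space point to the screen and return the linear scene depth found there):

```cpp
// Coarse fixed steps in view space with an early out, then a refinement that
// halves the step and reverses direction depending on which side of the depth
// surface the sample ended up. The constants follow the numbers given above.
float traceViewSpace(float (*depthAt)(float x, float y, float z),
                     float ox, float oy, float oz,   // ray origin (view space)
                     float dx, float dy, float dz,   // normalized ray direction
                     float zRange) {                 // farZ - nearZ, e.g. 999
    const int COARSE_STEPS = 8;
    const int REFINE_STEPS = 32;        // empirical, per the post above
    float step = zRange * 0.25f;        // initial step: 1/4 of the view-space z length

    float t = 0.0f;
    // Initial pass: stop as soon as the ray goes beyond the scene depth.
    for (int i = 0; i < COARSE_STEPS; ++i) {
        float n = t + step;
        if (oz + dz * n >= depthAt(ox + dx * n, oy + dy * n, oz + dz * n)) break;
        t = n;
    }
    // Binary search: halve the step each iteration, stepping back when behind
    // the surface and forward when in front of it.
    for (int i = 0; i < REFINE_STEPS; ++i) {
        step *= 0.5f;
        float z = oz + dz * t;
        t += (z >= depthAt(ox + dx * t, oy + dy * t, z)) ? -step : step;
    }
    return t;   // project the point at t back to screen space to fetch the color
}
```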

The scene is the Sponza scene from Crytek.

I have no idea why 32. It is an empirical value.

That is a good idea, I could check the step size. I'm only worried about the non-uniform flow control that's going on, and I'm not sure yet which would hurt more.

As for the 'when to fade out', i.e. when we reach the boundaries of screen space: I do a lot of checks. I check whether the resulting screen-space coordinate is on the screen, and I also have a max search distance that makes the reflection fade when the ray reaches the end (artist controlled). I also check the normal vectors at the given pixel and at the resulting pixel.
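The exact tests aren't spelled out here, but one plausible way to fold them into a single fade factor looks like this (a sketch; all thresholds are illustrative, not from the post):

```cpp
#include <algorithm>

// u,v: hit position in [0,1] screen space; nDotN: dot product between the
// normal at the reflecting pixel and the normal at the hit pixel.
float reflectionFade(float u, float v, float rayDistance, float maxDistance,
                     float nDotN) {
    // Fade toward all four screen borders (no data beyond them).
    float border = std::min(std::min(u, 1.0f - u), std::min(v, 1.0f - v));
    float edgeFade = std::clamp(border / 0.1f, 0.0f, 1.0f);  // outer 10% fades

    // Fade out as the ray approaches the artist-controlled max search distance.
    float distFade = std::clamp(1.0f - rayDistance / maxDistance, 0.0f, 1.0f);

    // Reject hits whose surface faces nearly the same way as the reflecting
    // pixel: likely a self-hit or a grazing false positive.
    float normalFade = (nDotN > 0.9f) ? 0.0f : 1.0f;

    return edgeFade * distFade * normalFade;
}
```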

For 16 samples it runs at 5ms at 720p. For 8 samples, which IMHO looks a lot like the 16 samples, it runs at 3ms at 720p. (AMD A8-4500M)


I'm not doing this in screen space because, as you wrote, things get rather bad at high resolutions. Instead I move around in view space [...]

Actually, for a long ray running across the screen it does not matter that much whether you are in screen space or view space; both translate into the other using a simple calculation. So if you take the start and end point of the ray in view space, translate them into screen space and split the ray up into 8 pieces, you end up on average with a pixel block size of 210. The only difference is that, stepping in view space instead of screen space, the block size is not uniform (first a huge block, then smaller with each step). So from this point of view I still expect jumps of roughly 210 pixels per ray step, which is more than 10% of the screen size for long rays. So how does it pan out with geometry thinner than roughly 200 pixels on screen?

What do you mean by "non-uniform flow control"? If you don't continue after fine-checking a pixel block, where do you need non-uniform control flow? In this situation you need one initial loop to find the candidate pixel block for the fine search and then a second loop afterwards doing the fine search on the found candidate block (no matter if nested or not; on my card nesting drops speed like hell). So the performance is 8-loop + 32-loop, hence 40-loop in total, for all pixels in a warp.
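For illustration, that "loop1 then loop2" structure as a sketch, with the refinement loop running once after the coarse loop instead of nested inside it (the behind() predicate, assumed here, returns true when the ray point at parameter t lies behind the depth buffer):

```cpp
#include <functional>

float traceTwoLoops(const std::function<bool(float)>& behind,
                    float step, int coarseSteps, int refineSteps) {
    float t = 0.0f;
    for (int i = 0; i < coarseSteps; ++i) {   // loop1: find the candidate block
        if (behind(t + step)) break;          // candidate found: leave loop1
        t += step;
    }
    for (int i = 0; i < refineSteps; ++i) {   // loop2: runs once, never nested
        step *= 0.5f;
        t += behind(t) ? -step : step;
    }
    return t;
}
```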

Mind stating what card you run this on and what the artifacts look like?

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

This topic is closed to new replies.
