
#5187536 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 16 October 2014 - 06:57 PM

Hi Lukas,

Glad to see there are still a few readers on this thread. :)  I was actually able to get the code working to walk behind objects using the min/max version of the Hi-Z buffer.  There are still a few issues that can creep up when a ray simply doesn't have the data at its endpoint, but my solution recovers a great deal of the missing data (screenshots below).

 

You're right: in the book the author mentions only using the min version, but then obviously uses the min/max variant in the implementation snippets provided.  So when you're building out your Hi-Z buffer, you simply want to track both the minimum and maximum values, and return something like the following (it sounds like you're doing this correctly, but for the sake of future readers):

return float2(
	min(min(minDepth.r, minDepth.g), min(minDepth.b, minDepth.a)),
	max(max(maxDepth.r, maxDepth.g), max(maxDepth.b, maxDepth.a)));

And then you make a few small updates to the ray tracing step, namely:

// note that this now returns a float2 and samples both the red and green components
float2 getMinimumDepthPlane(float2 ray, float level)
{
	return hiZBuffer.Load(int3(ray.xy, level)).rg;
}

and in the main ray-tracing loop, you update it like so:

// above as before

float2 zminmax = getMinimumDepthPlane(oldCellIdx, level);
		
// intersect only if ray depth is between the minimum and maximum depth planes
float3 tmpRay = intersectDepthPlane(o, d, min(max(ray.z, zminmax.r), zminmax.g));

// below as before
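To make the behavior of that clamp concrete, here's a tiny sketch (Python, purely illustrative depth values) of what min(max(ray.z, zminmax.r), zminmax.g) does to the ray's depth:

```python
def clamp_to_depth_range(ray_z, zmin, zmax):
    # mirrors the new intersection target: min(max(ray.z, zminmax.r), zminmax.g)
    return min(max(ray_z, zmin), zmax)

# Ray in front of the cell's nearest surface: snaps forward to the min plane.
assert clamp_to_depth_range(0.30, 0.40, 0.60) == 0.40
# Ray between min and max: already inside the occluder's depth range, unchanged.
assert clamp_to_depth_range(0.50, 0.40, 0.60) == 0.50
# Ray behind the farthest surface: snaps back to the max plane,
# which is what lets the march continue behind thin occluders.
assert clamp_to_depth_range(0.75, 0.40, 0.60) == 0.60
```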

In regards to the visibility buffer, I actually ended up going with what the book provided.  When I looked at the results of the visibility buffer in Visual Studio Graphics Debugger, RenderDoc, or NSight, I was seeing a lot of artifacts using the suggested updates.  Your mileage may vary, but the provided implementation is what I recommend starting with.

 

Below are two screenshots: the first is before the code to walk behind objects was added (notice the large empty space between the soldier's hand and foot where rays were incorrectly being blocked), and the second is after the code was added.  Like I said, you may still see a few ray misses with the code present, but they are generally due to data that is actually missing because of occlusion, rather than incorrect hits.

 

Before:  https://www.dropbox.com/s/9nj523ieryvfb7a/screenshot_31.png?dl=0

After:  https://www.dropbox.com/s/l245n2lpwwreihw/screenshot_32.png?dl=0

 

Thanks,

WFP




#5182167 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 22 September 2014 - 11:32 AM

As mentioned in my previous post, here is the way I've settled on handling weighting and blending the cone tracing results.  For anyone following along, you'll notice that I decided not to bother with how much the cone sphere intersected the hi-z buffer.  All the various combinations I tried with that approach gave bad results with a lot of artifacts.  If someone else gets the weighting working properly and wants to share, I'd like to revisit it, but for the time being I'm content with what I have below.

You'll notice I've brought in another new parameter to the function.  The gloss is simply what you'd store in a gloss map (for me it's actually [1.0f - roughness], but you get the point), and I use it as a further weighting on the output color.  I noticed previously that even when I had a roughness of 1 (very rough), I was still getting more reflection than I would have expected, even at non-grazing angles.  This new parameter helps clean that up nicely and fades more smoothly into your backup environment maps the rougher the surface becomes.

I have it commented in the code, but it's worth noting that I leave gloss out of the alpha calculation.  The more terms you add to that value, the further down you drive it for a particular iteration, and I've found that I get better results by using just the visibility and attenuation values.  If you're implementing this effect, I encourage you to sub values in and out and see which works best for your needs.

 

 

One of the issues I would like to address in the future is the color buffer convolution stage.  While the fading and blending helps this to an extent, you can easily get colors bleeding to areas where they shouldn't be due to the plain Gaussian blur used in creating this buffer's resources.  In my setup, I also need to more properly blend this effect with the local cubemaps (most likely what I will tackle next).

 

As would be expected, this effect tends to work better the more depth information you have.  For example, I've been posting images of the effect running in an outdoor scene with a few blocks and other programmer art, but largely an empty space.  There are a lot of ray misses and some large depth discrepancies, and that results in some hard edge artifacts even with rougher surfaces.  In the sponza scene with more "going on", I've found this technique to perform better aesthetically.  I will post images of the same scene I've been using below, and later on after some more testing and polish I will try to get some out showing it running in sponza.

 

A great suggestion that Bruzer100 gave me was to not let the effect run too far.  In other words, even in the ray tracing steps you want to stop and return a non-hit after the ray has traveled a certain distance.  I've made this into a constant buffer variable so I can update it per scene, and that is the approach I would suggest taking.  This helps clean up a good amount of artifacts, and lets your fallback environment maps take over completely at known distances.
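A minimal sketch of that early-out, in Python with hypothetical names (MAX_RAY_DISTANCE_VS stands in for the constant buffer variable; in the shader this is just a length() check inside the ray-march loop):

```python
# Hypothetical per-scene constant; tune it as suggested above.
MAX_RAY_DISTANCE_VS = 100.0

def march_step(ray_start_vs, ray_pos_vs):
    """Returns None to signal a non-hit once the ray has traveled too far."""
    delta = [p - s for p, s in zip(ray_pos_vs, ray_start_vs)]
    traveled = sum(d * d for d in delta) ** 0.5
    if traveled > MAX_RAY_DISTANCE_VS:
        return None  # treat as a miss; the fallback environment maps take over
    return ray_pos_vs

assert march_step((0.0, 0.0, 0.0), (0.0, 0.0, 50.0)) is not None
assert march_step((0.0, 0.0, 0.0), (0.0, 0.0, 150.0)) is None
```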

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel, float3 rayStartVS, float radiusSS, float gloss)
{
	// sample center and take additional sample points around the cone perimeter and inside to better blend the color result
	float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
	float3 sr = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(radiusSS, 0.0f), mipChannel).rgb;
	float3 sl = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(radiusSS, 0.0f), mipChannel).rgb;
	float3 sb = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(0.0f, radiusSS), mipChannel).rgb;
	float3 st = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(0.0f, radiusSS), mipChannel).rgb;
	float halfRadiusSS = radiusSS * 0.5f;
	float3 srh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(halfRadiusSS, halfRadiusSS), mipChannel).rgb;
	float3 slh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(halfRadiusSS, halfRadiusSS), mipChannel).rgb;
	// note the opposite diagonal here - repeating the same two offsets would sample the same points twice
	float3 sbh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(halfRadiusSS, -halfRadiusSS), mipChannel).rgb;
	float3 sth = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(halfRadiusSS, -halfRadiusSS), mipChannel).rgb;

	float3 blendedColor = (sr + sl + sb + st + srh + slh + sbh + sth + sampleColor) / 9.0f;

	float visibility = visibilityBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).r;

	float depth = hiZBuffer.SampleLevel(sampPointClamp, samplePos, 0.0f).r;
	float3 rayEndVS = viewSpacePositionFromDepth(samplePos, depth);
	float distanceTraveled = length(rayEndVS - rayStartVS);
	// distance squared caused the effect to fade too fast for larger objects in the distance
	float attenuation = distanceTraveled == 0.0f ? 1.0f : saturate(1.0f / distanceTraveled);

	// try mix and matching what goes into the alpha component - don't want to drive it down too much
	return float4(blendedColor * visibility * attenuation * gloss, visibility * attenuation); // gloss intentionally left out of alpha
}
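For anyone curious why I abandoned distance-squared attenuation, here's a quick numeric comparison (Python; saturate is mimicked with min/max):

```python
def linear_attenuation(d):
    # matches: distanceTraveled == 0 ? 1 : saturate(1 / distanceTraveled)
    return 1.0 if d == 0.0 else min(1.0, max(0.0, 1.0 / d))

def squared_attenuation(d):
    # the variant I tried first, which fades too fast for distant objects
    return 1.0 if d == 0.0 else min(1.0, max(0.0, 1.0 / (d * d)))

# At 10 view-space units the squared falloff has already dropped to 1%,
# while the linear version still contributes 10%.
assert abs(linear_attenuation(10.0) - 0.1) < 1e-9
assert abs(squared_attenuation(10.0) - 0.01) < 1e-9
# Close to the ray start, both clamp to full contribution.
assert linear_attenuation(0.5) == 1.0
```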

New screenshots:

https://www.dropbox.com/s/pze4pi5k9kb8d5f/screenshot_20.png?dl=0

https://www.dropbox.com/s/dfxravl5ghbkhia/screenshot_21.png?dl=0

https://www.dropbox.com/s/bbkiuzsycj6d8l3/screenshot_22.png?dl=0

https://www.dropbox.com/s/kciy1pxvfap1pgn/screenshot_23.png?dl=0

https://www.dropbox.com/s/2xg9amomoj25kyn/screenshot_24.png?dl=0

https://www.dropbox.com/s/2n1tisaqlt028k5/screenshot_25.png?dl=0

https://www.dropbox.com/s/ty4xdomw2ppkzbl/screenshot_26.png?dl=0




#5182107 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 22 September 2014 - 08:52 AM

Hey 3DLuver and Frenetic Pony,

 

I've actually read through that paper a few times and while it's a good read, very detailed, and does create improved results, I ultimately landed on not pursuing it for the time being, mostly due to the reasons Frenetic Pony mentioned.

 

There are so many new and promising-looking GI techniques emerging that I'm hesitant to spend much more time on screen-space techniques like this one that will ultimately be better served by proper GI.  Cyril Crassin's Voxel Cone Tracing (a great paper in its own right, and where the book's technique gets a lot of its inspiration) is gaining a good amount of popularity, and if you saw the NVIDIA 900 series release announcement (http://www.geforce.com/whats-new/articles/maxwell-architecture-gtx-980-970) you'll notice they're touting this method as becoming viable in real time.

If I've been following correctly, it seems UE4 had it in the engine for a while (the Elemental demo), but ultimately decided to pull it from the release version and go with their Enlighten approach instead.  I'm guessing this was due to the cost of building and maintaining the octree every frame, a cost that rises with more dynamic elements per scene.  With the higher horsepower of the next iteration of GPUs, combined with current and upcoming lower-overhead graphics APIs, it seems those costs are finally becoming manageable.  I've also come across a YouTube video showing the technique working in Unity, so it's definitely gaining a lot of ground.  The Tomorrow Children's 3D texture adaptation of GI (http://fumufumu.q-games.com/archives/2014_09.php#000934) is also looking really good, and as with anything else, it's great to have a competitor to help drive further research and development.

 

Frenetic Pony has a good point in that this SSLR technique, and really any SSLR technique, is not meant to stand entirely on its own.  There needs to be a fallback for missed ray hits and edge cases.  For my engine, I go with the one global cube map + several local cubemaps + SSLR approach and blend them together for my final result.  After some code cleanup today, I'm going to post again later to show the sample weighting I'm currently going with that seems to give nice results and blends nicely depending on surface roughness and what data can actually be obtained from the ray tracing buffer.  I'll discuss some of the current issues and ideas to address them, also.

 

It's great to have more people interested in helping work out the nuances of this technique.  Please feel encouraged to share any of your thoughts, questions, or suggestions in helping to improve what we've already worked out!

 

Thanks,

WFP




#5181160 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 17 September 2014 - 05:44 PM

A quick update for anyone following along.  I've found at least one way, in addition to the ones mentioned above, to improve the overall results of the cone tracing pass.  It's kind of an obvious one (and one I was hoping to avoid), but the results are undeniably better in its current state.  By taking several samples around the circle's perimeter and inside the circle area, I was able to get rid of a lot of the ugly jaggedness that can show up from taking only the one sample at the calculated position.  You can see an example of this in my last post when looking at the color transitions between blocks in the reflection.  The implementation is straightforward enough: offset by the circle radius in the x and y directions and take samples, then halve the radius and take samples of the square created by offsetting again from the center (see code below).  I've tried a few variations, including moving the last four samples out to the circle radius and giving the center sample more weight than the offsets, but in addition to saving a sine and cosine operation per iteration, weighing the samples evenly seems to give the best results, for me at least.
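The sampling pattern works out to nine taps; here's a small Python sketch of the offsets (note that the inner square uses the two distinct diagonals at half radius, so no tap is duplicated):

```python
def cone_sample_offsets(radius_ss):
    """The nine taps: center, four axis-aligned points on the perimeter,
    and four diagonal points of the inner square at half the radius."""
    r, h = radius_ss, radius_ss * 0.5
    return [(0.0, 0.0),
            (r, 0.0), (-r, 0.0), (0.0, r), (0.0, -r),
            (h, h), (-h, -h), (h, -h), (-h, h)]

offsets = cone_sample_offsets(1.0)
assert len(offsets) == 9
assert len(set(offsets)) == 9  # no duplicated taps
# every tap lies within the cone's screen-space radius
assert all((x * x + y * y) ** 0.5 <= 1.0 + 1e-6 for x, y in offsets)
```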

 

I'm going to stop putting it off now and actually hunker down on weighting with the coarse depth volume and the sphere, but I wanted to at least provide this small update so everyone can see that it's starting to look halfway decent. :)

 

Thanks,

WFP

 

Screenshot with higher sampling inside cone:  https://www.dropbox.com/s/o0y6z5a1doa16vl/screenshot_19.png?dl=0

 

New weighting method in cone tracing pixel shader:

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel, float3 rayStartVS, float radiusSS)
{
	// sample center and take additional sample points around the cone perimeter and inside to better blend the color result
	float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
	float3 sr = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(radiusSS, 0.0f), mipChannel).rgb;
	float3 sl = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(radiusSS, 0.0f), mipChannel).rgb;
	float3 sb = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(0.0f, radiusSS), mipChannel).rgb;
	float3 st = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(0.0f, radiusSS), mipChannel).rgb;
	float halfRadiusSS = radiusSS * 0.5f;
	float3 srh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(halfRadiusSS, halfRadiusSS), mipChannel).rgb;
	float3 slh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(halfRadiusSS, halfRadiusSS), mipChannel).rgb;
	// note the opposite diagonal here - repeating the same two offsets would sample the same points twice
	float3 sbh = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos + float2(halfRadiusSS, -halfRadiusSS), mipChannel).rgb;
	float3 sth = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos - float2(halfRadiusSS, -halfRadiusSS), mipChannel).rgb;

	float3 blendedColor = (sr + sl + sb + st + srh + slh + sbh + sth + sampleColor) / 9.0f;

	float visibility = visibilityBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).r;
	float4 hizMinVis = hiZBuffer.GatherRed(sampTrilinearClamp, samplePos, ceil(mipChannel));
	float4 hizMaxVis = hiZBuffer.GatherGreen(sampTrilinearClamp, samplePos, ceil(mipChannel));
	
	float minz = linearizeDepth(min(min(hizMinVis.r, hizMinVis.g), min(hizMinVis.b, hizMinVis.a)));
	float maxz = linearizeDepth(max(max(hizMaxVis.r, hizMaxVis.g), max(hizMaxVis.b, hizMaxVis.a)));
	float depth = hiZBuffer.SampleLevel(sampPointClamp, samplePos, 0.0f).r;

	float3 rayEndVS = viewSpacePositionFromDepth(samplePos, depth);
	float distanceTraveled = length(rayEndVS - rayStartVS);
	// distance squared caused the effect to fade much too fast for larger objects in the distance
	float attenuation = distanceTraveled == 0.0f ? 1.0f : saturate(1.0f / (distanceTraveled));

	float weight = 1.0f;
	depth = linearizeDepth(depth);

	///////// the below is just to force the debugger to keep the variables around for inspection
	if(depth > 5000.0f && (maxz + minz) > 500002.0f)
	{
		return 100000.0f;
	}
	///////// delete the above once weighting is figured out

	return float4(blendedColor * visibility * attenuation * weight, visibility * attenuation * weight);
}



#5180300 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 14 September 2014 - 01:40 PM

OK, so a few updates from working on the cone tracing stuff a bit this weekend.

 

First, the pre-integration stuff didn't pan out as I had hoped.  The changes proposed in the thread did give better results in certain situations, but it also introduced a lot of artifacts into my visibility buffer.  I've since reverted back to the previous way that's closer to what's in the book and have sought out new means of weighting the sample contributions for the cone tracing pass.

 

I've tried several combinations of using the hi-z buffer, but at this point haven't gotten very far, but I have noticed some improved (aesthetically, anyway) results by using a distance attenuation function when weighting the sample.  See below:

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel, float3 rayStartVS)
{
	// placeholder - this is just to get something on screen
	float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
	float visibility = visibilityBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).r;
	float4 hizMinVis = hiZBuffer.GatherRed(sampTrilinearClamp, samplePos, mipChannel);
	float4 hizMaxVis = hiZBuffer.GatherGreen(sampTrilinearClamp, samplePos, mipChannel);
	
	float minz = min(min(hizMinVis.r, hizMinVis.g), min(hizMinVis.b, hizMinVis.a));
	float maxz = max(max(hizMaxVis.r, hizMaxVis.g), max(hizMaxVis.b, hizMaxVis.a));
	float depth = hiZBuffer.SampleLevel(sampPointClamp, samplePos, 0.0f).r;

	float3 rayEndVS = viewSpacePositionFromDepth(samplePos, depth);
	float distanceTraveled = length(rayEndVS - rayStartVS);
	// distance squared caused the effect to fade much too fast for larger objects in the distance
	float attenuation = distanceTraveled == 0.0f ? 1.0f : saturate(1.0f / (distanceTraveled));

	return float4(sampleColor * visibility * attenuation, visibility * attenuation);
}

You'll notice a couple of new variables that are obtained but not used.  Those are just things that constantly seem to come up while trying to figure out a weighting scheme for the hi-z buffer in all of this, so I'm leaving them in for now to avoid re-typing them for the thousandth time.  The distance attenuation is nothing too special: I attenuate by the distance from the ray starting position in view space to the current sampling position in view space.  I actually tried this with a distance-squared attenuation function, and it blended close objects nicely but faded way too fast for distant objects, so I'm sticking with linear.  This setup helps give results similar to what I had with the altered pre-integration pass, but with my visibility buffer still intact.

 

If anyone has suggestions for using the hi-z buffer to add to the weighting scheme, they would be greatly appreciated.  The effect is starting to come along more nicely now, but I think that is still the big missing piece.

 

Thanks,

WFP

Screenshot with old visibility set + new attenuation weighting:  https://www.dropbox.com/s/uyc7g5p4ellshuu/screenshot_18.png?dl=0

 

EDIT: grammar




#5179160 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 09 September 2014 - 02:51 PM

OK, some more good news.  You guys are definitely onto something regarding the missing minZ term in finding the integration value for the visibility buffer.  When I update the equation to use that term, clamp it to the [0.0f, 1.0f] range, and invert it, I get results much more in line with what I would expect.  As always, there are some artifacts present, but one thing at a time.

 

Here's my updated code:

float4 integration = (1.0f - saturate((fineZ.xyzw - minZ) * abs(coarseVolume))) * visibility.xyzw;

New screenshots (low roughness to high roughness):

https://www.dropbox.com/s/qi6ykxz3zdrushc/screenshot_13.png?dl=0

https://www.dropbox.com/s/11bhkr0is44ualu/screenshot_14.png?dl=0

https://www.dropbox.com/s/1zx7stbw3rc9jz9/screenshot_15.png?dl=0

https://www.dropbox.com/s/lngx0sz28q8fp7k/screenshot_16.png?dl=0

https://www.dropbox.com/s/2h8fcc9vlf8m4zm/screenshot_17.png?dl=0




#5179134 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 09 September 2014 - 01:18 PM

Hi again,

 

I've made a little more progress on cleaning up the cone-tracing and getting rid of some of the artifacts I was seeing earlier.

 

If you look at the screenshots I posted in Post #31, you'll see blocky artifacts in the reflection.  In particular, the ones I'm addressing here are towards the middle.  The ones on the side are caused by the nearby rays on the outside not hitting anything as they travel and thus being left out of the cone tracing pass, meaning they provide nothing to blend with.  I'll see what I can do with those later.

 

For the artifacts towards the middle, however, you'll notice that they look brighter than what you'd expect, especially considering they're further out from the base of the reflection (here, the contact point of the stack of blocks).  The issue is that the implementation provided does not take into account the necessity to do a little manual blending on the result returned from the coneSampleWeightedColor method.  It's presented as such:

(EDIT: the book's snippet uses =, but it should be += as shown)

totalColor += coneSampleWeightedColor(samplePos, mipChannel);

if(totalColor.a >= 1.0f)
{
	break;
}

If you step through this loop in a graphics debugger, you'll quickly see that with more than a few samples you'll get an alpha value exceeding 1.  What you actually need to do is compensate for this: for the final sample that pushes the accumulated alpha over 1, reduce its color components accordingly.  This blends it in nicely with the color that's already accumulated and alleviates the artifacts that show up when multiple samples are taken in the cone tracing loop.
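A quick numeric sketch of the problem and the fix (Python, with hypothetical per-sample color/alpha values, scalar colors for brevity):

```python
def accumulate(samples):
    """samples: list of (color, alpha) pairs, already premultiplied by
    visibility * attenuation; mirrors the remainingAlpha compensation."""
    total_color, total_alpha, remaining = 0.0, 0.0, 1.0
    for color, alpha in samples:
        remaining -= alpha
        if remaining < 0.0:
            color *= 1.0 - abs(remaining)  # fade the overflowing sample
        total_color += color
        total_alpha += alpha
        if total_alpha >= 1.0:
            break
    return total_color, total_alpha

samples = [(1.0, 0.5), (1.0, 0.4), (1.0, 0.4)]
# Naive accumulation would add all three colors at full strength: 3.0.
color, alpha = accumulate(samples)
assert color < 3.0  # the overflowing third sample was faded down
assert alpha > 1.0  # which is why the loop then breaks
```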

 

Here's what my loop looks like now:

	float remainingAlpha = 1.0f;

	// cone-tracing using an isosceles triangle to approximate a cone in screen space
	for(int i = 0; i < 7; ++i)
	{
		/// ... same as what's been posted earlier


		float4 newColor = coneSampleWeightedColor(samplePos, mipChannel);

		remainingAlpha -= newColor.a;
		if(remainingAlpha < 0.0f)
		{
			newColor.rgb *= (1.0f - abs(remainingAlpha));
		}
		totalColor += newColor;
		
		if(totalColor.a >= 1.0f)
		{
			break;
		}

		// ... same as what's been posted earlier
	}

This clears up the artifacts nicely, as shown in the screenshot below.  Alongside the edge artifacts, I also want to see what can be done about the blockiness in the colors sampled.  For example, look at the reflections of the white box beside the black box.  You'll notice that the reflection does indeed blur, but still has some blockiness to it.  I'm hoping that it may be as simple as taking a smaller step when finding the next adjacent length, but I need to test that to verify.

 

EDIT:  Moving the adjacent length distance may have helped, but if so it was only marginal.  I think our best bet to fix those blocks is to get the visibility buffer and weighting sorted out. :)

 

New screenshot:  https://www.dropbox.com/s/vdoq524yjq1bb6q/screenshot_12.png?dl=0

 

Thanks,

WFP




#5178989 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 08 September 2014 - 08:18 PM

Hey guys,

 

Glad you're both still working hard on the visibility buffer stuff - it sounds like you're making great progress!  I got to thinking today, and while re-skimming the section on the cone tracing pass, it dawned on me that we need the INVERSE cosine (arccos, acos, whatever you prefer to call it) when converting the specular power to a cone angle using the inverse cumulative distribution function for the Phong distribution model.  The book (at least the VitalSource version I'm using) leaves out an all-important -1 superscript after the cosine in equation 4.1 on page 168.  I was scratching my head over this because previously, no matter what I set the surface roughness to, I was getting roughly the same values.  With the equation appropriately updated, the reflection blurs as expected with variations in surface roughness. :)

 

I've verified the above with equation 7 found here:  http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html

 

Here's the modified code from what I initially posted.  I also added a check to see if the specular power was at the top of the supported range in my engine, then to clamp the cone angle to 0 - a perfectly mirrored surface.

float specularPowerToConeAngle(float specularPower)
{
	if(specularPower >= exp2(RB_MAX_SPECULAR_EXP))
	{
		return 0.0f;
	}
	const float xi = 0.244f;
	float exponent = 1.0f / (specularPower + 1.0f);
	return acos(pow(xi, exponent));
}
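To sanity-check the corrected equation, here's the same function in Python (RB_MAX_SPECULAR_EXP is my engine's constant; the value 10.0 here is just an example):

```python
import math

RB_MAX_SPECULAR_EXP = 10.0  # example cap on the supported specular exponent

def specular_power_to_cone_angle(specular_power):
    """Inverse CDF of the Phong lobe (GPU Gems 3 ch. 20, eq. 7), including
    the acos that the book's equation 4.1 omits."""
    if specular_power >= 2.0 ** RB_MAX_SPECULAR_EXP:
        return 0.0  # perfectly mirrored surface
    xi = 0.244
    return math.acos(xi ** (1.0 / (specular_power + 1.0)))

# Higher specular power (smoother surface) gives a narrower cone.
assert specular_power_to_cone_angle(1.0) > specular_power_to_cone_angle(100.0)
# At or above the engine cap the cone collapses to a mirror reflection.
assert specular_power_to_cone_angle(1024.0) == 0.0
```

Without the acos, the pow() result alone actually grows toward 1 as specular power increases, which is why every roughness produced roughly the same values.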

And some new screens, one with low (but some) surface roughness, and another with high surface roughness.  There are clearly a lot of artifacts to clean up, but I'm hoping the visibility buffer and proper weighting will alleviate most of them once we get that all figured out.

 

Low roughness:  https://www.dropbox.com/s/nlykj7694uuxsxf/screenshot_10.png?dl=0

 

High roughness:  https://www.dropbox.com/s/9f4hprblk69s4pr/screenshot_11.png?dl=0

 

Thanks,

 

WFP




#5178776 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 07 September 2014 - 06:58 PM

Hi TiagoCosta,

 

There definitely seems to be a few things off in the implementation provided in the book.  I tried yours out and didn't see much difference, but that could be because we're still wrestling with the actual cone-tracing part itself.

 

One thing I noticed this afternoon that I had messed up on in the original code I posted to this thread was that I had mixed up some parameters in the cone tracing step.  For the method isoscelesTriangleInRadius - I was sending the parameters in backwards.  The code should read:

float isoscelesTriangleInRadius(float a, float h)
{
	float a2 = a * a;
	float fh2 = 4.0f * h * h;
	return (a * (sqrt(a2 + fh2) - a)) / (4.0f * h);
}

and should be called from the loop with:

// calculate in-radius of the isosceles triangle
float incircleSize = isoscelesTriangleInRadius(oppositeLength, adjacentLength);

The difference is that I had originally posted adjacentLength as the first parameter, followed by oppositeLength - obviously incorrect, and easily verified here: http://mathworld.wolfram.com/IsoscelesTriangle.html
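For anyone who wants to double-check the formula itself, the in-radius expression agrees with the standard r = area / semiperimeter identity; a quick Python verification:

```python
import math

def isosceles_triangle_in_radius(a, h):
    # a = base (opposite) length, h = height (adjacent) length
    a2 = a * a
    fh2 = 4.0 * h * h
    return (a * (math.sqrt(a2 + fh2) - a)) / (4.0 * h)

# Cross-check against r = area / semiperimeter for base a and height h.
a, h = 3.0, 4.0
leg = math.sqrt(h * h + (a * 0.5) ** 2)      # length of each equal side
expected = (a * h * 0.5) / ((a + 2.0 * leg) * 0.5)
assert abs(isosceles_triangle_in_radius(a, h) - expected) < 1e-9
```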

 

I'm also wondering if isoscelesTriangleOpposite should be using half the cone angle, since we're basically splitting the cone into two halves to make it a right triangle for finding its sides.

float isoscelesTriangleOpposite(float adjacentLength, float coneTheta)
{
	// should this be like below with the additional *0.5f term?
	return 2.0f * tan(coneTheta * 0.5f) * adjacentLength;
}

I'm not positive on that yet, though, so don't want to steer anyone in the wrong direction if it's incorrect.

 

EDIT:  A little verifying with scratch paper says yes, the cone angle should be halved as shown above.  If anyone sees an error in that, please let me know.
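The scratch-paper check is easy to reproduce numerically (Python):

```python
import math

def isosceles_triangle_opposite(adjacent_length, cone_theta):
    # cone_theta is the FULL cone angle; halve it for the right triangle
    return 2.0 * math.tan(cone_theta * 0.5) * adjacent_length

# For a full apex angle of 90 degrees and height 1, the base must be
# 2 * tan(45 degrees) * 1 = 2.
assert abs(isosceles_triangle_opposite(1.0, math.pi / 2.0) - 2.0) < 1e-9
```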

 

Regarding contacting the author, I think several people already have.  He's responded to a few posts on his twitter account, making it sound like he was unable to release the code for whatever reason:  https://twitter.com/YasinUludag/with_replies.  The disappointing thing is that the article is obviously incomplete and directs the reader to the source code for a better understanding at multiple points.  Not to mention that Section 4.12 'Acknowledgments' lists enough people, including leadership roles, that surely someone should have had the wherewithal to speak up and stop the article from going out in an incomplete state.  Oh well, we're making some good progress on it, and I'm hoping we can all find a solution together. :)

 

Thanks!

 

WFP




#5178310 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 05 September 2014 - 08:19 AM

So I've finally figured out what was causing the stair artifacts we were seeing when running at anything but power-of-two texture sizes.  In the article, the author uses offsets of -1 when obtaining the other three points for comparison, but it turns out that, at least for my NVIDIA card (GTX 760), the opposite needed to be true.  Using offsets in the positive direction (see below) alleviated the stair artifacts that were showing up.  There seems to be an implementation difference in how ATI and NVIDIA cards handle this, because the code worked with -1 offsets on the ATI card it was tested on.  I still need to follow up to make sure changing the sign to positive doesn't break the technique on those cards, but at the very least we have an answer for what was causing it. :)  I've posted the modified Hi-Z buffer construction pixel shader that I use below, as well as a screenshot running at 1536x864 with no stair artifacts showing up.  Next steps are filtering this buffer to fill in the tiny artifacts/gaps that show up (as well as temporal stability, etc., eventually), and then applying the cone-tracing step, which Jp is doing some great work on. :)

 

-WFP

Screenshot: https://www.dropbox.com/s/l70nv650e75z3bw/screenshot_9.png?dl=0

 

HiZ_PS.hlsl:

struct VertexOut
{
	float4 posH : SV_POSITION;
	float2 tex : TEXCOORD;
};

SamplerState sampPointClamp : register(s0);

Texture2D hiZBuffer : register(t0);

float2 main(VertexOut pIn) : SV_TARGET
{
	float2 texcoords = pIn.tex;
	float4 minDepth = 0.0f;
	float4 maxDepth = 0.0f;

	// sample level zero since only one mip level is available with the bound SRV
	float2 tx = hiZBuffer.SampleLevel(sampPointClamp, texcoords, 0.0f, int2(0, 0)).rg;
	minDepth.r = tx.r;
	maxDepth.r = tx.g;

	float2 ty = hiZBuffer.SampleLevel(sampPointClamp, texcoords, 0.0f, int2(0, 1)).rg;
	minDepth.g = ty.r;
	maxDepth.g = ty.g;

	float2 tz = hiZBuffer.SampleLevel(sampPointClamp, texcoords, 0.0f, int2(1, 0)).rg;
	minDepth.b = tz.r;
	maxDepth.b = tz.g;

	float2 tw = hiZBuffer.SampleLevel(sampPointClamp, texcoords, 0.0f, int2(1, 1)).rg;
	minDepth.a = tw.r;
	maxDepth.a = tw.g;

	return float2(
		min(min(minDepth.r, minDepth.g), min(minDepth.b, minDepth.a)),
		max(max(maxDepth.r, maxDepth.g), max(maxDepth.b, maxDepth.a)));
}
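The reduction the shader performs per pixel is a 2x2 min/max downsample; here's the whole-mip version of that idea as a Python sketch (toy data, not engine code):

```python
def downsample_minmax(level):
    """level: 2D list of (min, max) pairs with even dimensions; returns the
    next mip by reducing each 2x2 quad, as the shader does per output pixel."""
    out = []
    for y in range(0, len(level), 2):
        row = []
        for x in range(0, len(level[0]), 2):
            quad = [level[y][x], level[y][x + 1],
                    level[y + 1][x], level[y + 1][x + 1]]
            row.append((min(q[0] for q in quad), max(q[1] for q in quad)))
        out.append(row)
    return out

# A 2x2 mip 0 reduces to a single texel holding the overall min and max.
mip0 = [[(0.1, 0.1), (0.2, 0.2)],
        [(0.3, 0.3), (0.9, 0.9)]]
assert downsample_minmax(mip0) == [[(0.1, 0.9)]]
```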



#5174150 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 16 August 2014 - 03:39 PM

Yep, and that's exactly why it sits so uneasy with me. Just haven't been able to get rid of those stepping artifacts otherwise yet. I'll keep at it and see if anything else will get rid of them.


#5174130 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 16 August 2014 - 01:10 PM

OK, so I've found one way to address the power-of-two issue, but I'm still very open to suggestions if anyone has any.  I've updated my getMinimumDepthPlane method (below) to always sample level 0.0f (the full-resolution mip) at the texture coordinates provided.  This does seem to fix my issues with the stair-like artifacts, but it doesn't sit well with me, because if this were the correct solution, why would the author have passed in level and rootLevel in the first place?  Anyway, the ray tracing steps currently work fairly well at any resolution (I still need to address other artifacts using the min/max values), and stepping through in the VS2013 graphics debugger shows that it converges on a solution (for most pixels tested) in about 13 iterations, far fewer than my limit of 64.

float getMinimumDepthPlane(float2 ray, float level, float rootLevel)
{
	// not sure why we need rootLevel for this - for textures that are non-power-of-two, 0.0f works better
	return hiZBuffer.SampleLevel(sampPointClamp, ray.xy, 0.0f).r;
}

Screenshot at 1536x864:

https://www.dropbox.com/s/eo2wkiz87bswgz5/screenshot_7.png




#5173483 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 13 August 2014 - 08:41 PM

Actually, I got a little extra time this evening to work on it and wanted to update with a screenshot running at 1024x512.  The power-of-two texture is clearly helping with the stepping artifacts (I tried several other power-of-two combinations as well), so now I need to see what I can do to adapt that to non-power-of-two resolutions.  Any ideas?

 

The interlaced lines I'm confident can be repaired by updating the epsilon.  I haven't tested it much in my code yet, but I'm guessing just moving it closer to something like below will help a lot.

static const float2 HIZ_CROSS_EPSILON = float2(texelWidth, texelHeight) * exp2(HIZ_START_LEVEL);
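
A related variant worth trying is recomputing the crossing offset each iteration from the current level's cell count, instead of fixing it at the start level.  Something along these lines inside the traversal loop (sketch only, untested - signDir here is the +/-1.0f step direction computed before crossStep gets saturated):

float2 cellCount = getCellCount(level, rootLevel);
float2 crossOffset = signDir * (0.5f / cellCount); // half a cell at the current mip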

There are a few other artifacts that appear under things like spheres and character arms that I think I can solve by using the min and max depth in combination with one another to walk behind objects - these are examples of the nonsensical intersections Bruzer mentioned.

 

Anyway, I'm calling it a night right now, but will be working more on it as soon as I get a chance.

 

-WFP

 

Screenshot at 1024x512:

https://www.dropbox.com/s/3uvq0mrczps6vc0/screenshot_5.png




#5173472 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 13 August 2014 - 07:35 PM

Hey Jp,

 

I'm rendering at 1536x864, which I know is a bit of an unconventional resolution, but I use it to help ensure that my scenes and effects can be rendered without imposing limitations on windowed client dimensions.  The artifacts I mentioned above still show up if I use more traditional dimensions like 1280x720 or 1920x1080.  I'm glad you mentioned using a power-of-two texture, though, because when I tested tonight by setting the output to 1024x512 and 1024x1024, the stair-like artifacts did seem to be alleviated, though some artifacts remain.  I wonder if I need to revisit the way I'm building my hi-z buffer, since I'm not using power-of-two textures.  I'll look into that tomorrow when I get some time and see if it helps.

 

Thanks,

 

WFP




#5173175 Help with GPU Pro 5 Hi-Z Screen Space Reflections

Posted by WFP on 12 August 2014 - 03:33 PM

Hi Jp,

 

Bruzer and I have spoken a few times since this topic originally started, and he has been a great help in getting me most of the way there.  For his sanity and for the good of the larger audience, though, it's probably best to bring the conversation back to this thread, so I'll post below what I've worked out so far with his help.  Also, Michal Drobot's chapter on Quadtree Displacement Mapping in GPU Pro 1 is a big help in understanding this - it's what the author of this article based his ray-tracing steps on.  I still have some very major issues in my implementation (screenshots below), so I'm hoping that anyone reading over this can help out and call me out on anything I've done in a bone-headed way.

 

You'll notice in my implementation that some of the method arguments are a little different from what's in the book.  For example, I pass the full float3 vectors to intersectDepthPlane and some other methods.

 

Also, I've done some preliminary testing on doing a small (8 or so iterations) linear ray march before doing the hi-z traversal in order to reduce artifacts of immediate intersections and found that it did help, but due to the current state of my shader, I pulled those back out until the basic stuff was working.
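
For reference, the pre-march I tested looked roughly like this - reconstructed from memory, so treat it as a sketch rather than what's currently in my shader.  It just walks the first small fraction of the screen-space ray in equal steps and bails on the first hit before handing off to the hi-z traversal:

float3 linearPreMarch(float3 p, float3 d, uint stepCount, float premarchFraction)
{
	// cover only the first premarchFraction of the ray in stepCount equal steps
	float3 stepVec = d * (premarchFraction / float(stepCount));
	float3 ray = p;
	for (uint i = 0u; i < stepCount; ++i)
	{
		ray += stepVec;
		float sceneZ = hiZBuffer.SampleLevel(sampPointClamp, ray.xy, 0.0f).r;
		if (ray.z >= sceneZ)
		{
			return ray; // early intersection - skip the hi-z walk for this pixel
		}
	}
	return ray; // no hit - continue with hiZTrace from here
}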

 

I hope this helps, and again, please call out any blatant errors you see in my current implementation attempt, as they clearly exist.

 

 

This is the pixel shader in its current state.  Notice that currently I'm still trying to get the ray-tracing through the hi-z buffer part working, so I'm overwriting the cone-tracing output to be the equivalent to a cone angle of 0 (i.e., a perfectly smooth/mirror surface).

#include "HiZSSRConstantBuffer.hlsli"
#include "../../LightingModel/PBL/LightUtils.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../ShaderConstants.hlsli"

struct VertexOut
{
	float4 posH : SV_POSITION;
	float3 viewRay : VIEWRAY;
	float2 tex : TEXCOORD;
};

SamplerState sampPointClamp : register(s0); // point sampling, clamped borders
SamplerState sampTrilinearClamp : register(s1); // trilinear sampling, clamped borders

Texture2D hiZBuffer : register(t0); // hi-z buffer - all mip levels
Texture2D visibilityBuffer : register(t1); // visibility buffer - all mip levels
Texture2D colorBuffer : register(t2); // convolved color buffer - all mip levels
Texture2D normalBuffer : register(t3); // normal buffer - from g-buffer
Texture2D specularBuffer : register(t4); // specular buffer - from g-buffer (rgb = ior, a = roughness)

static const float HIZ_START_LEVEL = 2.0f;
static const float HIZ_STOP_LEVEL = 2.0f;
static const float HIZ_MAX_LEVEL = float(cb_mipCount);
static const float2 HIZ_CROSS_EPSILON = float2(texelWidth, texelHeight); // maybe need to be smaller or larger? this is mip level 0 texel size
static const uint MAX_ITERATIONS = 64u;

float linearizeDepth(float depth)
{
	// projectionA/projectionB are the usual projection constants - e.g. for a standard
	// left-handed perspective matrix, projectionA = zFar / (zFar - zNear) (proj._33)
	// and projectionB = (-zFar * zNear) / (zFar - zNear) (proj._43)
	return projectionB / (depth - projectionA);
}

///////////////////////////////////////////////////////////////////////////////////////
// Hi-Z ray tracing methods
///////////////////////////////////////////////////////////////////////////////////////

static const float2 hiZSize = cb_screenSize; // not sure if correct - this is mip level 0 size

float3 intersectDepthPlane(float3 o, float3 d, float t)
{
	return o + d * t;
}

float2 getCell(float2 ray, float2 cellCount)
{
	// does this need floor, or the fractional part? I think cells are meant to be whole pixel indices (integer values), but not sure
	return floor(ray * cellCount);
}

float3 intersectCellBoundary(float3 o, float3 d, float2 cellIndex, float2 cellCount, float2 crossStep, float2 crossOffset)
{
	float2 index = cellIndex + crossStep;
	index /= cellCount;
	index += crossOffset;
	float2 delta = index - o.xy;
	delta /= d.xy;
	float t = min(delta.x, delta.y);
	return intersectDepthPlane(o, d, t);
}

float getMinimumDepthPlane(float2 ray, float level, float rootLevel)
{
	// not sure why we need rootLevel for this
	return hiZBuffer.SampleLevel(sampPointClamp, ray.xy, level).r;
}

float2 getCellCount(float level, float rootLevel)
{
	// not sure why we need rootLevel for this
	float2 div = level == 0.0f ? 1.0f : exp2(level);
	return cb_screenSize / div;
}

bool crossedCellBoundary(float2 cellIdxOne, float2 cellIdxTwo)
{
	return cellIdxOne.x != cellIdxTwo.x || cellIdxOne.y != cellIdxTwo.y;
}

float3 hiZTrace(float3 p, float3 v)
{
	const float rootLevel = float(cb_mipCount) - 1.0f; // convert to 0-based indexing
	
	float level = HIZ_START_LEVEL;

	uint iterations = 0u;

	// get the cell cross direction and a small offset to enter the next cell when doing cell crossing
	float2 crossStep = float2(v.x >= 0.0f ? 1.0f : -1.0f, v.y >= 0.0f ? 1.0f : -1.0f);
	float2 crossOffset = float2(crossStep.xy * HIZ_CROSS_EPSILON.xy);
	crossStep.xy = saturate(crossStep.xy);

	// set current ray to original screen coordinate and depth
	float3 ray = p.xyz;

	// scale vector such that z is 1.0f (maximum depth)
	float3 d = v.xyz / v.z;

	// set starting point to the point where z equals 0.0f (minimum depth)
	float3 o = intersectDepthPlane(p, d, -p.z);

	// cross to next cell to avoid immediate self-intersection
	float2 rayCell = getCell(ray.xy, hiZSize.xy);
	ray = intersectCellBoundary(o, d, rayCell.xy, hiZSize.xy, crossStep.xy, crossOffset.xy);

	while(level >= HIZ_STOP_LEVEL && iterations < MAX_ITERATIONS)
	{
		// get the minimum depth plane in which the current ray resides
		float minZ = getMinimumDepthPlane(ray.xy, level, rootLevel);
		
		// get the cell number of the current ray
		const float2 cellCount = getCellCount(level, rootLevel);
		const float2 oldCellIdx = getCell(ray.xy, cellCount);

		// intersect only if ray depth is below the minimum depth plane
		float3 tmpRay = intersectDepthPlane(o, d, max(ray.z, minZ));

		// get the new cell number as well
		const float2 newCellIdx = getCell(tmpRay.xy, cellCount);

		// if the new cell number is different from the old cell number, a cell was crossed
		if(crossedCellBoundary(oldCellIdx, newCellIdx))
		{
			// intersect the boundary of that cell instead, and go up a level for taking a larger step next iteration
			tmpRay = intersectCellBoundary(o, d, oldCellIdx, cellCount.xy, crossStep.xy, crossOffset.xy); //// NOTE added .xy to o and d arguments
			level = min(HIZ_MAX_LEVEL, level + 2.0f);
		}

		ray.xyz = tmpRay.xyz;

		// go down a level in the hi-z buffer
		--level;

		++iterations;
	}

	return ray;
}

///////////////////////////////////////////////////////////////////////////////////////

///////////////////////////////////////////////////////////////////////////////////////
// Hi-Z cone tracing methods
///////////////////////////////////////////////////////////////////////////////////////

float specularPowerToConeAngle(float specularPower)
{
	// based on phong reflection model
	const float xi = 0.244f;
	float exponent = 1.0f / (specularPower + 1.0f);
	/*
	 * may need to try clamping very high exponents to 0.0f, test out on mirror surfaces first to gauge
	 * return specularPower >= 8192 ? 0.0f : cos(pow(xi, exponent));
	 */
	return cos(pow(xi, exponent));
}

float isoscelesTriangleOpposite(float adjacentLength, float coneTheta)
{
	// simple trig and algebra - soh, cah, toa - tan(theta) = opp/adj, opp = tan(theta) * adj, then multiply * 2.0f for isosceles triangle base
	return 2.0f * tan(coneTheta) * adjacentLength;
}

float isoscelesTriangleInRadius(float a, float h)
{
	// incircle radius r = area / semi-perimeter for a triangle with base a and height h:
	// r = (a * h / 2) / ((a + sqrt(a^2 + 4*h^2)) / 2), which rationalizes to the form below
	float a2 = a * a;
	float fh2 = 4.0f * h * h;
	return (a * (sqrt(a2 + fh2) - a)) / (4.0f * max(h, 0.00001f)); // max() guards against divide-by-zero
}

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel)
{
	// placeholder - this is just to get something on screen
	float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
	float visibility = visibilityBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).r;

	return float4(sampleColor * visibility, visibility);
}

float isoscelesTriangleNextAdjacent(float adjacentLength, float incircleRadius)
{
	// subtract the diameter of the incircle to get the adjacent side of the next level on the cone
	return adjacentLength - (incircleRadius * 2.0f);
}

///////////////////////////////////////////////////////////////////////////////////////

float4 main(VertexOut pIn) : SV_TARGET
{
	/*
	 * Ray(t) = O + D> * t
	 * D> = V>SS / V>SSz
	 * O = PSS + D> * -PSSz
	 * V>SS = P'SS - PSS
	 * PSS = {texcoord.x, texcoord.y, depth} // screen/texture coordinate and depth
	 * PCS = (PVS + reflect(V>VS, N>VS)) * MPROJ
	 * P'SS = (PCS / PCSw) * [0.5f, -0.5f] + [0.5f, 0.5f]
	 */
	int3 loadIndices = int3(pIn.posH.xy, 0);
	float depth = hiZBuffer.Load(loadIndices).r;
	// PSS
	float3 positionSS = float3(pIn.tex, depth);
	float linearDepth = linearizeDepth(depth);
	// PVS
	float3 positionVS = pIn.viewRay * linearDepth;

	// V>VS - since calculations are in view-space, we can just normalize the position to point at it
	float3 toPositionVS = normalize(positionVS);
	// N>VS - bail out if no normal was written for this pixel
	float3 normalVS = normalBuffer.Load(loadIndices).rgb;
	// component-wise zero check - a dot against (1, 1, 1) can also be zero for valid normals like (1, -1, 0)
	if(!any(normalVS))
	{
		return float4(0.0f, 0.0f, 0.0f, 0.0f);
	}
	
	float3 reflectVS = reflect(toPositionVS, normalVS);
	float4 positionPrimeSS4 = mul(float4(positionVS + reflectVS, 1.0f), projectionMatrix);
	float3 positionPrimeSS = (positionPrimeSS4.xyz / positionPrimeSS4.w);
	positionPrimeSS.x = positionPrimeSS.x * 0.5f + 0.5f;
	positionPrimeSS.y = positionPrimeSS.y * -0.5f + 0.5f;

	// V>SS - screen space reflection vector
	float3 reflectSS = positionPrimeSS - positionSS;

	// calculate the ray
	float3 raySS = hiZTrace(positionSS, reflectSS);

	// perform cone-tracing steps

	// get specular power from roughness
	float4 specularAll = specularBuffer.Load(loadIndices);
	float specularPower = roughnessToSpecularPower(specularAll.a);

	// convert to cone angle (maximum extent of the specular lobe aperture)
	float coneTheta = specularPowerToConeAngle(specularPower);

	// P1 = positionSS, P2 = raySS, adjacent length = ||P2 - P1||
	
	// need to check if this is correct calculation or not
	float2 deltaP = raySS.xy - positionSS.xy;
	float adjacentLength = length(deltaP);
	
	// need to check if this is correct calculation or not
	float2 adjacentUnit = normalize(deltaP);

	float4 totalColor = float4(0.0f, 0.0f, 0.0f, 0.0f);

	// cone-tracing using an isosceles triangle to approximate a cone in screen space
	for(int i = 0; i < 7; ++i)
	{
		// intersection length is the adjacent side, get the opposite side using trig
		float oppositeLength = isoscelesTriangleOpposite(adjacentLength, coneTheta);

		// calculate in-radius of the isosceles triangle
		float incircleSize = isoscelesTriangleInRadius(adjacentLength, oppositeLength);

		// get the sample position in screen space
		float2 samplePos = pIn.tex.xy + adjacentUnit * (adjacentLength - incircleSize);

		// convert the in-radius into screen size then check what power N to raise 2 to reach it - that power N becomes mip level to sample from
		float mipChannel = log2(incircleSize * max(cb_screenSize.x, cb_screenSize.y)); // try this with min instead of max

		/*
		 * Read color and accumulate it using trilinear filtering and weight it.
		 * Uses pre-convolved image (color buffer), pre-integrated transparency (visibility buffer),
		 * and hi-z buffer (hiZBuffer).
		 * Checks if cone sphere is below, between, or above the hi-z minimum and maximum and weights
		 * it together with transparency (visibility).
		 * Visibility is accumulated in the alpha channel.  Break if visibility is 100% or greater (>= 1.0f).
		 */
		totalColor += coneSampleWeightedColor(samplePos, mipChannel);
		
		if(totalColor.a >= 1.0f)
		{
			break;
		}

		adjacentLength = isoscelesTriangleNextAdjacent(adjacentLength, incircleSize);
	}



	////////////
	// fake implementation while testing - overwrites entire cone tracing loop - equivalent of cone angle being 0.0f
	
	totalColor.rgb = colorBuffer.SampleLevel(sampPointClamp, raySS.xy, 0.0f).rgb;
	
	// end fake
	////////////

	float3 toEye = -toPositionVS;
	// test this with saturate instead of abs, too - see which gives best result
	float3 specular = calculateFresnelTerm(specularAll.rgb, abs(dot(normalVS, toEye))) * RB_1DIVPI;

	return float4(totalColor.rgb * specular, 1.0f);
}

Screenshots:

(EDIT: screenshots didn't show up so linking to Dropbox images instead)

 

https://www.dropbox.com/s/1852z89kuj7hnn4/screenshot_0.png

https://www.dropbox.com/s/rx8w8da2qazg112/screenshot_1.png

https://www.dropbox.com/s/f3z4sxf0cjfz29r/screenshot_2.png

https://www.dropbox.com/s/i8k4nuw25byx4jv/screenshot_3.png



