Help with GPU Pro 5 Hi-Z Screen Space Reflections

77 comments, last by WFP 9 years, 2 months ago

Hi there! I'm trying to implement chapter 4 of the Lighting and Shading section in GPU Pro 5: basically, how to optimize my screen space reflections by using a mip-mapped Z buffer to converge quickly on the intersection point of my reflection ray.

Sadly, the author wasn't allowed to release the code/demo he talks about in the article, so I've had to work out most of the shader myself. I'm close, but there's one thing I don't get: if you always start at a lower mip in the HiZ buffer, won't your starting ray depth often (always?) be _behind_ (greater Z than) what you read from the HiZ buffer? After all, you build the HiZ buffer by taking the min(...) of the more detailed mip.
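For reference, the min(...) construction mentioned above can be sketched as a downsample pass. This is a minimal sketch and not the chapter's code: the resource name `prevMip` is hypothetical, the pass is assumed to render into the next (half-size) mip while reading the more detailed one, and the more detailed mip is assumed to have even dimensions. Because each coarse texel keeps the smallest depth in its 2x2 footprint, the coarse value can never be greater than any fine depth it covers, which is the situation the question describes.

```hlsl
// Assumption: prevMip is an SRV that exposes only the more detailed mip level,
// and the bound render target is the next, half-size mip level.
Texture2D<float> prevMip : register(t0);

float main(float4 posH : SV_POSITION) : SV_TARGET
{
	// top-left texel of the 2x2 block covered by this output texel
	int2 base = int2(posH.xy) * 2;

	float z00 = prevMip.Load(int3(base, 0));
	float z10 = prevMip.Load(int3(base + int2(1, 0), 0));
	float z01 = prevMip.Load(int3(base + int2(0, 1), 0));
	float z11 = prevMip.Load(int3(base + int2(1, 1), 0));

	// keep the nearest (smallest) depth so the coarse cell is conservative
	return min(min(z00, z10), min(z01, z11));
}
```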

If you've implemented that chapter, or even just read and understood it, or have used HiZ tracing before, I'd be curious to hear your thoughts.

thx!


Hi Bruzer100,

I was also preparing to start implementing something similar, and was disappointed to find that the code was unavailable, especially since several places in the chapter specifically tell the reader to consult the source. When I start my implementation in the next few days, I'll probably run into issues similar to the ones you've come across, so if you wouldn't mind sharing what hurdles you've had to work around that weren't called out, or even sharing your implementation, it would be highly appreciated. I'll bookmark this thread so that when I do start I can add anything I find valuable, especially if it was left out of the book's chapter.

Thanks,

WFP

Hi guys,

Same boat as you. Are you doing the ray marching in screen or view space?

On page 174, it's not clear to me how the function intersectDepthPlane should work:

float3 o = intersectDepthPlane(p.xy, d.xy, -p.z);

Is this a typo or am I missing something? Should it be 'p.xyz', since it should be re-projecting the point onto the near plane (o.z = 0)? The method can't assume the point returned is always at z=0, since it's also used to calculate the tmpRay position during the ray march (which needs to keep track of the .z component).

I would expect the method to look like this (?):

float3 intersectDepthPlane(float3 p, float2 d, float z)
{
	return p + float3(d, 1) * z;
}

I'm not sure how intersectCellBoundary works either, with the crossStep and crossOffset... (Why are these two helper variables needed? Why saturate the cross direction?)
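One plausible reading of crossStep, based only on how the variable is used in the shader posted later in this thread, is that the saturated value selects which edge of the current cell the ray leaves through: the far edge for a positive direction and the near edge for a negative one. crossOffset would then be the small nudge that pushes the point just past that edge so the next cell lookup lands in the neighbouring cell. The 1D helper below is hypothetical and exists only to show the arithmetic.

```hlsl
// Hypothetical 1D helper: position (in [0, 1] texture space) of the cell edge
// that a ray moving in the direction of dirSign (+1 or -1) exits through.
float exitBoundary1D(float cellIndex, float cellCount, float dirSign)
{
	// saturate maps +1 to 1 (far edge of the cell) and -1 to 0 (near edge)
	float crossStep = saturate(dirSign);
	return (cellIndex + crossStep) / cellCount;
}

// Worked example with cellCount = 4 and cellIndex = 1 (the cell spans [0.25, 0.5)):
//   dirSign = +1  ->  (1 + 1) / 4 = 0.5
//   dirSign = -1  ->  (1 + 0) / 4 = 0.25
```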

Cheers!

Jp

Hi Jp,

Bruzer and I have spoken a few times since this topic started originally and he has been great in helping me almost figure this thing out. For his sanity and for the good of the larger audience, it's probably best for us to bring the conversation back to this thread, though, so I'll post below what I've worked out so far with his help. Also, the chapter in GPU Pro 1 by Michal Drobot on Quadtree Displacement Mapping is a big help in understanding this, and is what the author of this article based his ray-tracing steps on. I still have some very major issues in my implementation (screenshots below), so I'm hoping that anyone reading over this may be able to help out and call me out on things I've done in a bone-headed way.

You'll notice in my implementation that some of the method arguments are a little different from what's in the book. For example, I pass the full float3 vectors to intersectDepthPlane and some other methods.

Also, I've done some preliminary testing on doing a small (8 or so iterations) linear ray march before doing the hi-z traversal in order to reduce artifacts of immediate intersections and found that it did help, but due to the current state of my shader, I pulled those back out until the basic stuff was working.
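A short linear pre-march of the kind described above might look like the following. This is only a sketch under stated assumptions: the function name `linearPreMarch`, the parameters `stepSize` and `steps`, and the resource `depthBuffer` are all hypothetical, the ray is assumed to be in screen space (xy are texture coordinates, z is depth), and larger z is assumed to mean farther from the camera.

```hlsl
// Assumptions: depthBuffer holds the full-resolution depth (mip 0 of the hi-z buffer).
Texture2D<float> depthBuffer : register(t0);
SamplerState sampPointClamp : register(s0);

// Takes `steps` fixed-size steps along the ray. Returns true and writes the
// first sample that lies at or behind the stored depth; otherwise returns
// false and leaves `hit` at the start position.
bool linearPreMarch(float3 start, float3 dir, float stepSize, uint steps, out float3 hit)
{
	hit = start;

	[loop]
	for (uint i = 1u; i <= steps; ++i)
	{
		float3 p = start + dir * (stepSize * float(i));
		float sceneZ = depthBuffer.SampleLevel(sampPointClamp, p.xy, 0.0f);

		if (p.z >= sceneZ)
		{
			hit = p;
			return true;
		}
	}

	return false;
}
```

If the pre-march finds nothing, the last sample position (start + dir * stepSize * steps) could serve as the starting point for the hi-z traversal.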

I hope this helps, and again, please call out any blatant errors you see in my current implementation attempt, as they clearly exist.

This is the pixel shader in its current state. Notice that currently I'm still trying to get the ray-tracing through the hi-z buffer part working, so I'm overwriting the cone-tracing output to be equivalent to a cone angle of 0 (i.e., a perfectly smooth/mirror surface).


#include "HiZSSRConstantBuffer.hlsli"
#include "../../LightingModel/PBL/LightUtils.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../ShaderConstants.hlsli"

struct VertexOut
{
	float4 posH : SV_POSITION;
	float3 viewRay : VIEWRAY;
	float2 tex : TEXCOORD;
};

SamplerState sampPointClamp : register(s0); // point sampling, clamped borders
SamplerState sampTrilinearClamp : register(s1); // trilinear sampling, clamped borders

Texture2D hiZBuffer : register(t0); // hi-z buffer - all mip levels
Texture2D visibilityBuffer : register(t1); // visibility buffer - all mip levels
Texture2D colorBuffer : register(t2); // convolved color buffer - all mip levels
Texture2D normalBuffer : register(t3); // normal buffer - from g-buffer
Texture2D specularBuffer : register(t4); // specular buffer - from g-buffer (rgb = ior, a = roughness)

static const float HIZ_START_LEVEL = 2.0f;
static const float HIZ_STOP_LEVEL = 2.0f;
static const float HIZ_MAX_LEVEL = float(cb_mipCount);
static const float2 HIZ_CROSS_EPSILON = float2(texelWidth, texelHeight); // maybe need to be smaller or larger? this is mip level 0 texel size
static const uint MAX_ITERATIONS = 64u;

float linearizeDepth(float depth)
{
	return projectionB / (depth - projectionA);
}

///////////////////////////////////////////////////////////////////////////////////////
// Hi-Z ray tracing methods
///////////////////////////////////////////////////////////////////////////////////////

static const float2 hiZSize = cb_screenSize; // not sure if correct - this is mip level 0 size

float3 intersectDepthPlane(float3 o, float3 d, float t)
{
	return o + d * t;
}

float2 getCell(float2 ray, float2 cellCount)
{
	// does this need to be floor, or does it need fractional part - i think cells are meant to be whole pixel values (integer values) but not sure
	return floor(ray * cellCount);
}

float3 intersectCellBoundary(float3 o, float3 d, float2 cellIndex, float2 cellCount, float2 crossStep, float2 crossOffset)
{
	float2 index = cellIndex + crossStep;
	index /= cellCount;
	index += crossOffset;
	float2 delta = index - o.xy;
	delta /= d.xy;
	float t = min(delta.x, delta.y);
	return intersectDepthPlane(o, d, t);
}

float getMinimumDepthPlane(float2 ray, float level, float rootLevel)
{
	// not sure why we need rootLevel for this
	return hiZBuffer.SampleLevel(sampPointClamp, ray.xy, level).r;
}

float2 getCellCount(float level, float rootLevel)
{
	// not sure why we need rootLevel for this
	float2 div = level == 0.0f ? 1.0f : exp2(level);
	return cb_screenSize / div;
}

bool crossedCellBoundary(float2 cellIdxOne, float2 cellIdxTwo)
{
	return cellIdxOne.x != cellIdxTwo.x || cellIdxOne.y != cellIdxTwo.y;
}

float3 hiZTrace(float3 p, float3 v)
{
	const float rootLevel = float(cb_mipCount) - 1.0f; // convert to 0-based indexing
	
	float level = HIZ_START_LEVEL;

	uint iterations = 0u;

	// get the cell cross direction and a small offset to enter the next cell when doing cell crossing
	float2 crossStep = float2(v.x >= 0.0f ? 1.0f : -1.0f, v.y >= 0.0f ? 1.0f : -1.0f);
	float2 crossOffset = float2(crossStep.xy * HIZ_CROSS_EPSILON.xy);
	crossStep.xy = saturate(crossStep.xy);

	// set current ray to original screen coordinate and depth
	float3 ray = p.xyz;

	// scale vector such that z is 1.0f (maximum depth)
	float3 d = v.xyz / v.z;

	// set starting point to the point where z equals 0.0f (minimum depth)
	float3 o = intersectDepthPlane(p, d, -p.z);

	// cross to next cell to avoid immediate self-intersection
	float2 rayCell = getCell(ray.xy, hiZSize.xy);
	ray = intersectCellBoundary(o, d, rayCell.xy, hiZSize.xy, crossStep.xy, crossOffset.xy);

	while(level >= HIZ_STOP_LEVEL && iterations < MAX_ITERATIONS)
	{
		// get the minimum depth plane in which the current ray resides
		float minZ = getMinimumDepthPlane(ray.xy, level, rootLevel);
		
		// get the cell number of the current ray
		const float2 cellCount = getCellCount(level, rootLevel);
		const float2 oldCellIdx = getCell(ray.xy, cellCount);

		// intersect only if ray depth is below the minimum depth plane
		float3 tmpRay = intersectDepthPlane(o, d, max(ray.z, minZ));

		// get the new cell number as well
		const float2 newCellIdx = getCell(tmpRay.xy, cellCount);

		// if the new cell number is different from the old cell number, a cell was crossed
		if(crossedCellBoundary(oldCellIdx, newCellIdx))
		{
			// intersect the boundary of that cell instead, and go up a level for taking a larger step next iteration
			tmpRay = intersectCellBoundary(o, d, oldCellIdx, cellCount.xy, crossStep.xy, crossOffset.xy); //// NOTE added .xy to o and d arguments
			level = min(HIZ_MAX_LEVEL, level + 2.0f);
		}

		ray.xyz = tmpRay.xyz;

		// go down a level in the hi-z buffer
		--level;

		++iterations;
	}

	return ray;
}

///////////////////////////////////////////////////////////////////////////////////////

///////////////////////////////////////////////////////////////////////////////////////
// Hi-Z cone tracing methods
///////////////////////////////////////////////////////////////////////////////////////

float specularPowerToConeAngle(float specularPower)
{
	// based on phong reflection model
	const float xi = 0.244f;
	float exponent = 1.0f / (specularPower + 1.0f);
	/*
	 * may need to try clamping very high exponents to 0.0f, test out on mirror surfaces first to gauge
	 * return specularPower >= 8192 ? 0.0f : cos(pow(xi, exponent));
	 */
	return cos(pow(xi, exponent));
}

float isoscelesTriangleOpposite(float adjacentLength, float coneTheta)
{
	// simple trig and algebra - soh, cah, toa - tan(theta) = opp/adj, opp = tan(theta) * adj, then multiply * 2.0f for isosceles triangle base
	return 2.0f * tan(coneTheta) * adjacentLength;
}

float isoscelesTriangleInRadius(float a, float h)
{
	float a2 = a * a;
	float fh2 = 4.0f * h * h;
	return (a * (sqrt(a2 + fh2) - a)) / (4.0f * max(h, 0.00001f));
}

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel)
{
	// placeholder - this is just to get something on screen
	float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
	float visibility = visibilityBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).r;

	return float4(sampleColor * visibility, visibility);
}

float isoscelesTriangleNextAdjacent(float adjacentLength, float incircleRadius)
{
	// subtract the diameter of the incircle to get the adjacent side of the next level on the cone
	return adjacentLength - (incircleRadius * 2.0f);
}

///////////////////////////////////////////////////////////////////////////////////////

float4 main(VertexOut pIn) : SV_TARGET
{
	/*
	 * Ray(t) = O + D> * t
	 * D> = V>SS / V>SSz
	 * O = PSS + D> * -PSSz
	 * V>SS = P'SS - PSS
	 * PSS = {texcoord.x, texcoord.y, depth} // screen/texture coordinate and depth
	 * PCS = (PVS + reflect(V>VS, N>VS)) * MPROJ
	 * P'SS = (PCS / PCSw) * [0.5f, -0.5f] + [0.5f, 0.5f]
	 */
	int3 loadIndices = int3(pIn.posH.xy, 0);
	float depth = hiZBuffer.Load(loadIndices).r;
	// PSS
	float3 positionSS = float3(pIn.tex, depth);
	float linearDepth = linearizeDepth(depth);
	// PVS
	float3 positionVS = pIn.viewRay * linearDepth;

	// V>VS - since calculations are in view-space, we can just normalize the position to point at it
	float3 toPositionVS = normalize(positionVS);
	// N>VS
	float3 normalVS = normalBuffer.Load(loadIndices).rgb;
	if(dot(normalVS, float3(1.0f, 1.0f, 1.0f)) == 0.0f)
	{
		return float4(0.0f, 0.0f, 0.0f, 0.0f);
	}
	
	float3 reflectVS = reflect(toPositionVS, normalVS);
	float4 positionPrimeSS4 = mul(float4(positionVS + reflectVS, 1.0f), projectionMatrix);
	float3 positionPrimeSS = (positionPrimeSS4.xyz / positionPrimeSS4.w);
	positionPrimeSS.x = positionPrimeSS.x * 0.5f + 0.5f;
	positionPrimeSS.y = positionPrimeSS.y * -0.5f + 0.5f;

	// V>SS - screen space reflection vector
	float3 reflectSS = positionPrimeSS - positionSS;

	// calculate the ray
	float3 raySS = hiZTrace(positionSS, reflectSS);

	// perform cone-tracing steps

	// get specular power from roughness
	float4 specularAll = specularBuffer.Load(loadIndices);
	float specularPower = roughnessToSpecularPower(specularAll.a);

	// convert to cone angle (maximum extent of the specular lobe aperture)
	float coneTheta = specularPowerToConeAngle(specularPower);

	// P1 = positionSS, P2 = raySS, adjacent length = ||P2 - P1||
	
	// need to check if this is correct calculation or not
	float2 deltaP = raySS.xy - positionSS.xy;
	float adjacentLength = length(deltaP);
	
	// need to check if this is correct calculation or not
	float2 adjacentUnit = normalize(deltaP);

	float4 totalColor = float4(0.0f, 0.0f, 0.0f, 0.0f);

	// cone-tracing using an isosceles triangle to approximate a cone in screen space
	for(int i = 0; i < 7; ++i)
	{
		// intersection length is the adjacent side, get the opposite side using trig
		float oppositeLength = isoscelesTriangleOpposite(adjacentLength, coneTheta);

		// calculate in-radius of the isosceles triangle
		float incircleSize = isoscelesTriangleInRadius(adjacentLength, oppositeLength);

		// get the sample position in screen space
		float2 samplePos = pIn.tex.xy + adjacentUnit * (adjacentLength - incircleSize);

		// convert the in-radius into screen size then check what power N to raise 2 to reach it - that power N becomes mip level to sample from
		float mipChannel = log2(incircleSize * max(cb_screenSize.x, cb_screenSize.y)); // try this with min instead of max

		/*
		 * Read color and accumulate it using trilinear filtering and weight it.
		 * Uses pre-convolved image (color buffer), pre-integrated transparency (visibility buffer),
		 * and hi-z buffer (hiZBuffer).
		 * Checks if cone sphere is below, between, or above the hi-z minimum and maximum and weights
		 * it together with transparency (visibility).
		 * Visibility is accumulated in the alpha channel.  Break if visibility is 100% or greater (>= 1.0f).
		 */
		totalColor += coneSampleWeightedColor(samplePos, mipChannel);
		
		if(totalColor.a >= 1.0f)
		{
			break;
		}

		adjacentLength = isoscelesTriangleNextAdjacent(adjacentLength, incircleSize);
	}



	////////////
	// fake implementation while testing - overwrites entire cone tracing loop - equivalent of cone angle being 0.0f
	
	totalColor.rgb = colorBuffer.SampleLevel(sampPointClamp, raySS.xy, 0.0f).rgb;
	
	// end fake
	////////////

	float3 toEye = -toPositionVS;
	// test this with saturate instead of abs, too - see which gives best result
	float3 specular = calculateFresnelTerm(specularAll.rgb, abs(dot(normalVS, toEye))) * RB_1DIVPI;

	return float4(totalColor.rgb * specular, 1.0f);
}

Screenshots:

(EDIT: screenshots didn't show up so linking to Dropbox images instead)

https://www.dropbox.com/s/1852z89kuj7hnn4/screenshot_0.png

https://www.dropbox.com/s/rx8w8da2qazg112/screenshot_1.png

https://www.dropbox.com/s/f3z4sxf0cjfz29r/screenshot_2.png

https://www.dropbox.com/s/i8k4nuw25byx4jv/screenshot_3.png

Hi WFP,

Thanks, that is fantastic. My favorite comment is "// not sure why we need rootLevel for this" (since I have the exact same comment in my own code) :D

On my side, I've finished writing all of the subroutines missing from the chapter. I wrote them in MEL first to see if they worked in Maya, and will port them to HLSL this afternoon to test with the shader. I'm also focusing on the hi-z ray marching as a starting point. I'm hoping the cone tracing passes will be simpler. Question: do you know what the author means by "The final demo uses minimum-maximum tracing which is a bit more complicated"? I'm not sure what he means by "maximum" tracing... When would we ever need the maximum depth value of a cell, since we can only go as far as the cell's boundary for a march anyway? I'm scratching my head over this one : ) I'm only storing the minimum z value for now.

Here are the MEL procedures I've written so far (they seemed to give valid cell intersections in Maya)...

Thanks again and will keep in touch!

Jp

proc float[] intersectDepthPlane(float $p[], float $d[], float $t)
{
	float $x = $p[0] + $d[0] * $t;
	float $y = $p[1] + $d[1] * $t;
	return {$x, $y};
}
proc float[] getCell(float $pos[], float $cellCount[])
{
	float $cellX = clamp(0, $cellCount[0] - 0.0001, $pos[0] * $cellCount[0]);
	float $cellY = clamp(0, $cellCount[1] - 0.0001, $pos[1] * $cellCount[1]);
	return {floor($cellX), floor($cellY)};
}
proc float[] intersectCellBoundary(float $pos[], float $dir[], float $cellId[], float $cellCount[], float $crossStep[], float $crossOffset[])
{
	float $cellWidth = 1.0/$cellCount[0];
	float $cellHeight = 1.0/$cellCount[1];
	float $xPlane = $cellId[0]/($cellCount[0]) + $cellWidth * $crossStep[0];
	float $yPlane = $cellId[1]/($cellCount[1]) + $cellHeight * $crossStep[1];
	float $tx = ($xPlane - $pos[0])/$dir[0];
	float $ty = ($yPlane - $pos[1])/$dir[1];
	float $t = min($tx, $ty);
	float $intersection[] = intersectDepthPlane($pos, $dir, $t);
	return $intersection;
}
// Set the count info
float $cellCount[2] = {12,12};
// Get the origin info
float $ox = `getAttr o.translateX`;
float $oy = `getAttr o.translateY`;
float $dx = `getAttr d.translateX`;
float $dy = `getAttr d.translateY`;
// Get the direction info
float $ray[2] = {$dx, $dy};
float $d[2] = {$ray[0] - $ox, $ray[1] - $oy};
float $dl = sqrt($d[0] * $d[0] + $d[1] * $d[1]);
float $o[2] = {$ox, $oy};
float $d[2] = {$d[0]/$dl, $d[1]/$dl};
// Get the cross info
float $crossStep[2] = {1,1};
if($d[0] < 0)
	$crossStep[0] = -1;
if($d[1] < 0)
	$crossStep[1] = -1;
float $eps = 0.0001;
float $crossOffset[2] = {$crossStep[0] * $eps, $crossStep[1] * $eps};
$crossStep[0] = clamp(0, 1, $crossStep[0]);
$crossStep[1] = clamp(0, 1, $crossStep[1]);
float $cellId[2] = getCell($ray, $cellCount);
print($cellId[0] + ", " + $cellId[1] + "\n");
$ray = intersectCellBoundary($o, $d, $cellId, $cellCount, $crossStep, $crossOffset);
// Display
catchQuiet(`delete intersection_ray_curve`);
catchQuiet(`curve -d 3 -p $o[0] $o[1] 0 -p $ray[0] $ray[1] 0 -k 0 -k 0 -k 0 -k 0 -name "intersection_ray_curve"`);
parent -r intersection_ray_curve directX;
select o;

Also:

static const float2 hiZSize = cb_screenSize; // not sure if correct - this is mip level 0 size

I was also wondering about this. To me it would make sense for this to be the size of the mip level we start the ray march from, since it is used for the first cell boundary test, but the name of the variable seems to imply otherwise :) I'm not sure either. It feels great to have other people to chat with about this :D

Jp

Hey Jp,

Great to see you're making some good headway on this. I'm looking forward to seeing how your translation to HLSL works out.


I realize I forgot to answer a question of yours yesterday, but I think you figured out the answer anyway - I am doing the ray marching in screen space.


Regarding the min-max tracing, what the author means is that when you're creating your hi-z buffer, you store not only the minimum depth value [min(min(value.x, value.y), min(value.z, value.w))] but also the maximum value [max(max(value.x, value.y), max(value.z, value.w))]. What this gives you is a better estimate of the depth extent of the object at the pixel you're currently processing. You can use this in the ray-tracing pass to walk behind an object: if the current ray depth (from O + D * t) is outside the range [min, max] at that pixel, you know it's not intersecting and can continue marching along that ray without further processing at the current position. I do not have this in my implementation yet, as I'm just trying to get the basics working first.
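The two bracketed expressions and the range test can be sketched as a pair of helpers. The names `reduceMinMax` and `insideDepthRange` are hypothetical, and storing the pair as (min in .x, max in .y) is an assumption for illustration; the chapter's actual buffer layout is not known here.

```hlsl
// Hypothetical helper: reduce four (min, max) samples from the more detailed
// mip into the single (min, max) pair stored in the coarser texel.
// Assumption: at mip 0 both channels hold the raw depth value.
float2 reduceMinMax(float2 a, float2 b, float2 c, float2 d)
{
	float minZ = min(min(a.x, b.x), min(c.x, d.x));
	float maxZ = max(max(a.y, b.y), max(c.y, d.y));
	return float2(minZ, maxZ);
}

// Hypothetical helper: true when the ray depth lies within the cell's depth
// range, i.e. when an intersection inside this cell is possible at all.
bool insideDepthRange(float rayZ, float2 minMaxZ)
{
	return rayZ >= minMaxZ.x && rayZ <= minMaxZ.y;
}
```

When `insideDepthRange` returns false because the ray is beyond the maximum, the ray is behind everything in that cell and the trace can keep marching instead of reporting a hit.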


That's a good idea you had concerning the hiZSize, and this evening when I get back home (mine is a hobby project at the moment, so I work on it in my free time), I will try setting it to something like hiZSize = cb_screenSize / exp2(HIZ_START_LEVEL). One of my issues could very well be that I'm not taking a large enough step away from the starting point to begin with.


Glad to have another person to bounce ideas back and forth with!


-WFP


(Edit: formatting and grammar)

I tried using different cell sizes (mip levels) to offset the initial ray starting point. Even going two levels down wasn't enough. In the end, I ended up biasing, much like with shadow maps, to clean up the initial self-intersections. It means the reflections aren't perfectly lined up, but you can't tell once there is a blur applied.
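The post does not say how the bias was applied, so the following is only one possible form, borrowed directly from the usual shadow-map depth comparison. The constant `SSR_DEPTH_BIAS`, its value, and the function name are hypothetical and would need tuning for the depth format in use.

```hlsl
// Hypothetical constant: tune per scene and depth format.
static const float SSR_DEPTH_BIAS = 0.0005f;

// Treat the ray as intersecting only when it is behind the stored depth by
// more than the bias, so the surface the ray starts on is not reported as a hit.
bool rayIsBehindSurface(float rayZ, float sceneZ)
{
	return rayZ > sceneZ + SSR_DEPTH_BIAS;
}
```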

I think you are having issues with not rejecting the "wrong" ray hits. This wasn't talked about in the article, or maybe I missed it, but the hiZ trace can return a screen space position that is incorrect for the reflection. Think of a ray going behind an object floating above the ground. The Z position of the ray will be far behind the Z position of the object, but the hiZ trace will return the screen space position where the ray and object first overlap. In this case, you need to recognize that it's a nonsensical intersection and keep tracing. This is also where the max part of the min/max buffer comes in handy, since you will now be "behind" the object, which the implementation in the book doesn't cover.
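A common way to express this rejection when only a minimum depth is available is to assume each surface has some fixed thickness. That is a swapped-in approximation and not what the post or the chapter describes; the constant `SSR_THICKNESS` and the function name are hypothetical, and both depths are assumed to be linearized view-space values so that a constant thickness is meaningful.

```hlsl
// Hypothetical constant, in the same units as the linearized depth values.
static const float SSR_THICKNESS = 0.01f;

// Accept a hit only when the ray is behind the surface by no more than the
// assumed thickness. A ray far behind the stored depth is passing behind the
// object, so the trace should continue instead of stopping there.
bool isPlausibleHit(float rayZ, float sceneZ)
{
	float behind = rayZ - sceneZ;
	return behind >= 0.0f && behind <= SSR_THICKNESS;
}
```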

The author says he does handle all that in his implementation that he never shows us. :/ IMO it's kind of an incomplete article, and I'm disappointed the GPU Pro editors decided to include it knowing it was written against source code that couldn't be released. Seems a bit disingenuous.

Also, I _had_ to apply temporal super sampling and temporal filtering to my results to get the effect to shipping quality. The temporal stability of the technique is very poor - you'll see shimmering / aliasing on any real-world scene with some depth complexity to it.
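The simplest form of temporal filtering is an exponential blend of the current reflection result with the previous frame's output. The sketch below is not the filter used in the post, whose details aren't given. It assumes a static camera (no reprojection and no history rejection), and all resource names and the `HISTORY_WEIGHT` value are hypothetical.

```hlsl
Texture2D<float4> currentSSR : register(t0); // this frame's reflection result
Texture2D<float4> historySSR : register(t1); // last frame's filtered output
SamplerState sampLinearClamp : register(s0);

// Hypothetical weight: higher values are more stable but ghost more under motion.
static const float HISTORY_WEIGHT = 0.9f;

float4 main(float4 posH : SV_POSITION, float2 tex : TEXCOORD) : SV_TARGET
{
	float4 current = currentSSR.SampleLevel(sampLinearClamp, tex, 0.0f);
	float4 history = historySSR.SampleLevel(sampLinearClamp, tex, 0.0f);

	// result = current * (1 - HISTORY_WEIGHT) + history * HISTORY_WEIGHT
	return lerp(current, history, HISTORY_WEIGHT);
}
```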

Hi,

I've spent just a moment this evening so far working some more on this, and it does seem that changing the hiZSize to what was mentioned above helps out some. It is now defined as


static const float2 hiZSize = cb_screenSize / exp2(HIZ_START_LEVEL); 

It still needs some refinement, but I feel confident that either a small linear search or a bias like Bruzer mentioned will help.

The main issues I'm facing now are these stair-step-like artifacts that show up. I've attached links to two images showing what I'm talking about.

Any idea what causes artifacts like this? Bruzer mentioned some stair-like artifacts he was seeing due to using a floor() command where he didn't need one, but I only have the one there to make sure the cell is an integer and removing that doesn't remove the artifacts.

I'm almost wondering if I'm not doing something in the "setup" code incorrectly - that is, the code leading up to the hiZTrace() call where I'm getting the position and ray direction. Maybe someone could give it a once-over to see if I've missed something there?

Thanks,

WFP

Dropbox image links:

https://www.dropbox.com/s/3rby7uwi2vugnw0/screenshot_4.png

https://www.dropbox.com/s/hs6ygesn1ez7sw4/screenshot_4_annotated.png

@WFP, looking good :D This seems much better! Yes, it does seem like a small linear search at the end (I guess 4 taps for the missing first 2 mip levels) will probably help. What resolution are you rendering at? Is it a power of 2? The hi-z pass I wrote this afternoon doesn't currently support buffers that aren't a power of 2 (it gathers the mins and maxes incorrectly). If some cells have incorrect min/max info, you might have some ray misses. I didn't get very far with the ray march today though. Will continue tomorrow. If I had to bet, I would say your screen space pos and dir are correct. I don't think you'd get any good reflections otherwise.
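One way to handle non-power-of-2 sizes in a minimum-depth downsample is to read an extra column or row whenever the more detailed mip has an odd dimension, so that no texel is left out of the reduction. This is a sketch under assumptions and not the pass described in the post: `prevMip` is a hypothetical SRV exposing only the more detailed mip, the output is the next mip with dimensions rounded down, and depth is assumed to lie in [0, 1] with 1 at the far plane.

```hlsl
Texture2D<float> prevMip : register(t0);

float main(float4 posH : SV_POSITION) : SV_TARGET
{
	uint width, height;
	prevMip.GetDimensions(width, height);

	int2 base = int2(posH.xy) * 2;
	int2 maxCoord = int2(width, height) - 1;

	// read 2 texels per axis, or 3 along any axis whose source size is odd
	int2 extent = int2(2, 2) + int2(width & 1u, height & 1u);

	float minZ = 1.0f; // assumption: 1.0 is the farthest representable depth

	for (int y = 0; y < extent.y; ++y)
	{
		for (int x = 0; x < extent.x; ++x)
		{
			// clamp so Load never reads outside the source mip
			int2 coord = min(base + int2(x, y), maxCoord);
			minZ = min(minZ, prevMip.Load(int3(coord, 0)));
		}
	}

	return minZ;
}
```

The extra reads make neighbouring coarse texels overlap slightly, which keeps the minimum conservative at the cost of a few redundant loads.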

@Bruzer, I sort of agree about the article. It really doesn't stand on its own feet without the code. For example, the article describes in great detail how to calculate a reflection vector in screen space, with diagrams of a reflected ray and a description of what a reflection is, and then, when it gets to the actual reflection algorithm: "to understand the algorithm, look at the diagram on page blah". There are a whole lot of subtleties captured neither by the pseudo code, the diagram, nor the code snippet.

I also looked at the simple linear ray march mini program and wondered how that could handle the case where a ray goes "under" an object (the code as is would conclude a reflection is good as soon as the ray reaches anything in front of it)...

And yes about the temporal filtering, I was anticipating the reflection pass to be unstable :( How many history buffers did you keep?

Well, gentlemen, I'm very pleased I found this thread :)

Will post results as soon as I have anything.

Cheers,

Jp

This topic is closed to new replies.
