Darg

DX11 [HLSL] Ray-Tracing on GPU


Hey guys, for my dissertation I'm working on ray-traced shadows in a deferred renderer. I've got pretty much everything I need done for it but I've got quite a major problem, performance.

(Before the block of text starts if you just want to look through the shader code and see what optimisation tips you can give me then just skip all this and go to the last paragraph :) )

For those of you that haven't done ray-tracing before I'll quickly explain a couple of things that have killed the performance on me.

I'm using a uniform grid with DDA traversal. DDA traversal is the technique used to draw a line in paint programs; it basically finds all the nodes in the uniform grid that the ray passes through. A good paper on it can be found here:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3443&rep=rep1&type=pdf

Without DDA traversal only the node where the ray starts will be tested. I was doing this for quite a while to make sure that my performance stayed high enough for debugging.

Another thing that's killed performance is making sure that every node an object covers contains a reference to the object. So imagine you have an object at the corner of 4 nodes. Originally only the node that the object's origin was in would have a reference. I've changed that now so that all 4 nodes (or more with very large objects) have a reference to the object, which can lead to the same object being tested 3 times by a single ray. Not good. A hierarchical grid will fix this by bumping an object that covers more than one node up a layer to a grid with larger nodes.
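
The repeated tests described above could also be avoided without a hierarchical grid. A minimal sketch, under assumptions: each object would also have to store the inclusive range of nodes it is registered in (cellMin and cellMax below are hypothetical and not part of the original data layout), the object must be referenced in every node of that rectangular range, and the DDA must move one node at a time. Under those conditions a ray can skip an object whose range also contains the node it just left, provided the objects in that node really were tested.

```hlsl
// Hypothetical helper: true when `cell` lies inside the inclusive node range
// [cellMin, cellMax] in which an object was registered.
bool cellInRange(float2 cell, float2 cellMin, float2 cellMax)
{
    return all(cell >= cellMin) && all(cell <= cellMax);
}

// Returns true when the object was already tested in the previous node.
// prevNodeTested must be false for the node the ray starts in, and false
// whenever the previous node was rejected before its objects were tested
// (for example by a height check).
bool alreadyTested(float2 prevNode, bool prevNodeTested, float2 cellMin, float2 cellMax)
{
    return prevNodeTested && cellInRange(prevNode, cellMin, cellMax);
}
```

The trade-off is one extra fetch per object reference for the range, against a bounding sphere or mesh test that can be skipped.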

Anyway, without DDA and multiple object references I was getting pretty decent performance (~30 fps) even with mesh detection on dozens of objects with 1500 triangles each. Now that I've put in DDA and the multiple object references, the performance has dropped right down to 1-5 fps. I've got a shadow quality setting that can drop the size of the G-buffer so fewer shadow rays are shot, but even with this the performance is far too low.

I don't like having to do this but I've decided that all I can do at this point is put the shader code up here and hope that people with more HLSL experience will be able to point out some places where I can optimise it a bit. It's DX9 in XNA with a custom effect compiler that allows me to use the latest DX SDK for dynamic branching. When I compile this code with everything, it takes about 15-20 minutes to compile. Without the mesh intersection it takes about a minute. This is a pretty good sign to me that it can be optimised a lot more than I have. I tried compiling in DX11 and it just shot through in a second or two, which is annoying! If only XNA could use DX11.

Anyway here is the code, anyone who wants to perform their good deed for the day please have a look over it and give me what tips you can:
(code put into next post, think I reached the text limit of a post)

Hrm, I don't know why but whenever I try to post the code it just crashes gamedev on me. I can't access it until I close my browser and restart it.

For now, until anyone can help me with that the source code is up here:

http://scgamedev.tumblr.com/sourcecode

I know it's a terrible way to look at it without any highlighting, if anyone knows a site where I can put it up without the size limit then let me know and I'll put it up there and post a link.

I hate to spam but I thought I'd post the main parts of the code in separate posts. Here are the helper functions, pretty basic stuff, mostly transforming a linear reference to its UV position in the corresponding texture:


float2 worldSpaceToVoxel(float3 rayStart)
{
    return float2((int)(rayStart.x / nodeSize), (int)(rayStart.z / nodeSize));
}

float2 voxelToWorldSpace(float2 voxel)
{
    return float2(voxel.x * nodeSize, voxel.y * nodeSize);
}

float2 linearToUV_ID(float id)
{
    float2 uv;
    uv.y = (id * 4.0f) * tx_objects_pixelsizes.x;
    uv.x = uv.y % 1.0f;
    uv.y = (uv.y - uv.x) * tx_objects_pixelsizes.y;
    return uv;
}

float2 linearToUV_Model(float i)
{
    float2 uv;
    uv.y = i * tx_modelinfo_pixelsizes.x;
    uv.x = uv.y % 1.0f;
    uv.y = (uv.y - uv.x) * tx_modelinfo_pixelsizes.y;
    return uv;
}

float2 linearToUV_Index(float i)
{
    float2 uv;
    uv.y = i * tx_modelindices_pixelsizes.x;
    uv.x = uv.y % 1.0f;
    uv.y = (uv.y - uv.x) * tx_modelindices_pixelsizes.y;
    return uv;
}

float2 linearToUV_Vertice(float i)
{
    float2 uv;
    uv.y = i * tx_modelvertices_pixelsizes.x;
    uv.x = uv.y % 1.0f;
    uv.y = (uv.y - uv.x) * tx_modelvertices_pixelsizes.y;
    return uv;
}
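
The four linearToUV_* helpers above differ only in which pixel-size constant they use (plus the factor of 4 for object IDs), so they could share one function. A minimal sketch, assuming non-negative indices and that each *_pixelsizes value holds (1 / width, 1 / height) as the originals imply; linearToUV is a hypothetical name:

```hlsl
// Converts a linear texel index into a UV coordinate.
// pixelSize.x is assumed to be 1 / texture width, pixelSize.y 1 / texture height.
float2 linearToUV(float index, float2 pixelSize)
{
    // Integer part = row number, fractional part = horizontal UV within the row
    float scaled = index * pixelSize.x;
    float2 uv;
    uv.x = frac(scaled);
    uv.y = floor(scaled) * pixelSize.y;
    return uv;
}
```

With this, linearToUV_ID(id) would become linearToUV(id * 4.0f, tx_objects_pixelsizes) and linearToUV_Model(i) would become linearToUV(i, tx_modelinfo_pixelsizes). It is unlikely to change performance much, but it leaves less code to compile and maintain.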

These are the mesh (triangle) and sphere intersection tests:


bool checkModel(float2 uv, float3 rayStart, float3 rayDir)
{
    float2 modelInfo = tex2Dlod(ModelInfoSampler, float4(uv, 0, 0)).xy;
    [loop] for(int i = modelInfo.x; i < modelInfo.y; i++)
    {
        //TRIANGLE INTERSECTION TEST
        float3 tri = tex2Dlod(ModelIndicesSampler, float4(linearToUV_Index(i), 0, 0)).xyz;
        float3 v1 = tex2Dlod(ModelVerticesSampler, float4(linearToUV_Vertice(tri.x), 0, 0)).xyz;
        float3 v2 = tex2Dlod(ModelVerticesSampler, float4(linearToUV_Vertice(tri.y), 0, 0)).xyz;
        float3 v3 = tex2Dlod(ModelVerticesSampler, float4(linearToUV_Vertice(tri.z), 0, 0)).xyz;

        float4 Nr = float4(normalize(cross(v3 - v1, v2 - v1)), 0);
        Nr.w = dot(Nr.xyz, rayDir);

        [branch] if (Nr.w != 0)
        {
            Nr.w = dot(Nr.xyz, v1 - rayStart) / Nr.w;
            [branch] if(Nr.w > EPSILON)
            {
                float3 p = rayStart + (rayDir * Nr.w);
                float3 u = v2 - v1;
                float3 v = v3 - v1;
                float3 w = p - v1;
                float4 UV_WV_VV_WU = float4(dot(u, v), dot(w, v), dot(v, v), dot(w, u));
                float4 UU_den_s_t = float4(dot(u, u), 0, 0, 0);
                UU_den_s_t.y = (UV_WV_VV_WU.x * UV_WV_VV_WU.x) - (UU_den_s_t.x * UV_WV_VV_WU.z);
                UU_den_s_t.z = ((UV_WV_VV_WU.x * UV_WV_VV_WU.y) - (UV_WV_VV_WU.z * UV_WV_VV_WU.w)) / UU_den_s_t.y;
                UU_den_s_t.w = ((UV_WV_VV_WU.x * UV_WV_VV_WU.w) - (UU_den_s_t.x * UV_WV_VV_WU.y)) / UU_den_s_t.y;

                [branch] if (UU_den_s_t.z >= 0 && UU_den_s_t.w >= 0 && UU_den_s_t.z + UU_den_s_t.w <= 1)
                {
                    return true;
                }
            }
        }
    }
    return false;
}

bool checkSphere(float4 boundingSphere, float3 rayStart, float3 rayDir)
{
    float3 dst = boundingSphere.xyz - rayStart.xyz;
    float3 BCD = float3(dot(dst, rayDir), 0, 0);
    [branch] if(BCD.x < 0.0f && distance(boundingSphere.xyz, rayStart.xyz) > (boundingSphere.w - 0.1f))
    {
        //RAY IS POINTING AWAY FROM THE SPHERE AND STARTS OUTSIDE OF THE SPHERE
        return false;
    }
    BCD.y = dot(dst, dst) - boundingSphere.w * boundingSphere.w;
    BCD.z = BCD.x * BCD.x - BCD.y;
    return (BCD.z > 0.0f);
}
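
The triangle test in checkModel normalizes the plane normal and performs two divisions per triangle. As a possible alternative, here is a sketch of the Möller-Trumbore ray/triangle test, a different technique from the one in the post. It needs no normalize and only one division. rayHitsTriangle and TRI_EPSILON are hypothetical names, and the tolerance value is an assumption that would need tuning to the scene scale.

```hlsl
static const float TRI_EPSILON = 1e-5f; // assumed tolerance, tune to scene scale

// Möller-Trumbore any-hit test: true when the ray hits the triangle in
// front of rayStart. The test is two-sided (no backface culling).
bool rayHitsTriangle(float3 rayStart, float3 rayDir, float3 v1, float3 v2, float3 v3)
{
    float3 e1 = v2 - v1;
    float3 e2 = v3 - v1;
    float3 p = cross(rayDir, e2);
    float det = dot(e1, p);

    // Ray is (nearly) parallel to the triangle plane
    if (abs(det) < TRI_EPSILON)
    {
        return false;
    }

    float invDet = 1.0f / det;
    float3 s = rayStart - v1;
    float u = dot(s, p) * invDet;           // first barycentric coordinate
    float3 q = cross(s, e1);
    float v = dot(rayDir, q) * invDet;      // second barycentric coordinate
    float t = dot(e2, q) * invDet;          // distance along the ray

    return u >= 0.0f && v >= 0.0f && (u + v) <= 1.0f && t > TRI_EPSILON;
}
```

Inside the loop this would replace everything after the three vertex fetches with a single call such as `if (rayHitsTriangle(rayStart, rayDir, v1, v2, v3)) return true;`. Whether it actually compiles or runs faster on a given target would have to be measured.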

These are the object and node checks:


bool checkObject(int id, float3 rayStart, float3 rayDir)
{
    //GET THE POSITION IN THE OBJECT TEXTURE GIVEN THE ID
    float2 uv = linearToUV_ID(id);

    //RETRIEVE INVERSE WORLD MATRIX WITH PACKED MODEL UV AND PRIMITIVE TYPE
    float4x4 invWorld;
    invWorld[0] = tex2Dlod(ObjectSampler, float4(uv, 0, 0));
    invWorld[1] = tex2Dlod(ObjectSampler, float4(uv.x + tx_objects_pixelsizes.x, uv.y, 0, 0));
    invWorld[2] = tex2Dlod(ObjectSampler, float4(uv.x + 2.0f * tx_objects_pixelsizes.x, uv.y, 0, 0));
    invWorld[3] = tex2Dlod(ObjectSampler, float4(uv.x + 3.0f * tx_objects_pixelsizes.x, uv.y, 0, 0));

    //GET BOUNDING SPHERE
    float2 model_uv = linearToUV_Model(invWorld._14);
    float4 boundingSphere = tex2Dlod(BoundingSphereSampler, float4(model_uv, 0, 0));
    invWorld._14 = 0;

    //GET PRIMITIVE TYPE
    int primType = invWorld._24;
    invWorld._24 = 0;

    //TRANSFORM RAY TO OBJECT SPACE
    float4 t_rayStart = mul(float4(rayStart, 1), invWorld);
    float3 t_rayDir = normalize(mul(rayDir, (float3x3)invWorld));

    //CHECK SPHERE INTERSECTION FIRST SO THAT A MISS SKIPS THE EXPENSIVE MESH TEST
    bool intersected = checkSphere(boundingSphere, t_rayStart.xyz, t_rayDir);

    [branch] if(intersected && primType == MESH)
    {
        //IF THE RAY HITS THE BOUNDING SPHERE OF A MESH THEN TEST THE MODEL
        return checkModel(model_uv, t_rayStart.xyz, t_rayDir);
    }
    return intersected;
}

bool checkNode(float2 node, float3 rayStart, float3 rayDir, float t, float t2)
{
    bool intersected = false;

    //GET THE MIN AND MAX HEIGHTS OF THE NODE
    float2 heights = tex2Dlod(GridHeightsSampler, float4(node / nodeAmount, 0, 0)).rg;
    //GET THE MIN AND MAX HEIGHTS OF THE RAY IN THIS NODE
    float2 thisHeights = float2(rayStart.y + rayDir.y * t, rayStart.y + rayDir.y * t2);
    [branch] if(min(thisHeights.x, thisHeights.y) > heights.y)
    {
        //IF THE RAY IS ABOVE THE MAX HEIGHT OF THE NODE (MIN NOT TESTED AT THE MOMENT)
        return false;
    }

    //SET UP VARIABLES FOR THE LOOP THROUGH THE OBJECTS IN THIS NODE
    //EACH NODE IS A SQUARE OF OBJECT REFERENCES, LIMIT IS THE SIZE OF THE SQUARE
    float limit = nodeObjAmount * nodeObjSize;
    //STARTING UV POINT OF THE NODE IN TEXTURE SPACE
    float4 uv = float4(node.x * nodeSizeTspace, node.y * nodeSizeTspace, 0, 0);
    //GET THE ID IN THE FIRST POSITION OF THE NODE
    float id = tex2Dlod(GridSampler, float4(uv.xy, 0, 0)).r;

    //START THE LOOP TO GO THROUGH THE REST OF THE OBJECTS UNTIL AN INTERSECTION OR ID OF 0 IS FOUND
    [loop] while(uv.w < limit && intersected == false && id != 0.0f)
    {
        //CHECK INTERSECTION WITH THIS OBJECT
        intersected = checkObject(id, rayStart, rayDir);

        //MOVE ONTO NEXT OBJECT IN THE NODE
        uv.z += nodeObjSize;
        if(uv.z >= limit)
        {
            uv.z = 0.0f;
            uv.w += nodeObjSize;
        }
        id = tex2Dlod(GridSampler, float4(uv.xy + uv.zw, 0, 0)).r;
    }
    return intersected;
}
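
The height check in checkNode only rejects rays that pass above the node (the comment notes that the minimum is not tested). A sketch of testing both ends of the range, assuming heights.x is the node's minimum height and heights.y its maximum, as the comments suggest; rayMissesNodeHeights is a hypothetical name:

```hlsl
// Returns true when the ray's height span inside the node lies entirely
// above the node's tallest point or entirely below its lowest point.
// heights.x = node minimum height, heights.y = node maximum height (assumed).
bool rayMissesNodeHeights(float yEnter, float yExit, float2 heights)
{
    float rayLow = min(yEnter, yExit);
    float rayHigh = max(yEnter, yExit);
    return rayLow > heights.y || rayHigh < heights.x;
}
```

In checkNode this could be called as `rayMissesNodeHeights(thisHeights.x, thisHeights.y, heights)`. Rejecting a node this way skips all of its object fetches, so it may help most for rays that travel well above or below the geometry.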


It doesn't let me post the full length of this method so this TraceRay method which contains the DDA traversal is split into two parts:


bool traceRay(float3 rayStart, float3 rayDir)
{
    //GET THE START NODE OF THE RAY
    float2 currNode = worldSpaceToVoxel(rayStart);
    bool intersected = false;
    //CHECK INTERSECTION WITH THE START NODE
    intersected = checkNode(currNode, rayStart, rayDir, 0.0f, 0.0f);

    //IF THE INTERSECTION IS FALSE THEN MOVE ONTO THE NEXT NODE USING DDA TRAVERSAL
    [branch] if(intersected == false)
    {
        //DDA VARIABLES, T AND T2 HOLD THE DISTANCES WHERE THE RAY ENTERS AND EXITS THE CURRENT NODE
        float stepX, stepY, tMaxX, tMaxY, tDeltaX, tDeltaY;
        //T AND T2 START AT ZERO SO THAT THEY ARE NEVER READ UNINITIALISED
        float t = 0.0f, t2 = 0.0f;
        if(rayDir.x > 0.0f)
        {
            stepX = 1.0f;
            tMaxX = (currNode.x + 1 - (rayStart.x / nodeSize)) / rayDir.x;
            tDeltaX = 1.0f / rayDir.x;
        }
        else if(rayDir.x < 0.0f)
        {
            stepX = -1.0f;
            tMaxX = ((rayStart.x / nodeSize) - currNode.x) / rayDir.x;
            tDeltaX = 1.0f / rayDir.x;
        }
        else
        {
            //THE RAY NEVER CROSSES AN X BOUNDARY, SO PUSH tMaxX OUT OF REACH
            //(A VALUE OF ZERO WOULD ALWAYS WIN THE tMaxX < tMaxY TEST AND NEVER ADVANCE)
            stepX = 0.0f;
            tMaxX = 1e30f;
            tDeltaX = 0.0f;
        }
        if(rayDir.z > 0.0f)
        {
            stepY = 1.0f;
            tMaxY = (currNode.y + 1 - (rayStart.z / nodeSize)) / rayDir.z;
            tDeltaY = 1.0f / rayDir.z;
        }
        else if(rayDir.z < 0.0f)
        {
            stepY = -1.0f;
            tMaxY = ((rayStart.z / nodeSize) - currNode.y) / rayDir.z;
            tDeltaY = 1.0f / rayDir.z;
        }
        else
        {
            //THE RAY NEVER CROSSES A Y BOUNDARY, SO PUSH tMaxY OUT OF REACH
            stepY = 0.0f;
            tMaxY = 1e30f;
            tDeltaY = 0.0f;
        }



        //ENSURE THE DDA VARIABLES ARE ALL POSITIVE
        tMaxX = abs(tMaxX);
        tMaxY = abs(tMaxY);
        tDeltaX = abs(tDeltaX);
        tDeltaY = abs(tDeltaY);

        //DDA LOOP EXITS WHEN THE RAY GOES OUTSIDE THE MAP OR FINDS AN INTERSECTION
        //A RAY WITH NO X OR Z MOVEMENT NEVER LEAVES THE START NODE, SO IT SKIPS THE LOOP
        [loop] while((stepX != 0.0f || stepY != 0.0f) && currNode.x >= 0 && currNode.y >= 0 && currNode.x < nodeAmount && currNode.y < nodeAmount && intersected == false)
        {
            //DDA LOOP: STEP ACROSS THE NEAREST BOUNDARY
            //THE RAY ENTERS THE NEXT NODE AT THE DISTANCE WHERE IT CROSSES THAT BOUNDARY
            if(tMaxX < tMaxY)
            {
                t = tMaxX;
                tMaxX += tDeltaX;
                currNode.x += stepX;
            }
            else
            {
                t = tMaxY;
                tMaxY += tDeltaY;
                currNode.y += stepY;
            }

            //THE RAY EXITS THE NODE AT THE NEARER OF THE NEXT TWO BOUNDARIES
            t2 = min(tMaxX, tMaxY);

            //CHECK NODE INTERSECTION
            //T AND T2 ARE MEASURED IN NODE UNITS, checkNode EXPECTS WORLD UNITS
            intersected = checkNode(currNode, rayStart, rayDir, t * nodeSize, t2 * nodeSize);
        }
    }
    return intersected;
}

And finally the pixel shader start point which gets the ray start and ray direction:


float4 RayMask_PS (PS_INPUT IN) : COLOR0
{
    float depth = tex2D(DepthSampler, IN.TexCoords).r;
    if(depth == 0) { discard; }
    float3 pos3D = CamPos + depth * normalize(IN.BackRay);
    float3 rayDir = float3(0, 0, 0);

    if(LightType == POINT_LIGHT || LightType == SPOT_LIGHT)
    {
        rayDir = LightPos - pos3D;
        if(length(rayDir) > LightRange)
        {
            discard;
        }
        rayDir = normalize(rayDir);
    }
    else
    {
        if(LightType == DIR_LIGHT)
        {
            rayDir = LightPos;
        }
    }

    //START THE RAY TRAVERSAL
    bool intersected = traceRay(pos3D, rayDir);

    float shadow = 1.0f;
    if(intersected == true)
    {
        shadow = 0.0f;
    }
    return float4(shadow, 0, 0, 1);
}



Thank you for looking through it!

Hi,
A quick look at your shader reveals that there is plenty of room for optimization. For example, instead of cascading if/else branches to determine the step direction, you could use the HLSL sign function (http://msdn.microsoft.com/en-us/library/bb509649(v=VS.85).aspx) like so (this is valid in GLSL; I don't know if HLSL has a similar type).


int3 step = sign(eyeRay.dir);
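
Extending that idea, the whole DDA setup can be written without branches. This is a hedged HLSL sketch, not code from either post: setupDDA and DDA_INFINITY are hypothetical names, startPos is assumed to be the ray start already divided by the node size (so distances come out in node units), and the variable is called stepDir because step is also the name of an HLSL intrinsic.

```hlsl
static const float DDA_INFINITY = 1e30f; // assumed "never reached" distance

// Branch-free setup for a 2D grid DDA.
// startPos: ray start in node units (world position / node size, assumed).
// dir:      ray direction projected onto the grid plane.
void setupDDA(float2 startPos, float2 dir,
              out float2 cell, out float2 stepDir,
              out float2 tMax, out float2 tDelta)
{
    cell = floor(startPos);
    stepDir = sign(dir);                 // -1, 0 or +1 per axis
    float2 moving = abs(stepDir);        // 1 on axes the ray moves along, else 0

    // Clamp so that a zero component cannot cause a division by zero
    float2 invAbsDir = 1.0f / max(abs(dir), 1e-20f);

    // The next boundary is the far edge when stepping forwards, the near edge otherwise
    float2 nextBoundary = cell + max(stepDir, 0.0f);

    // Distance between boundaries; zero on axes the ray does not move along
    tDelta = invAbsDir * moving;

    // Distance to the first boundary; pushed out of reach on non-moving axes
    tMax = abs(nextBoundary - startPos) * invAbsDir * moving
         + (1.0f - moving) * DDA_INFINITY;
}
```

Because tMax is out of reach on an axis with no movement, the traversal loop never selects that axis, which avoids getting stuck on rays that are parallel to a grid axis. A ray with no movement on either axis still needs its own exit condition in the loop.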




To help you (and others), I am attaching a complete GLSL uniform grid volume raycaster, written as a GLSL fragment shader, that I implemented based on the paper you linked. You should be able to convert it to HLSL comfortably.

Hope this helps.

Regards,
Mobeen


//Assumptions
//DIMS -> dim of your dataset

//initialization
vec3 u = eyeRay.o + t*eyeRay.d;
vec3 uVS = (u*0.5+0.5); //from (-1 1) to (0 1)
vec3 uVS2 = uVS* DIMS; //convert to voxel space coords


//X,Y,Z are initialized to the starting voxel coords
int X = uVS2.x;
int Y = uVS2.y;
int Z = uVS2.z;
int3 step= sign(eyeRay.d);

//determine the values of tMaxX, tMaxY and tMaxZ
vec3 tMax=vec3(tnear+ 1.0/DIMS);
float t2 = mini(tMax);

//discard the current fragment
if(t2<tnear || t2 > tfar)
    discard;

vec3 tDelta = abs(t2/eyeRay.d) ;
bool inside = true;
for(int i=0;i<maxSteps;i++) {
    if(! inside)
        break;
    if(tMax.x < tMax.y) {
        if(tMax.x < tMax.z) {
            X= X + step.x;
            if(X >= DIMS.x)
                inside = false; /* outside grid */
            tMax.x = tMax.x + tDelta.x;
        } else {
            Z= Z + step.z;
            if(Z >= DIMS.z) /* outside grid (compare Z against DIMS.z, not DIMS.y) */
                inside = false;
            tMax.z= tMax.z + tDelta.z;
        }
    } else {
        if(tMax.y < tMax.z) {
            Y= Y + step.y;
            if(Y >= DIMS.y) /* outside grid */
                inside = false;
            tMax.y= tMax.y + tDelta.y;
        } else {
            Z= Z + step.z;
            if(Z >= DIMS.z) /* outside grid */
                inside = false;
            tMax.z= tMax.z + tDelta.z;
        }
    }

    //Do your stuff here like compositing/triangle intersection etc.
    vec3 pos = vec3(X,Y,Z)/DIMS;
    half sample = texture3D(data, pos);

    //This is just a simple iso-surface ray caster
    if ( sample>0.1 && (sample -isoValue) < 0 )
    {
        vec3 N = GetGradient(pos);
        vec3 V = -eyeRay.d;
        vec3 L = V;
        fragColor = PhongLighting(L,N,V,250, pos);
        inside=false;
        break;
    }
}




and here is the rendered output
