Jump to content

  • Log In with Google      Sign In   
  • Create Account

tree traversal of LBVH with and without stack


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
No replies to this topic

#1 Wh0p   Members   -  Reputation: 340

Like
0Likes
Like

Posted 03 August 2014 - 05:37 AM

Hi,

I have implemented a tilebased and clustered deferred shading pipeline and am currently profiling and optimizing.

I am constructing a LBVH for all the lightsources every frame from scratch.

I'm doing this like described here: http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/

In short:

(1) compute a conservative AABB for all lights

(2) calculate Morton codes for each light

(3) sort the lights along their corresponding Morton codes

(4) construct a LBVH using a sparse tree representation

 

This works pretty well and uses all together 1 - 1.5 ms for up to 100 000 lights.

On the other hand the traversal of the LBVH consumes lots of time, primarily because I have to do this 2 times. 1st time to calculate the number of lights in each cluster, and the 2nd time after I have partitioned my indices texture to put the actual light indices into.

 

I have several different implementations for the traversal...

traversal with stack: http://pastebin.com/GZnzGrPw

stackless traversal: http://pastebin.com/4NE5UqVG

 

there are more variants to the stackless traversal (with less texture memory access) but I think they are not relevant for the time being.

 

My question is now, why is the stackless traversal faster (2 times as fast!) than the one with stack, even if there are AL LOT more texture memory reads. I figured the order in which the nodes are traversed is the same.

My theory goes as follows:

GPUs utilize fast context switches to gain performance. The stack (the 32 field array) uses up a lots of registers and the context switch is actually pretty slow. Although this is pure guessing an I have no proof what so ever. 

 

I tried to squeeze in as much info without bloating the post, so thanks to all those who have read up to this line biggrin.png

I'm very interested in your explanations as well.

 

Edit: I should have told you how the tree is represented in memmory...

It's a sparse tree representation with a texture holding the 2 children for the node N at the position N

The same with the parents, the parent of Node N is found by accessing the texture at position N


Edited by Wh0p, 03 August 2014 - 05:41 AM.


Sponsor:



Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS